  1. Jul 2025
    1. Reviewer #3 (Public review):

      Summary:

      The authors investigated a possible role of Endophilin A1 in the inhibitory postsynaptic density.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires more robust analysis to be convincing.

      Specific comments:

      The authors have made a substantial effort to improve their manuscript. A number of issues, related to numbers of observations mentioned by the reviewers, are clarified in the revised manuscript. The authors have also clarified some of the other questions from the reviewers. The long list of issues brought up by the reviewers and the many corrections needed still raise questions about data quality in this manuscript.

      In response to my comments (Point 2), the added experiment with PSD95.FingR and GPN.FingR in cultured neurons (Fig. S5A-D) is a good addition; the in vivo data using FingRs in Figure S3 look less convincing however. In response to my Point 5, the authors have added a cell-free binding assay (Figure 5I). This is a useful addition, but to convincingly make the point of interaction between Gephyrin and EndoA1, more rigorous biophysical quantitation of binding is needed. The legend in Figure 5I states that 4 independent experiments were performed, but the graph only shows 3 dots. This needs to be corrected.

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well-written and contains a substantial quantity of data.

      Weaknesses:

      A number of questions remain to be answered in order to be able to fully evaluate the quality and conclusions of the study. In particular, a key concern throughout the manuscript regards the way that the number of samples for statistical analysis is defined, which may affect the validity of the data analysed. Addressing this weakness will be essential to providing conclusive results that support the authors' claims.

      We would like to thank the reviewer for appreciating the value of our study and for the careful critique, which will help us improve the manuscript. We will correct the way that the number of samples for statistical analysis is defined throughout the manuscript as suggested and update the figures, figure legends, and Materials and Methods accordingly. For example, we will average the values for all dendritic segments from one neuron, so that each data point in the graphs represents one neuron.

      Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein heavily expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, however largely at excitatory synapses. Thus, whether this crucial protein plays any role in inhibitory synapse, and whether this regulates functions at the synaptic, circuit, or brain level remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABA A receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      We would like to thank the reviewer for their favorable impression of the manuscript. We also appreciate the helpful experimental suggestions, which will help us improve the manuscript.

      Weaknesses:

      Technical concerns:

      (1) Figure 1F and Figure 1H, Figures 7H,J:

      Can the authors justify using a paired-pulse interval of 50 ms for eEPSCs and an interval of 200 ms for eIPSCs? Otherwise, experiments should be repeated using the same paired pulse interval.

      We apologize for the confusion. As illustrated by the schematic current traces, the decay time constants of eEPSCs and eIPSCs in hippocampal CA1 neurons are different. The eEPSCs exhibit a faster channel closing rate, corresponding to a smaller time constant Tau. Thus, a shorter inter-stimulus interval (50 ms) was chosen for paired-pulse ratio recordings. In contrast, the eIPSCs display a slower channel closing rate, with a Tau value larger than that of eEPSCs, so a longer inter-stimulus interval (200 ms) was used for PPR. This protocol has been long-established and adopted in previous studies (please see below for examples).

      Contractor, A., Swanson, G. & Heinemann, S. F. Kainate receptors are involved in short- and long-term plasticity at mossy fiber synapses in the hippocampus. Neuron 29, 209-216, doi:10.1016/s0896-6273(01)00191-x (2001).

      Babiec, W. E., Jami, S. A., Guglietta, R., Chen, P. B. & O'Dell, T. J. Differential Regulation of NMDA Receptor-Mediated Transmission by SK Channels Underlies Dorsal-Ventral Differences in Dynamics of Schaffer Collateral Synaptic Function. Journal of neuroscience 37, 1950-1964, doi:10.1523/JNEUROSCI.3196-16.2017 (2017).

      (2) Figures 3G,H,I:

      While 3D representations of proteins of interest bolster claims made by superresolution microscopy, SIM resolution is unreliable when deciphering the localization of proteins at the subsynaptic level given the small size of these structures (<1 micrometer). In order to determine the actual location of Endophilin A1, especially given the known presynaptic localization of this protein, the authors should complete SIM experiments with a presynaptic marker, perhaps an active zone protein, so that the relative localization of Endophilin A1 can be gleaned. Currently, overlapping signals could stem from the presynapse given the poor resolution of SIM in this context.

      Thanks for your suggestions. It is certainly preferable to investigate the relative localization of endophilin A1 using both presynaptic and postsynaptic markers. For SIM imaging in Figure 3G-I, to visualize neuronal morphology, we immunostained GFP as cell fill, leaving two other channels for detection of immunofluorescent signals of endophilin A1 and another protein. We will try co-immunostaining of endophilin A1, the active zone protein bassoon (presynaptic marker) and gephyrin without morphology labeling. Alternatively, we will do co-staining of endophilin A1 and bassoon in GFP-expressing neurons. We agree that overlapping signals or proximal localization of presynaptic endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2 cannot be ruled out. Of note, if image resolution is improved with the use of a more advanced imaging system, the overlap between two proteins will become smaller or even disappear. With the ~110 nm lateral resolution of SIM, the degree of overlap between the two proteins of interest is much lower than in confocal microscopy. Given the presynaptic localization of endophilin, we will most likely observe a small overlap (presynaptic) or proximal localization (postsynaptic) of endophilin A1 with bassoon. Nevertheless, we will complete the SIM experiments as suggested to improve the manuscript.

      Manuscript consistency:

      (1) Figure 2:

      The authors looked at VGAT and noticed a reduction of signals in hippocampal regions in their P21 slices, indicating that the proposed postsynaptic organization/stabilization functions of Endophilin A1 extend to the inhibitory presynapse, perhaps via Neuroligin 2-Neurexin. Simultaneously, hippocampal regions in P21 slices showed a reduction in PSD-95 signals, indicating that excitatory synapses are also affected. It would be crucial to also look at excitatory presynapses, via VGLUT staining, to assess whether EndoA1 -/- also affects presynapses. Given the extensive roles of Endophilin A1 in presynapses, especially in excitatory presynapses, this should be investigated.

      Thanks for the thoughtful comments. Given that both VGAT and PSD95 signals are reduced in hippocampal regions in P21 slices, it is conceivable that the proposed postsynaptic organization/stabilization functions of endophilin A1 extend to the inhibitory presynapse via Neuroligin-2-Neurexin, and to the excitatory presynapse as well, during development. Of note, endophilin A1 knockout did not impair the distribution of Neuroligin-2 in inhibitory postsynapses (immunoisolated with anti-GABA<sub>A</sub>R α1) in mature mice (Figure 3K), and endophilin A1 did not bind to Neuroligin-2 (Figure 4D), suggesting that endophilin A1 might function via other mechanisms. Nevertheless, as the functions of endophilin A family members at the presynaptic site are well established, the reduction of presynaptic signals in developing hippocampal regions of EndoA1<sup>-/-</sup> mice might result from the depletion of presynaptic endophilin A1. The presynaptic deficits may be compensated by other mechanisms as neurons mature. We will certainly do VGLUT staining of EndoA1<sup>-/-</sup> brain slices as suggested to assess the role of endophilin A1 in excitatory presynapses in vivo.

      (2) Figure 7C:

      The authors do not assess whether p140Cap overexpression rescues the GABAAR loss exhibited in Endophilin A1 KO, as they did for gephyrin. This would be an important data point to show, as p140Cap may somehow rescue receptor loss by another pathway. In fact, it is mentioned in the text that this experiment was done, "Consistently, neither p140Cap nor the endophilin A1 loss-of-function mutants could rescue the GABAAR clustering phenotype in EEN1 KO neurons (Figure 7C, D)" yet the data for p140Cap overexpression seem to be missing. This should be remedied.

      Thanks a lot for the thoughtful comment. We will determine whether p140Cap overexpression also rescues the GABA<sub>A</sub>R clustering phenotype in EndoA1<sup>-/-</sup> neurons by surface GABA<sub>A</sub>R γ2 staining in our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      Chen et al. identify endophilin A1 as a novel component of the inhibitory postsynaptic scaffold. Their data show impaired evoked inhibitory synaptic transmission in CA1 neurons of mice lacking endophilin A1, and an increased susceptibility to seizures. Endophilin can interact with the postsynaptic scaffold protein gephyrin and promote assembly of the inhibitory postsynaptic element. Endophilin A1 is known to play a role in presynaptic terminals and in dendritic spines, but a role for endophilin A1 at inhibitory postsynaptic densities has not yet been described.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture, and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and the data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires a more robust analysis to be convincing.

      We greatly appreciate the positive comment on our study and the very valuable feedback, which will help us improve the manuscript. We will conduct additional experiments to improve our data quality and strengthen our evidence according to these constructive suggestions. To obtain strong evidence for the interaction between endophilin A1 and gephyrin, we will perform an in vitro pull-down assay with recombinant proteins from a bacterial expression system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) For all of the electrophysiology experiments, only the number of neurons recorded is stated, but not the number of independent animals that these neurons were obtained from. The number of independent animals used should be stated for each panel. At least 3 independent animals should be used in each group, otherwise, more data needs to be added.

      We apologize for missing the information in the original manuscript. For all electrophysiological experiments, data were obtained from more than 3 experimental animals. The figure legends were updated to include the number of independent animals used for each panel.

      (2) For the cell culture experiments analyzing dendritic puncta at GABAergic synapses, the number of data points analysed appears to be the number of dendritic segments quantified, regardless of whether they originate from the same neuron or not. This analysis method is not valid, since dendritic segments from the same neuron cannot be counted as statistically independent samples. The authors need to average the values for all dendritic segments from one neuron, such that one neuron equals one data point. This alteration should be made for Figures 2B, 2D, 4H, 4J, 5B, 5C, 5E, 5J, 5L, 6B, 6D, 6F, 6H, 6J, 6K,7B, and 7D. In addition, the number of independent cultures from which the neurons were obtained should be stated for each panel. At least 3 independent cultures should be used in each group, otherwise, more data need to be added.

      Thanks for the criticism. We reanalyzed the data throughout the manuscript as suggested and updated the figure legends accordingly. Moreover, we increased the number of neurons from independent experiments to further confirm the results in our revised manuscript.

      In the revised manuscript, we averaged the values for all dendritic segments from a single neuron and updated the data in Figure 3B, 3D, 4H, 4J, 5B, 5C, 5E, 5K, 5M, 6B, 6D, 6F, 6H, 6J, 6K,7B, and 7D.
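      The per-neuron averaging described above can be sketched as follows. This is only an illustration of the aggregation step, not the authors' analysis code: the function name `average_per_neuron`, the neuron identifiers, and the puncta values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_per_neuron(segments):
    """Collapse per-segment measurements into one value per neuron.

    `segments` is an iterable of (neuron_id, value) pairs, one pair per
    dendritic segment. Returns a dict mapping neuron_id to the mean value.
    """
    by_neuron = defaultdict(list)
    for neuron_id, value in segments:
        by_neuron[neuron_id].append(value)
    return {neuron_id: mean(values) for neuron_id, values in by_neuron.items()}


# Hypothetical puncta densities: six dendritic segments from three neurons.
segments = [
    ("n1", 4.0), ("n1", 6.0),
    ("n2", 3.0), ("n2", 5.0), ("n2", 7.0),
    ("n3", 8.0),
]
per_neuron = average_per_neuron(segments)
print(per_neuron)  # one data point per neuron, so n = 3 rather than n = 6
```

      With this aggregation the sample size entering the statistical test is the number of neurons, which is the unit the reviewer asked to be treated as independent.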

      Neurons analyzed in each group were derived from at least 3 independent cultures. Because of the very low efficiency of sparse transfection in primary cultured hippocampal neurons, multiple experimental repetitions were necessary to obtain a sufficient number of neurons for analysis. We described the statistical analysis in the “Material and Methods” section of the original manuscript as follows:

      “For all biochemical, cell biological and electrophysiological recordings, at least three independent experiments were performed (independent cultures, transfections or different mice).”

      (3) Individual data points should be shown on all graphs, particularly in Figures 2C, 2F, 2I, 3F, 3K, and 3L.

      Thank you for the suggestion. We replaced the original graphs with scatterplots and mean ± S.E.M. in new Figures.

      (4) For each experiment, the authors should state explicitly in the methods section whether that experiment was conducted blind to genotype.

      Thank you for the suggestion. We have modified the description of blind analysis for each experiment in methods section to “Seizure susceptibility was measured blindly by rating seizures on a scale of 0 to 7 as follows…”, “Quantification of immunostaining were carried out blindly…” in our revised manuscript.

      (5) For each experiment, the authors should state whether they used male or female mice, and what age the mice were at the time of the experiment

      Thanks a lot for the suggestion. We usually use both male and female mice for neuronal culture and behavioral tests. We observed no sex-related differences in PTZ-induced behaviors, so the results were pooled.

      Regarding age, P0 pups were used for hippocampal neuron cultures and for virus injection in electrophysiological recording or FingR probe assays. P14-21 mice were used for electrophysiological recording, immunofluorescent staining and FingR probe detection in brain slices, while adult mice (P60) were used for behavioral tests, immunofluorescent staining in brain slices and biochemical assays. We have modified the description of the sex and age of the mice in the methods section to “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates were intraperitoneally administered… ”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “Hippocampi of female or male pups (P0) were rapidly dissected under sterile conditions…”, “PSD fractions from adult mouse brain were prepared as previously described…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…” in our revised manuscript.

      (6) For each experiment involving WT and KO mice, please state whether WTs and KOs were bred as littermates from heterozygous breeders

      Sorry for the confusion. In our study, EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders. We added the information in methods section as follows in our revised manuscript, “EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders…”, “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates…”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “For co-IP from brain lysates, the whole brain from 8-10-week-old WT and KO littermates were dissected…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…”.

      (7) For experiments comparing three or more groups, the authors claim in the methods section to have used a one-way ANOVA for statistical analysis. However, no ANOVA values are given, only the post-hoc tests. Please add the ANOVA values for each experiment before stating the values of the post-hoc analysis.

      Sorry for the missing information. We used one-way ANOVA for comparing three or more groups in the original manuscript and have changed to two-way ANOVA for behavior data analysis in our revised manuscript as suggested in Recommendations (18). We added the ANOVA values (F & p values) for each experiment in new figures. For example, see Figure 1C.

      (8) In Figure 1A-C, seizure susceptibility was compared in EEN+/+ and EEN-/- mice, but the methods section states that seizure susceptibility was evaluated in 8-10-week-old male C57BL/6N mice (line 513). Was this meant to indicate that the EEN+/+ and EEN-/- mice were on a C57BL/6N background? How does this match with the statement that EEN1 -/- mice were generated on a C57BL/6J background (line 467)?

      We apologize for the mistake. In our study, EEN1<sup>-/-</sup> mice were generated on a C57BL/6J background, as stated in our previously published papers (Yang et al., 2021; Yang et al., 2018) and in “Animals” in Material and Methods of our original manuscript. We have corrected the statement to “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates…” in Material and Methods of the revised manuscript.

      (9) In the electrophysiology experiments in Figure 1E-O, it is not clear to me which neurons were recorded in the control group. The methods section states that "Whole-cell recordings were performed on an AAV-infected neuron and a neighboring uninfected neuron" (line 736). However, the figure legends states that recordings were obtained from "10 control (Ctrl, mCherry alone) and 10 EEN1 KO (mCherry and Cre) pyramidal neurons" (line 1079), which would indicate that the controls are not uninfected neurons from the same animal, but AAV-mCherry infected neurons from a different animal. Please clarify which of the two descriptions is accurate.

      Thanks for catching the error! In all electrophysiological experiments, a neighboring uninfected neuron was used as the control in Figure 1E-O. This was incorrectly stated in the figure legend of the original manuscript. In the revised manuscript, the information has been corrected in figure legends of new Figure 1 (E-F).

      (10) The authors show that in Endophilin A1 KO animals, eIPSCs are reduced, but mIPSC frequency and amplitude are unaltered. How do they explain this finding in the context of the fact that gephyrin and GABAAR1.

      We apologize for the confusion about the electrophysiological recording data. Compared with eIPSCs, which are recorded in response to electrically evoked action potentials that elicit substantial neurotransmitter release, mIPSCs are small, spontaneous currents recorded in the presence of TTX during patch-clamp experiments, resulting from the release of neurotransmitter from presynaptic terminals in the absence of action potentials. The amplitude of mIPSCs typically reflects the quantal release of neurotransmitters, while their frequency can vary depending on synaptic activity and the state of the neuron.

      A number of molecules fine-tune presynaptic neurotransmitter release and the functions of inhibitory postsynaptic receptors. In our study, inhibitory postsynapses were partially affected in endophilin A1 knockout neurons, while presynaptic endophilin A1 remained intact during electrophysiological recordings. Conceivably, the observed deficits in endophilin A1 knockout mice were mild. Following endophilin A1 depletion, inhibitory postsynaptic receptors appeared sufficient to respond to spontaneous neurotransmitter release but may be inadequate to respond to the large amounts of neurotransmitter released by evoked action potentials. Meanwhile, spontaneous synaptic activity and the state of the neuron were not obviously affected under basal conditions by endophilin A1 depletion during postnatal stages. Consequently, mIPSC frequency and amplitude remained unaltered, whereas eIPSCs were reduced compared with control neurons. This finding is consistent with the behavioral experiments, in which endophilin A1 knockout mice exhibited aggressive epileptic behaviors when induced by PTZ rather than spontaneous epilepsy.

      (11) Distribution of gephyrin, VGAT, and GABAARg2 differs substantially between the different layers of hippocampal area CA1, and the same goes for the other regions of the hippocampus. However, in Figure 2, it is not clear to me from the sample images which layers of each subregion the authors quantified, or indeed whether they paid attention to which layers they included in their analysis. This can lead to a substantial skewing of the data if different layers were preferentially included in the two genotypes. Please clarify which layers were analysed, and how comparability between WTs and KOs was ensured. This is particularly important given the authors' claim that Endophilin A1 acts equally at all subtypes of GABAergic synapses (lines 373- 376).

      Thanks for the cautiousness! We distinguished each hippocampal subregion based on the anatomical structure in brain slices. Quantification of fluorescent mean intensity of each synaptic protein in all layers of each subregion, as shown in new Figure 2 and Figure S2A-F, revealed that GABAergic synaptic proteins were impaired in both P21 and P60 KO mice.

      We further analyzed the fluorescent signal of core postsynaptic component, gephyrin, in individual layers of each subregion in the hippocampus of mature WT and KO mice, as presented in new Figures S2G-H. Our findings demonstrated a decrease in gephyrin levels across all layers of each subregion in KO mice. Additionally, we examined gephyrin clustering across the soma, axon initial segment (AIS), and dendrites in cultured mature endophilin A1 knockout hippocampal neurons, as shown in new Figure S5E-H. The results showed that gephyrin was affected in all subcellular regions following endophilin A1 knockout.

      Collectively, these data suggest that endophilin A1 functions across all subtypes of GABAergic postsynapses.

      (12) In Figure 3E-F, the authors state that there was no change in the total level of synaptic neurons in EEN1 KO neurons (line 188). However, there is no quantification of the total level of synaptic neurons shown, and based on the immunoblot in Figure 3E, it looks like there is a substantial reduction in NR1, NL2, and g2. The authors should present a quantification of the total levels of these proteins and adjust their statement accordingly if necessary.

      Thanks a lot for your comments. We quantified the total protein levels in Figure 3E and added the result to new Figure 3F, showing that total protein levels were not obviously affected in cultured KO neurons. When normalized to total protein levels, the surface levels of GABA<sub>A</sub> receptors were significantly compromised compared to surface GluN1 and NL2. Furthermore, the total protein levels were not affected in brains of KO mice, as shown in Figures 3K (input) and 3L (S1). Collectively, there was no change in the total level of synaptic proteins in KO neurons.

      (13) In Figure 3G-I, the authors claim, based on super-resolution images as presented here, that Endophilin A1 colocalizes with gephyrin and g2. However, no quantification of this colocalization is presented. The authors should add this quantification to support their claim and indicate how many GABAergic synapses contain Endophilin A1.

      Thank you for the thoughtful comments. The resolution of the images is significantly improved by super-resolution microscopy. As a result, the overlap between the two proteins will become smaller or even disappear. Since no two proteins can occupy the same physical space, they would show lower colocalization and instead exhibit proximal localization. As expected, in Figures 3G and 3H, we observed only small overlap or proximal localization of endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2. To further confirm the localization of endophilin A1 in inhibitory synapses, we co-stained endophilin A1 with both pre- and post-synaptic proteins, gephyrin and Bassoon. Then we quantified the colocalization of endophilin A1 with gephyrin or with Bassoon using the method for super-resolution images described in the reference (McCall, A. D. Colocalization by cross-correlation, a new method of colocalization suited for super-resolution microscopy. BMC Bioinformatics 25, 55 (2024)). The percentage of gephyrin or Bassoon puncta that were in close proximity with endophilin A1 was also calculated, as shown in new video 5 and new Figure S4B-G. These data have been added in the revised manuscript as follows, “We further detected the localization of endophilin A1 to inhibitory synapses by co-immunostaining with both pre- and post-synaptic markers (Figure. S4B and Video 5). Quantitative analysis of super-resolution localization maps revealed that ~ 47 % puncta of gephyrin or Bassoon were proximal to endophilin A1 (Figure. S4G, n = 14), with a mean distance between endophilin A1- and gephyrin-positive pixels of ∼ 120 nm, or between endophilin A1- and Bassoon-positive pixels of ∼ 130 nm (Figure. S4C-F).”

      (14) In the quantification shown in Figure 3K-L, there are no error bars in the WT data sets. This presumably means that all values were normalized to WT. However, since this artificially eliminates the variance in the WT group, a t-test is no longer valid, since this assumes a normal distribution and normal variance, which are no longer given. The authors should either change the way they normalize their data to maintain the variance in the WT group or perform a different statistical test that can account for the artificial lack of variance in one of the groups.

      Thank you for the suggestions! We modified our analysis approach. Specifically, we used mean value of WTs to normalize data to preserve the variance in the WT group and performed unpaired t-tests to assess statistical significance in Figure 3K-L. Additionally, we replaced the bar graphs with modified graphs showing individual data points. Please see Response to Recommendation (12).
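      To make the normalization point concrete, here is a minimal sketch of dividing every replicate by the WT group mean (so the WT group keeps its variance) and then computing an unpaired t statistic. It assumes the classic pooled-variance Student's t-test, which the response does not specify; the function names and band intensities are hypothetical, and the p-value lookup is left out.

```python
from math import sqrt
from statistics import mean, variance


def normalize_to_wt_mean(wt, ko):
    """Divide every value by the WT group mean.

    The WT group then has mean 1.0 but retains its replicate-to-replicate
    variance, unlike normalizing each WT value to itself.
    """
    wt_mean = mean(wt)
    return [x / wt_mean for x in wt], [x / wt_mean for x in ko]


def student_t(a, b):
    """Unpaired, pooled-variance Student's t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical band intensities from three independent experiments per genotype.
wt_raw = [90.0, 100.0, 110.0]
ko_raw = [60.0, 70.0, 80.0]

wt_norm, ko_norm = normalize_to_wt_mean(wt_raw, ko_raw)
t_stat, dof = student_t(wt_norm, ko_norm)
print(wt_norm, ko_norm, round(t_stat, 3), dof)
```

      Had each WT replicate instead been set to exactly 1, `variance(wt_norm)` would be zero and the test's assumptions would no longer hold, which is the concern raised in the recommendation.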

      (15) What is the difference between the coIP experiment in Figure 4E and 3J, right panel? In both cases, an Endophilin A1 IP is performed, and gephyrin, GABAARg2, and GABAARa1 are assessed. However, Figure 3J's right panel indicates that Endophilin A1 does interact with the GABAAR subunits, whereas Figure 4E shows that it does not. How do the authors explain this discrepancy? Were these experiments performed more than once?

      Sorry for the confusion. Figure 3J and Figure 4E show data from immunoisolation assay and conventional co-immunoprecipitation (co-IP), respectively. Immunoisolation allows for the rapid and efficient separation of subcellular membrane compartments using antibodies conjugated to magnetic beads. In Figure 3J, we used antibodies against GABA<sub>A</sub>R α1 subunit or endophilin A1 to isolate the inhibitory postsynaptic membranes or endophilin A1-associated membranous compartments. In contrast, co-immunoprecipitation detects direct protein-protein interactions in detergent-solubilized lysates. For Figure 4E, we applied antibodies against endophilin A1 to precipitate its interaction partners. The results in Figure 3J and Figure 4E demonstrate that endophilin A1 is localized in the inhibitory postsynaptic compartment and directly interacts with gephyrin, but not with GABA<sub>A</sub>Rs. Detailed information regarding the methods used for co-IP and immunoisolation can be found in “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Material and Methods” section of original manuscript.

      These experiments were repeated multiple times to ensure reliability. In fact, consistent data showing endophilin A1 localization in the inhibitory postsynaptic compartment were observed in Figure 3K, showing the quantified data as well.

      (16) For the colocalization analysis in Figure 5A-C, what percentage of gephyrin puncta contain g2 in the WT and Endophilin A1 KO? Currently, only a correlation coefficient is provided, but not the degree of overlap. Please add this information to the figure.

      Thanks for the comments on the colocalization analysis. We analyzed the percentage of gephyrin puncta overlapping with GABA<sub>A</sub>R γ2 and added the graphs in new Figure 5C.

      (17) Figure 6 investigates how actin depolymerization affects GABAergic synapse function, but does not assess how Endophilin A1 contributes to this process. The authors then provide an extremely short statement in the discussion, stating that their data are contradictory to a previous study (lines 412 - 417). This section of the discussion should be expanded to address the specific role of Endophilin A1 in the consequences of actin depolymerization.

      Thanks a lot for the advice. In the original manuscript, we discussed the specific role of endophilin A1 at inhibitory postsynapses as follows in Discussion:

      “As membrane-binding and actin polymerization-promoting activities of endophilin A1 are both required for its function in enhancing iPSD formation and g2–containing GABA<sub>A</sub>R clustering to iPSD, we propose that membrane-bound endophilin A1 promotes postsynaptic assembly by coordinating the plasma membrane tethering of the postsynaptic protein complex and its stabilization with the actin cytomatrix”

      Following your advice, we added a statement in the revised manuscript addressing the role of endophilin A1 in actin polymerization at inhibitory postsynapses, as follows: “In the present study, the impaired clustering of gephyrin and GABA<sub>A</sub> γ2 by F-actin depolymerization underscores the essential role of F-actin in the assembly and stabilization of the inhibitory postsynaptic machinery. Membrane-bound endophilin A1 promotes F-actin polymerization beneath the plasma membrane through its interaction with p140Cap, an F-actin regulatory protein, thereby facilitating and/or stabilizing the clustering of gephyrin and γ2-containing GABA<sub>A</sub> receptors at postsynapses.”

      (18) Which statistical analysis was conducted in Figure 7F? Given the nature of the data, a repeated measures ANOVA would be necessary to accurately assess the statistical accuracy.

      Sorry for the confusion. In the original Figure 7F, we conducted one-way ANOVA followed by Tukey's post hoc test at each time point. As suggested, we have now applied repeated measures ANOVA followed by Tukey's post hoc test in new Figure 7F. We also reanalyzed the data in new Figure 1C with the same method, and modified the description in “Statistical analysis” and the figure legends for new Figures 1C and 7F in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Data presentation:

      (1) Figures 2A, B, D, E, G, H. Figures S2A, B, D:

      Add P21 or P60 labels to these figures so that the difference between similarly stained samples (e.g. Figures 2A, B) is obvious to the reader.

      Thanks! We added “P21” or “P60” labels in new Figure 2 and Figure S2 as suggested.

      (2) Figures 4C, D:

      The authors must make their coIP data annotation consistent. In Figure 4C, they use actual microgram amounts when, e.g., describing how much input was present, yet in Figure 4D they use + and -. The authors should pick one.

      Thanks for the comments. We made the data annotation consistent in new Figures 4C and 4D, and also changed the labels in Figure 4F to match.

      (3) Figure 5A

      GFP is gray in this figure, but in all other figures, it is blue. Consider changing for presentation reasons.

      Thanks a lot for pointing out the problem. We replaced gray with blue color to indicate GFP in new Figure 5A.

      (4) Figures 6A, C, E, G

      Label graphs as either short-term or long-term drug treatment.

      Thanks for the suggestion. We labeled the graphs as 60 min for short-term or 120 min for long-term drug treatment in new Figure 6A, C, E, G for convenient reading.

      Annotation, grammar, spelling, typing errors:

      (1) Figure 4G:

      Merge and GFP labels are seemingly swapped.

      Thanks a lot for your sharp eye. We corrected the labels in new Figure 4G.

      (2) Fig 4I:

      The authors use "Gephryin" instead of GPN. They should be consistent and choose one.

      Sorry for the mistake. We changed the label in new Figure 4I to be consistent with the other figures and rearranged the images in the figures for better presentation.

      (3) "One-hour or two-hour treatment of mature neurons with nocodazole..."

      Thanks for your advice. We modified the sentence to “Treatment of mature neurons with nocodazole, a microtubule depolymerizing reagent, for one hour (short-term) or two hours (long-term), caused…”.

      (4) The authors should indicate that one-hour is their short-term treatment and that two-hour is their long-term treatment so that when these terms are used later to describe LatA experiments, it is clearer to the reader.

      Thanks for your comments. We modified the statement as described in Response to Recommendation (3), so that it is clearer to the reader.

      (5) EEN1. The authors should use the more conventional term EndoA1 so that the manuscript can be searched easily.

      Thanks a lot for the suggestion. We replaced all instances of the term “EEN1” with “EndoA1” in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      Major Points

      (1) The number of observations for the electrophysiology experiments in Figure 1 (dots are neurons) is very low and it is not clear whether the data shown is derived from different mice. The same criticism applies to the data shown in Figures 7G-K.

      We apologize for the low number of neurons in the electrophysiology experiments. In the patch-clamp experiments, more neurons were recorded than are shown in the figures. However, neurons with a membrane resistance (Rm) below 500 MΩ, indicating unstable seals or poor conditions, were excluded from the analysis. Additionally, we added the number of mice from which the data in each group were derived to the figure legends for Figures 1, 7 and S1. This point was also raised by Reviewer #1 (please see Response to Recommendation (1)).

      (2) Images in Figure 2 are shown at low magnification, statements on changes in intensity of inhibitory synaptic markers in the hippocampal region are impossible to interpret. Analysis of inhibitory synapses in vivo would require sparse neuronal labeling and 3D reconstruction, for instance using gephyrin-FingRs (Gross et al., Neuron 2013).

      Thanks for your insightful suggestion. We obtained pCAG_PSD95.FingR-eGFP-CCR5TC and pCAG_GPN.FingR-eGFP-CCR5TC constructs from Addgene (plasmids #46295 and #46296). We attempted in utero electroporation (IUE) to introduce the DNAs into cortical or hippocampal neurons at E14.5, unfortunately without success. After numerous attempts, we eventually obtained newborn pups from ICR mice after IUE. However, we failed to obtain any newborn pups from C57BL/6J mice because of abortion following the procedure. Furthermore, pregnant C57BL/6J mice (WTs or KOs) either did not survive or remained in poor health after surgery. Therefore, we were unable to analyze synapses through sparse labeling and 3D reconstruction by IUE. As an alternative, we obtained commercial AAVs carrying rAAV-EF1a-PSD95.FingR-eGFP-CCR5TC and rAAV-EF1a-mRuby2-Gephyrin.FingR-IL2RGTC and injected them into the CA1 region of EndoA1<sup>fl/fl</sup> mice at P0. Mice were fixed at P21, and fluorescent signals were examined in the CA1 region. Consistent with immunostaining with antibodies, decreased mRuby2-Gephyrin.FingR or PSD95.FingR-eGFP was observed in dendrites of KO neurons at P21, as shown in new Figure S3. In combination with electrophysiological recording, PSD fractionation and immunoisolation from brains, these data support our conclusion regarding the effects of endophilin A1 knockout on the inhibitory synapses.

      Additionally, we transfected DIV12 cultured hippocampal neurons with pCAG_PSD95.FingR-eGFP-CCR5TC or pCAG_GPN.FingR-eGFP-CCR5TC and observed fluorescent signals on DIV16. Both the signal intensity and number of GPN.FingR-eGFP clusters were also significantly attenuated, with no obvious changes in PSD95.FingR-eGFP clusters in dendrites of mature neurons, as shown in new Figure S5A-D. We are very pleased that the result further strengthened our original conclusion. We have added the new pieces of data in our revised manuscript.

      (3) Figure 3: surface labeling of GluA1 or the GABAAR gamma 2 subunit is difficult to interpret: the patterns are noisy and the numerous puncta appear largely non-synaptic although this is difficult to judge in the absence of additional synaptic markers. It appears statistics are done on dendritic segments rather than the number of neurons. The legend does not mention how many independent cultures this data is derived from. In their previous study (Yang et al., Front Mol Neurosci 2018), the authors noted a decrease in surface GluA1 levels in the absence of endophilin A1. How do they explain the absence of an effect on surface GluA1 levels in the current study?

      Sorry for the concern and thanks for your comments. First, we assessed changes in the surface levels of excitatory and inhibitory receptors by co-immunostaining in cultured WT and KO hippocampal neurons. Given the very low transfection efficiency of neurons in high-density culture, numerous receptor puncta from adjacent non-transfected neurons were also detected, which may contribute to the noisy pattern observed in Figure 3A. In addition, the z-stack projections of dendrites at higher magnification may have introduced higher background signals. We have now replaced the original images with images from the most recent repeat in new Figure 3A. Moreover, we confirmed a decrease in the surface expression of GABA<sub>A</sub>R γ2 by the biotinylation assay, as shown in Figure 3E. We agree that some puncta in the surface labeling of receptors appear to be non-synaptic. To assess the decrease in proteins at synapses, we isolated the PSD fraction biochemically and found that gephyrin and GABA<sub>A</sub>R γ2, two major inhibitory postsynaptic components, were reduced in the PSD fraction from KO brains, as shown in Figure 3L. Their colocalization was also attenuated in the absence of endophilin A1, as shown in Figure 5A-C. Combined with electrophysiological recording, these data from multiple assays indicate that GluA1 at synapses was not obviously affected, whereas GABA<sub>A</sub>R γ2 at synapses was impaired, in endophilin A1 KO neurons in the present study.

      We have corrected the way the number of samples is defined for statistical analysis, as suggested. This point was also raised by Reviewer #1 (Recommendation (2)). We averaged the values from all dendritic segments of a single neuron, such that one neuron equals one data point. We have replaced the original Figures 3B and 3D (please see Response to Recommendation (2) by Reviewer #1). Additionally, we added the number of independent cultures from which these data were derived to the figure legends in the revised manuscript.

      Previously, we observed a small decrease in surface GluA1 levels in spines under basal conditions and a more pronounced suppression of surface GluA1 accumulation in spines upon chemical LTP in endophilin A1 KO neurons from EndoA1<sup>-/-</sup> mice, in which endophilin A1 is knocked out from embryonic developmental stages (Figure 5C, H; Yang et al., Front Mol Neurosci, 2018). In Figure 3A and B of the current study, we analyzed surface receptor levels in GFP-positive dendrites, rather than spines, under basal conditions when endophilin A1 was depleted at a later developmental stage. We found a decrease in surface GABA<sub>A</sub>R γ2 levels but no significant effect on surface GluA1 levels in dendrites. These findings indicate that endophilin A1 primarily affects excitatory synaptic proteins in spines during synaptic plasticity and inhibitory synaptic proteins in dendrites under basal conditions in mature neurons.
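
      The change of statistical unit described in this response (all dendritic segments of a neuron averaged so that one neuron equals one data point) can be sketched as follows. The neuron identifiers and values are hypothetical and serve only to illustrate the aggregation step.

```python
from collections import defaultdict
from statistics import mean


def per_neuron_means(segments):
    """Collapse (neuron_id, value) segment measurements to one mean per neuron."""
    by_neuron = defaultdict(list)
    for neuron_id, value in segments:
        by_neuron[neuron_id].append(value)
    return {neuron_id: mean(values) for neuron_id, values in by_neuron.items()}


# Hypothetical surface-intensity measurements: five segments from two neurons.
segments = [("n1", 1.0), ("n1", 3.0), ("n2", 2.0), ("n2", 4.0), ("n2", 6.0)]

neuron_values = per_neuron_means(segments)
n_data_points = len(neuron_values)  # statistics now run on 2 neurons, not 5 segments
```

      Aggregating before testing avoids treating segments from the same neuron as independent observations, which would inflate the sample size.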

      (4) Super-resolution images in Figure 3G, H, I: endophilin A1 puncta look different in panel 3I compared to 3G and 3H, which are very noisy. It is difficult to interpret how specific these EEN1 puncta are. Previous images showing EEN1 distribution in dendrites look different (Yang et al., Front Mol Neurosci 2018); is the same KO-verified antibody being used here? Colocalization of EEN1 with gephyrin or the GABAAR gamma 2 subunit is difficult to interpret; gephyrin mostly does not seem to colocalize with EEN1 in the example shown.

      Sorry for your concerns. As stated previously in Major Points (3), transfection efficiency was very low in cultured neurons and our neurons were cultured at relatively high density. As a result, numerous protein puncta located in adjacent non-transfected neurons were also detected, which may contribute to the noisy signals observed in Figure 3G-I.

      In our previous paper, we confirmed the specificity of the antibody against endophilin A1 (Figure 5A, B; Yang et al., Front Mol Neurosci, 2018). We used the same antibody (rabbit anti-endophilin A1, Synaptic Systems GmbH, Germany) in the current study. While the previous images were obtained using confocal microscopy, the current images in Figures 3G, H, and I were acquired using super-resolution microscopy (SIM). The different patterns observed in the dendrites may be attributed to differences in image resolution, antibody dilution and reaction time.

      Reviewer #1 also raised the quantification of the colocalization of gephyrin and GABA<sub>A</sub>R γ2 with endophilin A1. Please see Response to Recommendation (13) by Reviewer #1.

      (5) The interaction of gephyrin and endophilin A1 is based on coIP experiments in cells and brain tissue. To convincingly demonstrate that these proteins interact, biophysical experiments with purified proteins are necessary.

      Thanks a lot for your great suggestions on the interaction of endophilin A1 with gephyrin. To convincingly demonstrate their interaction, we performed a pull-down assay with purified recombinant proteins, and the result shows that both the G and E domains of gephyrin are involved in the interaction with endophilin A1. The data have been added to the revised manuscript as new Figure 5I. We also modified the description of the data and the figure legends in the revised manuscript.

      (6) Figure 4G: the gephyrin images are not convincing; the inhibitory postsynaptic element typically looks somewhat elongated; these puncta are very noisy and do not appear to represent iPSDs. The same criticism applies to the images shown in Figures 5 and 7.

      Thanks for the comment. The gephyrin puncta in our images exhibited heterogeneous shapes and sizes, with some appearing somewhat elongated. To address this, we compared the puncta pattern of gephyrin with that shown in the reference. As illustrated in the figure from the reference, gephyrin puncta also displayed distinct shapes and sizes (Figure 3A-F, Neuron 78, 971–985, June 19, 2013). Please note that the images were z-stack projections at higher magnification, as described in the "Materials and Methods" section. This approach may introduce higher background signals and may contribute to the more heterogeneous appearance of the puncta in Figures 4, 5, and 7. As mentioned previously, the numerous gephyrin puncta located in adjacent non-transfected neurons may also contribute to some of the noisy signals observed. We have replaced the original images with new images in new Figures 4G, 5 and 7.

      Moreover, to confirm the effects of endophilin A1 KO on gephyrin clustering, we also examined endogenous clusters of gephyrin or PSD95 visualized by GPN.FingR-eGFP or PSD95.FingR-eGFP in cultured mature neurons. The results were consistent with immunostaining with antibodies against gephyrin. Please see Response to Recommendation (2).

      (7) Figure 7E, F: the rescue (Cre + WT) appears to perform better than the control (mCherry + GFP) in the PTZ condition; how do the authors explain this? Mixes of viral vectors were injected, would this approach achieve full rescue?

      Thanks for the thoughtful comment. Mixed viruses were injected bilaterally into the hippocampal CA1 regions. WT endophilin A1 fully rescued the phenotype in knockout mice during the early days and performed slightly better than the control group in the later days under the PTZ condition, as shown in Figures 7E and 7F. In the current study, overexpression of endophilin A1 increased the clustering of gephyrin and GABA<sub>A</sub>R γ2 in cultured neurons, as shown in Figures 4I-J and 5D-E. Presumably, the slightly better rescue observed in the behavioral tests is attributable to the enhanced clustering and/or stabilization of gephyrin/GABA<sub>A</sub>R γ2 by WT endophilin A1 expression in KO neurons in vivo. Moreover, the electrophysiological recordings also showed full rescue of eIPSCs by WT endophilin A1 in KO neurons (Figure 7G-K).

      Minor Points

      (1) The authors mention that they previously found a decrease in eEPSC amplitude in EEN1 KO mice (Yang et al., Front Mol Neurosci 2018). The data in Fig. 1E suggests a decrease in eEPSC amplitude but is not significant here, likely due to the small number of observations. If both eEPSC and eIPSC amplitudes are reduced in the absence of EEN1, would the E/I ratio still be significantly changed?

      We apologize for the confusion. In our previous study, AMPAR-mediated evoked excitatory postsynaptic currents (eEPSCs) were slightly but significantly reduced compared to the control group, while NMDAR-mediated excitatory postsynaptic currents showed no significant difference (Figure 4N, O; Yang et al., Front Mol Neurosci, 2018). In the current study, we adopted a different recording protocol, simultaneously measuring eEPSCs and eIPSCs from the same neuron to calculate the E/I ratio. Unlike in the previous study, we did not use inhibitors to suppress GABA receptor activity. As a result, the recorded signals did not distinguish between AMPAR-mediated and NMDAR-mediated currents and instead reflect total eEPSCs, which may explain the non-significant reduction relative to control neurons in this study.

      It is possible that the eEPSC amplitude would show a significant reduction if a larger number of neurons were recorded. Nevertheless, the larger suppression of eIPSCs in the absence of endophilin A1 indicates that the E/I ratio is significantly altered.

      (2) Page 7: the authors mention they aim to exclude effects on presynaptic terminals of deleting endophilin A1 in cultured neurons, is this because of a sparse transfection approach?

      Please clarify.

      Sorry for the confusion. In cultured neurons, transfection was always sparse because of the very low transfection efficiency (~0.5%). Therefore, we could examine the effects of endophilin A1 knockout specifically in postsynaptic neurons expressing Cre under the CamKIIa promoter, while endophilin A1 remained intact in the non-transfected presynaptic neurons.

      (3) The representative blot of the surface biotinylation experiment (Figure 3E) suggests that loss of endophilin A1 also affects GluN1 and Nlgn2 levels, and error bars in panel 3F (lacking individual data points) suggest these experiments were highly variable.

      Sorry for the confusion. Reviewer #1 also raised this question, and we quantified the total levels of GluN1 and NL2 in Figure 3E. We also replaced the original graphs with scatterplots showing means ± S.E.M. Please see the Response to Recommendation (3) & (12) by Reviewer #1.

      (4) Have other studies analyzing inhibitory synapse composition identified endophilin A1 as a component? The rationale for this study seems to be primarily based on the presence of epileptic seizures and E/I imbalance.

      Thank you for your questions. To date, no other studies have investigated endophilin A1 as an inhibitory postsynaptic component. We observed the proximal localization of endophilin A1 with inhibitory postsynaptic proteins using super-resolution microscopy (SIM), and quantification showed that ~47% of gephyrin puncta correlated with endophilin A1 (Figure 3G-I and S4B-G). We further immunoisolated the inhibitory postsynaptic fraction using GABA<sub>A</sub> receptors and found that endophilin A1 was present in the isolated fraction, and vice versa (Figure 3J). Additionally, we demonstrated that endophilin A1 directly interacted with gephyrin through co-IP and pull-down assays (Figure 5J-I). Together with data from immunolabeling, biochemical assays, electrophysiological recordings, and behavioral tests, these results identify endophilin A1 as an inhibitory postsynaptic component.

      (5) Figure 3J: what are S100 and P100 labels? Is Nlgn2 part of the EEN1 complex? If it is, why are Nlgn2 surface levels not affected by EEN1 loss (Figure 3E, F, K)? Why does EEN1 not interact with Nlgn2 in HEK cells (Figure 4D)?

      Sorry for the confusion. The detailed information regarding S100 and P100 can be found in the “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Materials and Methods” section. S100 contains soluble proteins, while P100 refers to the membrane fraction after high speed (100,000xg) centrifugation.

      Figures 3J-K and 4C-F showed the data from immunoisolation and conventional co-immunoprecipitation assays, respectively. Immunoisolation, which uses antibodies coupled to magnetic beads, allows for the rapid and efficient separation of subcellular membrane compartments. In Figure 3J-K, we used antibodies against GABA<sub>A</sub>R α1 to isolate membrane protein complexes from the inhibitory postsynaptic fraction. In contrast, co-immunoprecipitation typically detects direct interactions between proteins solubilized by detergent treatment. For Figure 4C-F, FLAG beads were used in HEK293 lysates, or antibodies against endophilin A1 were employed in brain lysates to precipitate direct interaction partners. Combined with the results from Figure 3J-L, the data in 4C-F indicated that endophilin A1 was localized in the inhibitory postsynaptic compartment and directly bound to gephyrin but not to either GABA<sub>A</sub> receptors or Nlgn2 (NL2). This binding promoted the clustering of gephyrin and GABA<sub>A</sub>R γ2 at synapses, facilitating GABA<sub>A</sub>R assembly.

      Nlgn2 (NL2) is a key inhibitory postsynaptic component but does not directly bind to endophilin A1. Consequently, endophilin A1 failed to co-immunoprecipitate with NL2 in the presence of detergent in HEK293 cell lysates (Figure 4D). Furthermore, the surface levels of NL2 or its distribution in PSD fraction were unaffected by the loss of endophilin A1 (Figure 3E, F, K, L). This suggests that mechanisms independent of endophilin A1 orchestrate the surface expression and synaptic distribution of NL2.

      (6) How do the authors interpret the finding that endophilin A1, but not A2 or A3, binds gephyrin? What could explain these differences?

      Thanks for the thoughtful comment. Endophilin As contain BAR and SH3 domains. While the amino acid sequences of the BAR and SH3 domains are highly conserved, the intrinsically disordered loop region between the BAR and SH3 domains is highly variable. A study by the Verstreken lab revealed that a human mutation in the unstructured loop region of endophilin A1 increases the risk of Parkinson's disease. They also demonstrated that the disordered loop region controls protein flexibility, which fine-tunes protein-protein and protein-membrane interactions critical for endophilin A1 function (Bademosi et al., Neuron 111, 1402–1422, May 3, 2023). Our previous study showed that endophilin A1 and A3, but not A2, bind to p140Cap through their SH3 domains, despite the high sequence homology in the SH3 domains among these proteins (Figure 2A, B; Yang et al., Cell Research, 2015). These findings indicate that each endophilin A likely interacts with specific partners due to distinct key amino acids.

      Additionally, endophilin A1 is expressed at much higher levels than A2 and A3 in neurons, and the three proteins are distributed differently across brain regions. Our lab demonstrated that the function of A1 at postsynapses (both excitatory and inhibitory synapses) cannot be compensated by A2 or A3. Therefore, it is reasonable that endophilin A1, rather than A2 or A3, binds to gephyrin, even though the underlying mechanisms remain unclear.

      (7) Figure 4G: panels are mislabeled (GFP vs merge).

      Thanks for your careful reading and sorry for the mistake. We corrected the label in new Figure 4G. Please see Response to Annotation, grammar, spelling, typing errors: (1) by Reviewer #2.

    1. eLife Assessment

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Understanding how genetic compensation pathways are involved in gene function is an important question. However, there is incomplete evidence provided in the manuscript at this point to conclude that discrepancies between observed phenotypes are due to genetic compensation.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Podxl morphants and CRISPants display a reduced number of hepatic stellate cells (HSCs), while mutants are either phenotypically wild type or display an increased number of HSCs.

      The absence of observable phenotypes in genetic mutants could indeed be attributed to genetic compensation, as the authors postulate. However, in my opinion, the evidence provided in the manuscript at this point is insufficient to draw a firm conclusion. Furthermore, the opposite phenotype observed in the two deletion mutants is not readily explainable by genetic compensation and invokes additional mechanisms.

      Major concerns:

      (1) Considering discrepancies in phenotypes, the phenotypes observed in podxl morphants and CRISPants need to be more thoroughly validated. To generate morphants, the authors use "well characterized and validated ATG Morpholino" (lines 373-374). However, published morphants, in addition to kidney malformations, display gross developmental defects including pericardial edema, yolk sac extension abnormalities, and body curvature at 2-3 dpf (reference 7 / PMID: 24224085). Were these gross developmental defects observed in the knockdown experiments performed in this paper? If yes, is it possible that the liver phenotype observed at 5 dpf is, to some extent, secondary to these preceding abnormalities? If not, why were they not observed? Did kidney malformations reproduce? On the CRISPant side, were these gross developmental defects also observed in sgRNA#1 and sgRNA#2 CRISPants? Considering that morphants and CRISPants show very similar effects on HSC development and assuming other phenotypes are specific as well, they would be expected to occur at similar frequencies. It would be helpful if full-size images of all relevant morphant and CRISPant embryos were displayed, as is done for tyr CRISPant in Figure S2. Finally, it is very important to thoroughly quantify the efficacy of podxl sgRNA#1 and sgRNA#2 in CRISPants. The HRMA data provided in Figure S1 is not quantitative in terms of the fraction of alleles with indels. Figure S3 indicates a very broad range of efficacies, averaging out at ~62% (line 100). Assuming random distribution of indels among cells and that even in-frame indels result in complete loss of function (possible for sgRNA#1 due to targeting the signal sequence), only ~38% (0.62 × 0.62) of all cells will be mutated bi-allelically. That does not seem sufficient to reliably induce loss-of-function phenotypes. My guess is that the capillary electrophoresis method used in Figure S3 underestimates the efficiency of mutagenesis, and that much higher mutagenesis rates would be observed if mutagenesis were assessed by amplicon sequencing (ideally NGS but Sanger followed by deconvolution analysis would suffice). This would strengthen the claim that CRISPant phenotypes are specific.

      (2) In addition to confidence in morphant and CRISPant phenotypes, the authors' claim of genetic compensation rests on the observation that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effect when injected with sgRNA#1 (Figure 3L). Considering the issues raised in the paragraph above, this is insufficient. There is a very straightforward way to address both concerns, though. The described podxl(-194_Ex7Δ) and podxl(-319_ex1(p)Δ) deletions remove the binding site for the ATG morpholino. Therefore, deletion mutants should be refractory to the Morpholino (specificity assessment recommended in PMID: 29049395, see also PMID: 32958829). Furthermore, both deletion mutants should be refractory to sgRNA#1 CRISPant phenotypes, with the first being refractory to sgRNA#2 as well.
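
      The bi-allelic estimate in concern (1) follows from treating the two alleles of each cell as independent, so the fraction of cells with both alleles mutated is the square of the per-allele rate. A minimal sketch of that arithmetic, under the same independence assumption the reviewer states (the function name is hypothetical):

```python
from math import sqrt


def biallelic_fraction(allele_rate):
    """Fraction of cells with both alleles mutated, assuming each allele
    is hit independently with probability `allele_rate`."""
    if not 0.0 <= allele_rate <= 1.0:
        raise ValueError("allele_rate must be between 0 and 1")
    return allele_rate ** 2


# Average per-allele efficiency quoted from Figure S3.
frac = biallelic_fraction(0.62)

# Per-allele rate that would be needed for 90% of cells to be bi-allelic.
rate_for_90 = sqrt(0.90)
```

      Under this assumption, a per-allele efficiency of roughly 95% would be needed for 90% of cells to carry two mutated alleles, which is why a more sensitive readout of mutagenesis matters for interpreting the CRISPant phenotype.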

    3. Reviewer #2 (Public review):

      In this manuscript, Ross, Miscik, et al. describe the phenotypic discrepancies between F0 zebrafish mosaic mutant ("CRISPant") and morpholino knockdown (morphant) embryos versus a set of 5 different loss-of-function (LOF) stable mutants in one particular gene involved in hepatic stellate cell development: podxl. While transient LOF and mosaic mutants induced a decrease in hepatic stellate cell number, stable LOF zebrafish did not. The authors analyzed the molecular causes of these phenotypic differences and concluded that LOF mutants are genetically compensated through the upregulation of the expression of many genes. Additionally, they ruled out other better-known and described mechanisms such as the expression of redundant genes, protein feedback loops, or transcriptional adaptation.

      While the manuscript is clearly written and conclusions are, in general, properly supported, there are some aspects that need to be further clarified and studied.

      (1) It would be useful to apply a method that better quantifies potential loss-of-function mutations in the CRISPants. This would reveal not only the percentage of mutated alleles in those embryos but also what fraction of them carry an out-of-frame mutation likely to drive gene loss of function (since deletions of 3-6 nucleotides removing 1-2 amino acids will likely not affect protein activity, unless those amino acids are essential for it). With this, the authors could also correlate phenotype penetrance with the level of loss of function when quantifying embryo phenotypes, which would help support their conclusions.

      (2) It is unclear that 4.93 ng of morpholino per embryo is totally safe. The amount of morpholino causing undesired effects can differ depending on the morpholino used. I would suggest performing some sanity check experiments to demonstrate that morpholino KD is not triggering other molecular outcomes, such as upregulation of p53 or innate immune response.

      (3) Although the authors made a set of controls to demonstrate the specificity of the CRISPant phenotypes, I believe that a rescue experiment could be beneficial to support their conclusions. Injecting an mRNA with podxl ORF (ideally with a tag to follow protein levels up) together with the induction of CRISPants could be a robust manner to demonstrate the specificity of the approach. A rescue experiment with morphants would also be good to have, although these are a bit more complicated, to ultimately demonstrate the specificity of the approach.

      (4) In lines 314-316, the authors speculate on a correlation between decreased HSC and Podxl levels. It would be interesting to actually test this hypothesis and perform RT-qPCR upon CRISPant induction or, even better and if antibodies are available, western blot analysis.

      (5) Similarly, in lines 337-338 and 342-344, the authors discuss the possibility that genes near the podxl locus could be upregulated in the mutants. Since they already have transcriptomic data, this seems an easy analysis to do that can address their own hypothesis.

      (6) Figures 4 and 5 would be easier to follow if panels B-F included what mutants are (beyond having them in the figure legend). Moreover, would it be more accurate and appropriate if the authors group all three WT and mutant data per panel instead of showing individual fish? Representing technical replicates does not demonstrate in vivo variability, which is actually meaningful in this context. Then, statistical analysis can be done between WT and mutant per panel and per set of primers using these three independent 3-month-old zebrafish.
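
      The analysis suggested in point (6) can be sketched as follows: collapse technical replicates to one value per fish, then compare genotypes using the fish as the unit of replication. This is a minimal sketch, not the authors' pipeline. The expression values are hypothetical, and Welch's t statistic is one reasonable choice of test assumed here for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def per_fish_means(technical_replicates):
    """Collapse technical replicates to one value per fish (the biological unit)."""
    return [mean(reps) for reps in technical_replicates]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical relative expression values: 3 fish per genotype,
# 3 technical replicates per fish (illustrative numbers only).
wt = [[1.00, 1.05, 0.95], [1.20, 1.10, 1.15], [0.90, 0.85, 0.95]]
mut = [[0.50, 0.55, 0.45], [0.70, 0.65, 0.60], [0.40, 0.45, 0.35]]

wt_means = per_fish_means(wt)
mut_means = per_fish_means(mut)
t_stat, dof = welch_t(wt_means, mut_means)
print(f"n = {len(wt_means)} vs {len(mut_means)} fish, t = {t_stat:.2f}, df = {dof:.1f}")
```

      With only three fish per genotype the test has little power, so plotting the per-fish means alongside the statistic shows the biological variability the reviewer asks about.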

    4. Reviewer #3 (Public review):

      Summary:

      Ross et al. show that knockdown of zebrafish podocalyxin-like (podxl) by CRISPR/Cas or morpholino injection decreased the number of hepatic stellate cells (HSC). The authors then generated 5 different mutant alleles representing a range of lesions, including premature stop codons, in-frame deletion of the transmembrane domain, and deletions of the promoter region encompassing the transcription start site. However, unlike their knockdown experiment, HSC numbers did not decrease in podxl mutants; in fact, for two of the mutant alleles, the number of HSCs increased compared to the control. Injection of podxl CRISPR/Cas constructs into these mutants had no effect on HSC number, suggesting that the knockdown phenotype is not due to off-target effects but instead that the mutants are somehow compensating for the loss of podxl. The authors then present multiple lines of evidence suggesting that compensation is not exclusively due to transcriptional adaptation: evidence of mRNA instability and nonsense-mediated decay was observed in some but not all mutants; expression of the related gene endoglycan (endo) was unchanged in the mutants and endo knockdown had no effect on HSC numbers; and expression profiling by RNA sequencing did not reveal changes in other genes that share sequence similarity with podxl. Instead, their RNA-seq data showed hundreds of differentially expressed genes, especially ECM-related genes, suggesting that compensation in podxl mutants is complex and multi-genic.

      Strengths:

      The data presented is impressively thorough, especially in its characterization of the 5 different podxl alleles and exploration of whether these mutants exhibit transcriptional adaptation.

      Weaknesses:

      RNA sequencing expression profiling was done on adult livers. However, compensation of HSC numbers is apparent by 6 dpf, suggesting compensatory mechanisms would be active at larval or even embryonic stages. Although possible, it's not clear that any compensatory changes in gene expression would persist to adulthood.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Podxl morphants and CRISPants display a reduced number of hepatic stellate cells (HSCs), while mutants are either phenotypically wild type or display an increased number of HSCs.

      The absence of observable phenotypes in genetic mutants could indeed be attributed to genetic compensation, as the authors postulate. However, in my opinion, the evidence provided in the manuscript at this point is insufficient to draw a firm conclusion. Furthermore, the opposite phenotype observed in the two deletion mutants is not readily explainable by genetic compensation and invokes additional mechanisms.

      Major concerns:

      (1) Considering discrepancies in phenotypes, the phenotypes observed in podxl morphants and CRISPants need to be more thoroughly validated. To generate morphants, the authors use "well characterized and validated ATG Morpholino" (lines 373-374). However, published morphants, in addition to kidney malformations, display gross developmental defects including pericardial edema, yolk sac extension abnormalities, and body curvature at 2-3 dpf (reference 7 / PMID: 24224085). Were these gross developmental defects observed in the knockdown experiments performed in this paper? If yes, is it possible that the liver phenotype observed at 5 dpf is, to some extent, secondary to these preceding abnormalities? If not, why were they not observed? Did kidney malformations reproduce? On the CRISPant side, were these gross developmental defects also observed in sgRNA#1 and sgRNA#2 CRISPants? Considering that morphants and CRISPants show very similar effects on HSC development and assuming other phenotypes are specific as well, they would be expected to occur at similar frequencies. It would be helpful if full-size images of all relevant morphant and CRISPant embryos were displayed, as is done for tyr CRISPant in Figure S2. Finally, it is very important to thoroughly quantify the efficacy of podxl sgRNA#1 and sgRNA#2 in CRISPants. The HRMA data provided in Figure S1 is not quantitative in terms of the fraction of alleles with indels. Figure S3 indicates a very broad range of efficacies, averaging out at ~62% (line 100). Assuming random distribution of indels among cells and that even in-frame indels result in complete loss of function (possible for sgRNA#1 due to targeting the signal sequence), only ~38% (0.62 × 0.62) of all cells will be mutated bi-allelically. That does not seem sufficient to reliably induce loss-of-function phenotypes. My guess is that the capillary electrophoresis method used in Figure S3 underestimates the efficiency of mutagenesis, and that much higher mutagenesis rates would be observed if mutagenesis were assessed by amplicon sequencing (ideally NGS but Sanger followed by deconvolution analysis would suffice). This would strengthen the claim that CRISPant phenotypes are specific.
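
      The ~38% estimate in this comment follows from treating the two alleles of each cell as independent. A minimal sketch of that arithmetic is given here. The function name is illustrative, and the 80% and 95% efficiencies are hypothetical values added only to show how steeply the bi-allelic fraction depends on per-allele efficiency.

```python
def biallelic_fraction(per_allele_efficiency):
    """Fraction of diploid cells with both alleles mutated,
    assuming the two alleles are hit independently."""
    return per_allele_efficiency ** 2


# 0.62 is the average efficiency quoted in the comment; 0.80 and 0.95 are
# hypothetical values for comparison.
for eff in (0.62, 0.80, 0.95):
    print(f"{eff:.0%} per allele -> {biallelic_fraction(eff):.0%} bi-allelic")
```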

      The reviewer points out some excellent caveats regarding the morphant experiments. We agree that at least some of the effects of the podxl morpholino may be related to its effects on kidney development and/or gross developmental defects that impede liver development. Because of these limitations, we focused our experiments on analysis of CRISPant and mutant phenotypes, including showing that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effects on HSC number when injected with sgRNA#1. We did not observe any gross morphologic defects in podxl CRISPants. Liver size was not significantly altered in podxl CRISPants (Figure 2A). We will add brightfield images of podxl CRISPant larvae to the supplemental data for the revised manuscript.

      We agree with the reviewer that HRMA is not quantitative with respect to the fraction of alleles with indels and that capillary electrophoresis likely underestimates mutagenesis efficiency. Nonetheless, even with 100% mutation efficiency, podxl CRISPant knockdown, like most CRISPR knockdowns, would not represent complete loss of function: ~1/3 of alleles will contain in-frame mutations and likely retain at least some gene function, so ~1/3 × 1/3 = 1/9 of cells will have no out-of-frame indels and contain two copies of at least partially functional podxl, and ~2 × 1/3 × 2/3 = 4/9 of cells will have one out-of-frame indel and one copy of at least partially functional podxl. Thus, the decreased HSCs we observe with podxl CRISPant likely represents a partial loss-of-function phenotype in any case.
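
      The genotype fractions in this response can be checked by enumerating the two alleles of a cell independently. The sketch below assumes the ~1/3 in-frame figure quoted in the response and 100% mutagenesis; both are approximations, not measured values. The mixed class (one in-frame, one out-of-frame allele) can arise in two orders, which is why it sums to 4/9.

```python
from fractions import Fraction
from itertools import product

# Per-allele outcome probabilities at 100% mutagenesis, using the ~1/3
# in-frame approximation from the response (an assumption, not a measurement).
allele_probs = {"in-frame": Fraction(1, 3), "out-of-frame": Fraction(2, 3)}

# Sum the probability of each genotype class, keyed by the number of
# out-of-frame alleles (0, 1, or 2) in the cell.
genotype_probs = {}
for a, b in product(allele_probs, repeat=2):
    n_out = (a, b).count("out-of-frame")
    p = allele_probs[a] * allele_probs[b]
    genotype_probs[n_out] = genotype_probs.get(n_out, 0) + p

for n_out, p in sorted(genotype_probs.items()):
    print(f"{n_out} out-of-frame allele(s): {p}")
```

      Under these assumptions, 5/9 of cells keep at least one potentially functional copy, which is consistent with reading the CRISPant phenotype as a partial loss of function.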

      (2) In addition to confidence in morphant and CRISPant phenotypes, the authors' claim of genetic compensation rests on the observation that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effect when injected with sgRNA#1 (Figure 3L). Considering the issues raised in the paragraph above, this is insufficient. There is a very straightforward way to address both concerns, though. The described podxl(-194_Ex7Δ) and podxl(-319_ex1(p)Δ) deletions remove the binding site for the ATG morpholino. Therefore, deletion mutants should be refractory to the Morpholino (specificity assessment recommended in PMID: 29049395, see also PMID: 32958829). Furthermore, both deletion mutants should be refractory to sgRNA#1 CRISPant phenotypes, with the first being refractory to sgRNA#2 as well.

      The reviewer proposes elegant experiments to address the specificity of the morpholino. For the revision, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      Reviewer #2 (Public review):

      In this manuscript, Ross and Miscik et al. described the phenotypic discrepancies between F0 zebrafish mosaic mutant ("CRISPants") and morpholino knockdown (Morphant) embryos versus a set of 5 different loss-of-function (LOF) stable mutants in one particular gene involved in hepatic stellate cell development: podxl. While transient LOF and mosaic mutants induced a decrease in hepatic stellate cell number, stable LOF zebrafish did not. The authors analyzed the molecular causes of these phenotypic differences and concluded that LOF mutants are genetically compensated through the upregulation of the expression of many genes. Additionally, they ruled out other better-known and described mechanisms such as the expression of redundant genes, protein feedback loops, or transcriptional adaptation.

      While the manuscript is clearly written and conclusions are, in general, properly supported, there are some aspects that need to be further clarified and studied.

      (1) It would be useful to apply a method that better quantifies potential loss-of-function mutations in the CRISPants. This would reveal not only the percentage of mutations in those embryos but also the fraction of them that actually generate an out-of-frame mutation likely to drive loss of gene function (since deletions of 3-6 nucleotides removing 1-2 amino acids will likely not affect protein activity, unless those amino acids are essential for it). With this, the authors could also correlate phenotype penetrance with the level of loss of function when quantifying embryo phenotypes, which would help support their conclusions.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s first concern. Please see our response above. In general, we agree that correlating phenotype penetrance with level of loss-of-function is a very good way to support conclusions regarding specificity in knockdown experiments. Unfortunately, because the phenotype we are examining (HSC number) has a relatively large standard deviation even in control/wildtype larvae (for example, 63 ± 19 (mean ± standard deviation) HSCs per liver in uninjected control siblings in Figure 1), it would be technically very difficult to do this experiment for podxl.
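
      To illustrate why a standard deviation of this size makes graded effects hard to resolve, the sketch below estimates how many larvae per group a simple two-group comparison of means would need. It uses the textbook normal-approximation formula with a two-sided alpha of 0.05 and 80% power, both of which are assumptions. The effect sizes are hypothetical, and this is not an analysis the authors report.

```python
from math import ceil
from statistics import NormalDist


def larvae_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of
    means (normal approximation, equal SD assumed in both groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)


CONTROL_MEAN, CONTROL_SD = 63, 19  # HSCs per liver, as quoted in the response

# Hypothetical effect sizes, expressed as fractional reductions in HSC number.
for reduction in (0.5, 0.3, 0.2, 0.1):
    n = larvae_per_group(CONTROL_MEAN * reduction, CONTROL_SD)
    print(f"{reduction:.0%} reduction -> ~{n} larvae per group")
```

      Under these assumptions, the required group size grows with the inverse square of the effect, so resolving small differences between partial loss-of-function levels would need far more larvae than detecting a large knockdown effect.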

      (2) It is unclear that 4.93 ng of morpholino per embryo is totally safe. The amount of morpholino causing undesired effects can differ depending on the morpholino used. I would suggest performing some sanity check experiments to demonstrate that morpholino KD is not triggering other molecular outcomes, such as upregulation of p53 or innate immune response.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s second concern. Please see our response above. We acknowledge that some of the effects of the podxl morpholino may be non-specific. To address this concern in the revised manuscript, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      (3) Although the authors made a set of controls to demonstrate the specificity of the CRISPant phenotypes, I believe that a rescue experiment could be beneficial to support their conclusions. Injecting an mRNA with podxl ORF (ideally with a tag to follow protein levels up) together with the induction of CRISPants could be a robust manner to demonstrate the specificity of the approach. A rescue experiment with morphants would also be good to have, although these are a bit more complicated, to ultimately demonstrate the specificity of the approach.

      (4) In lines 314-316, the authors speculate on a correlation between decreased HSC and Podxl levels. It would be interesting to actually test this hypothesis and perform RT-qPCR upon CRISPant induction or, even better and if antibodies are available, western blot analysis.

      We appreciate the reviewer’s acknowledgement of the controls we performed to demonstrate the specificity of the CRISPant phenotypes. The proposed experiments (rescue, assessment of Podxl levels) would help bolster our conclusions but are technically difficult due to the relatively large standard deviation for the HSC number phenotype even in wildtype larvae and the lack of well-characterized zebrafish antibodies against Podxl.

      (5) Similarly, in lines 337-338 and 342-344, the authors discuss the possibility that genes near the podxl locus could be upregulated in the mutants. Since they already have transcriptomic data, this seems an easy analysis to do that can address their own hypothesis.

      Thank you for this suggestion. We were referring in these sections to genes that are near the podxl locus with respect to three-dimensional chromatin structure; such genes would not necessarily be near the podxl locus on chromosome 4. We will clarify the text in this paragraph for the revised manuscript. At the same time, we will examine our transcriptomic data to check expression of mkln1, cyb5r3, and other nearby genes on chromosome 4 as suggested and include this analysis in the revised manuscript.

      (6) Figures 4 and 5 would be easier to follow if panels B-F included what mutants are (beyond having them in the figure legend). Moreover, would it be more accurate and appropriate if the authors group all three WT and mutant data per panel instead of showing individual fish? Representing technical replicates does not demonstrate in vivo variability, which is actually meaningful in this context. Then, statistical analysis can be done between WT and mutant per panel and per set of primers using these three independent 3-month-old zebrafish.

      Thank you for this suggestion. We will modify these figures to clarify our results.

      Reviewer #3 (Public review):

      Summary:

      Ross et al. show that knockdown of zebrafish podocalyxin-like (podxl) by CRISPR/Cas or morpholino injection decreased the number of hepatic stellate cells (HSC). The authors then generated 5 different mutant alleles representing a range of lesions, including premature stop codons, in-frame deletion of the transmembrane domain, and deletions of the promoter region encompassing the transcription start site. However, unlike their knockdown experiment, HSC numbers did not decrease in podxl mutants; in fact, for two of the mutant alleles, the number of HSCs increased compared to the control. Injection of podxl CRISPR/Cas constructs into these mutants had no effect on HSC number, suggesting that the knockdown phenotype is not due to off-target effects but instead that the mutants are somehow compensating for the loss of podxl. The authors then present multiple lines of evidence suggesting that compensation is not exclusively due to transcriptional adaptation: evidence of mRNA instability and nonsense-mediated decay was observed in some but not all mutants; expression of the related gene endoglycan (endo) was unchanged in the mutants and endo knockdown had no effect on HSC numbers; and expression profiling by RNA sequencing did not reveal changes in other genes that share sequence similarity with podxl. Instead, their RNA-seq data showed hundreds of differentially expressed genes, especially ECM-related genes, suggesting that compensation in podxl mutants is complex and multi-genic.

      Strengths:

      The data presented is impressively thorough, especially in its characterization of the 5 different podxl alleles and exploration of whether these mutants exhibit transcriptional adaptation.

      Thank you very much for appreciating the hard work that went into this manuscript.

      Weaknesses:

      RNA sequencing expression profiling was done on adult livers. However, compensation of HSC numbers is apparent by 6 dpf, suggesting compensatory mechanisms would be active at larval or even embryonic stages. Although possible, it's not clear that any compensatory changes in gene expression would persist to adulthood.

      This reviewer makes an excellent point. Our finding that the largest changes in gene expression were in extracellular matrix (ECM) genes, together with the fact that ECM modulation is a major function of HSCs, supports the hypothesis that genetic compensation is occurring in adults. Nonetheless, we agree that compensatory changes in adults may not fully reflect the compensatory changes during development, so it would bolster the conclusions of the paper to perform the RNA sequencing and qPCR experiments on zebrafish larval livers.

      We tried very hard to do the experiment proposed by Reviewer #3. In our hands, obtaining sufficient high-quality RNA for robust gene expression analysis typically requires pooling of ~10-15 larval livers. These larvae need to be obtained from a heterozygous in-cross in order to have matched wildtype sibling controls. Livers must be dissected from freshly euthanized (not fixed) zebrafish. Thus, this experiment requires genotyping live, individual larvae from a small amount of tissue (without sacrificing the larvae) before dissecting and pooling the livers. Unfortunately, we were unable to confidently and reproducibly genotype individual live podxl larvae with these small amounts of tissue despite trying multiple approaches. Therefore, we were not able to perform gene expression analysis on podxl mutant larval livers.

    1. eLife Assessment

      In this important study, the authors set out to determine the molecular interactions between the AQP2 from Trypanosoma brucei (TbAQP2) and the trypanocidal drugs pentamidine and melarsoprol in order to clarify the origins of clinically observed drug resistance and facilitate future drug design. Using cryo-EM, molecular dynamics simulations, and lysis assays, the authors present a solid theory for how drug resistance mutations in TbAQP2 prevent drug uptake. Overall, even though a few methodological issues still need minor clarification, this study will be of interest to those working on aquaporins and the development of drugs targeting aquaporins.

    2. Reviewer #1 (Public review):

      This study presents cryoEM-derived structures of the Trypanosome aquaporin AQP2, in complex with its natural ligand, glycerol, as well as two trypanocidal drugs, pentamidine and melarsoprol, which use AQP2 as an uptake route. The structures are high quality, and the density for the drug molecules is convincing, showing a binding site in the centre of the AQP2 pore.

      The authors then continue to study this system using molecular dynamics simulations. Their simulations indicate that the drugs can pass through the pore and identify a weak binding site in the centre of the pore, which corresponds with that identified through cryoEM analysis. They also simulate the effect of drug resistance mutations, which suggests that the mutations reduce the affinity for drugs and therefore might reduce the likelihood that the drugs enter into the centre of the pore, reducing the likelihood that they progress through into the cell.

      While the cryoEM and MD studies are well conducted, it is a shame that the drug transport hypothesis was not tested experimentally. For example, did they do cryoEM with AQP2 with drug resistance mutations and see if they could see the drugs in these maps? They might not bind, but another possibility is that the binding site shifts, as seen in Chen et al. Do they have an assay for measuring drug binding? I think that some experimental validation of the drug binding hypothesis would strengthen this paper. Without this, I would recommend that the authors soften the statement of their hypothesis (i.e., lines 65-68), as this has not been experimentally validated.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present 3.2-3.7 Å cryo-EM structures of Trypanosoma brucei aquaglyceroporin-2 (TbAQP2) bound to glycerol, pentamidine, or melarsoprol and combine them with extensive all-atom MD simulations to explain drug recognition and resistance mutations. The work provides a persuasive structural rationale for (i) why positively selected pore substitutions enable diamidine uptake, and (ii) how clinical resistance mutations weaken the high-affinity energy minimum that drives permeation. These insights are valuable for chemotherapeutic re-engineering of diamidines and aquaglyceroporin-mediated drug delivery.

      My comments are on the MD part.

      Strengths:

      The study

      (1) Integrates complementary cryo-EM, equilibrium, applied voltage MD simulations, and umbrella-sampling PMFs, yielding a coherent molecular-level picture of drug permeation.

      (2) Offers direct structural rationalisation of long-standing resistance mutations in trypanosomes, addressing an important medical problem.

      Weaknesses:

      Unphysiological membrane potential. A field of 0.1 V nm⁻¹ (~1 V across the bilayer) was applied to accelerate translocation. From the traces (Figure 1c), it can be seen that the translocation occurred really quickly through the channel, suggesting that the field might have introduced some large changes in the protein. The authors state that they checked visually for this, but some additional analysis, especially of the residues next to the drug, would be welcome.

      Based on applied voltage simulations, the authors argue that the membrane potential would help get the drug into the cell, and that a high value of the potential was applied merely to speed up the simulation. At the same time, the barrier for translocation from PMF calculations is ~40 kJ/mol for WT. Is the physiological membrane voltage enough to overcome this barrier in a realistic time? In this context, I do not see how much value the applied voltage simulations have, as one can estimate the work needed to translocate the substrate on PMF profiles alone. The authors might want to tone down their conclusions about the role of membrane voltage in the drug translocation.
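
      The comparison raised in this comment can be made concrete with a back-of-the-envelope estimate of the maximal electrical work available to the drug, W = zFV, set against the ~40 kJ/mol PMF barrier. The sketch assumes the +2 ligand charge used in the modeling and that the full charge crosses the full potential drop, which gives an upper bound. The 0.1 V "physiological" magnitude is an illustrative order-of-magnitude assumption, not a value taken from the manuscript.

```python
FARADAY = 96485.33212  # C/mol (Faraday constant)


def max_electrical_work_kj_per_mol(charge, voltage_v):
    """Upper bound on the work done by the membrane potential on an ion of
    the given charge crossing the full potential drop: W = z * F * V."""
    return charge * FARADAY * voltage_v / 1000.0


PMF_BARRIER = 40.0    # kJ/mol, approximate WT barrier quoted in the review
LIGAND_CHARGE = 2     # pentamidine modeled as +2

cases = (
    ("simulated field (~1 V)", 1.0),
    ("illustrative physiological magnitude (0.1 V, assumed)", 0.1),
)
for label, volts in cases:
    work = max_electrical_work_kj_per_mol(LIGAND_CHARGE, volts)
    print(f"{label}: {work:.1f} kJ/mol vs barrier {PMF_BARRIER:.0f} kJ/mol")
```

      Under these assumptions, the ~1 V used in the simulations supplies several times the barrier height, whereas a potential on the order of 0.1 V supplies roughly half of it, which is the gap the reviewer asks the authors to address.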

      Pentamidine charge state and protonation. The ligand was modeled as +2, yet pKa values might change with the micro-environment. Some justification of this choice would be welcome.

      I don't follow the RMSD calculations. The authors state that this RMSD is small for the substrate and show plots in Figure S7a, with the bottom plot being presumably done for the substrate (the legends are misleading, though), levelling off at ~0.15 nm RMSD. However, in Figure S7a, we see one trace (light blue) deviating from the initial position by more than 0.2 nm - that would surely result in an RMSD larger than 0.15 nm, but this is somehow not reflected in the RMSD plots.

    4. Reviewer #3 (Public review):

      Summary:

      Recent studies have established that trypanocidal drugs, including pentamidine and melarsoprol, enter the trypanosomes via the glyceroaquaporin AQP2 (TbAQP2). Interestingly, drug resistance in trypanosomes is, at least in part, caused by recombination with the neighbouring gene, AQP3, which is unable to permeate pentamidine or melarsoprol. The effect of the drugs on cells expressing chimeric proteins is significantly reduced. In addition, controversy exists regarding whether TbAQP2 permeates drugs like an ion channel, or whether it serves as a receptor that triggers downstream processes upon drug binding. In this study the authors set out to achieve three objectives:

      (1) to determine if TbAQP2 acts as a channel or a receptor,

      (2) to understand the molecular interactions between TbAQP2 and glycerol, pentamidine, and melarsoprol, and

      (3) to determine the mechanism by which mutations that arise from recombination with TbAQP3 result in reduced drug permeation.

      Indeed, all three objectives are achieved in this paper. Using MD simulations and cryo-EM, the authors determine that TbAQP2 likely permeates drugs like an ion channel. The cryo-EM structures provide details of glycerol and drug binding, and show that glycerol and the drugs occupy the same space within the pore. Finally, MD simulations and lysis assays are employed to determine how mutations in TbAQP2 result in reduced permeation of drugs by making entry and exit of the drug relatively more energy-expensive. Overall, the strength of evidence used to support the authors' claims is solid.

      Strengths:

      The cryo-EM portion of the study is strong, and while the overall resolution of the structures is in the 3.5Å range, the local resolution within the core of the protein and the drug binding sites is considerably higher (~2.5Å).

      I also appreciated the MD simulations on the TbAQP2 mutants and the mechanistic insights that resulted from this data.

      Weaknesses:

      (1) The authors do not provide any empirical validation of the drug binding sites in TbAQP2. While the discussion mentions that the binding site should not be thought of as a classical fixed site, the MD simulations show that there's an energetically preferred slot (i.e., high occupancy interactions) within the pore for the drugs. For example, mutagenesis and a lysis assay could provide us with some idea of the contribution/importance of the various residues identified in the structures to drug permeation. This data would also likely be very valuable in learning about selectivity for drugs in different AQP proteins.

      (2) Given the importance of AQP3 in the shaping of AQP2-mediated drug resistance, I think a figure showing a comparison between the two protein structures/AlphaFold structures would be beneficial and appropriate.

      (3) A few additional figures showing cryo-EM density, from both full maps and half maps, would help validate the data.

      (4) Finally, this paper might benefit from including more comparisons with and analysis of data published in Chen et al (doi.org/10.1038/s41467-024-48445-4), which focus on similar objectives. Looking at all the data in aggregate might reveal insights that are not obvious from either paper on their own. For example, melarsoprol binds differently in structures reported in the two respective papers, and this may tell us something about the energy of drug-protein interactions within the pore.

    1. eLife assessment

      This valuable manuscript presents findings supported by solid data to identify a surprising glia-exclusive function for betapix in vascular integrity and angiogenesis. The manuscript also describes the optimisation of a modified CRISPR-based Zwitch approach to generate conditional knockouts in zebrafish.

    2. Reviewer #1 (Public review):

      The manuscript by Chiu et al describes the modification of the Zwitch strategy to efficiently generate conditional knockouts of zebrafish betapix. They leverage this system to identify a surprising glia-exclusive function of betapix in mediating vascular integrity and angiogenesis. Betapix has been previously associated with vascular integrity and angiogenesis in zebrafish, and betapix function in glia has also been proposed. However, this study identifies glial betapix in vascular stability and angiogenesis for the first time.

      The study derives its strength from the modified CRISPR-based Zwitch approach to identify the specific role of glial betapix (and not neuronal, mural, or endothelial). Using RNA-in situ hybridization and analysis of scRNA-Seq data, they also identify delayed maturation of neurons and glia and implicate a reduction in stathmin levels in the glial knockouts in mediating vascular homeostasis and angiogenesis. The study also implicates a betapix-zfhx3/4-vegfa axis in mediating cerebral angiogenesis.

      There is both technical (the generation of conditional KOs) and knowledge-related (the exclusive role of glial betapix in vascular stability/angiogenesis) novelty in this work that is going to benefit the community significantly.

      While the text is well written, it often elides details of experiments and relies on implicit understanding on the part of the reader. Similarly, the figure legends are laconic and often fail to provide all the relevant details.

      Specific comments:

      (1) While the evidence from cKO's implicating glial betapix in vascular stability/angiogenesis is exciting, glia-specific rescue of betapix in the global KOs/mutants (like those performed for stathmin) would be necessary to make a water-tight case for glial betapix.

      (2) Splice variants of betapix have been shown to have differential roles in haemorrhaging (Liu, 2007). What are the major glial isoforms, and are there specific splice variants in the glia that contribute to the phenotypes described?

      (3) Liu et al, 2012 demonstrated reduced proliferation of endothelial cells in bbh fish and linked it to deficits in angiogenesis. Are there proliferation/survival defects in endothelial cells in the glial KOs?

    3. Reviewer #2 (Public review):

      Summary:

      Using a genetic model of beta-pix conditional trap, the authors are able to regulate the spatio-temporal depletion of beta-pix, a gene with an established role in maintaining vascular integrity (shown elsewhere). This study provides strong in vivo evidence that glial beta-pix is essential to the development of the blood-brain barrier and maintaining vascular integrity. Using genetic and biochemical approaches, the authors show that PAK1 and Stathmins are in the same signaling axis as beta-pix, and act downstream of it, potentially regulating cytoskeletal remodeling and controlling glial migration. How exactly the glial-specific (beta-pix-driven) signaling influences angiogenesis or vascular integrity is not clear.

      Strengths:

      (1) Developing a conditional gene-trap genetic model which allows for tracking knockin reporter driven by endogenous promoter, plus allowing for knocking down genes. This genetic model enabled the authors to address the relevant scientific questions they were interested in, i.e., a) track expression of beta-pix gene, b) deletion of beta-pix gene in a cell-specific manner.

      (2) The study reveals the glial-specific role of beta-pix, which was unknown earlier. This opens up avenues for further research. (For instance, how do such (multiple) cell-specific signaling converge onto endothelial cells which build the central artery and maintain the blood-brain barriers?)

      Weaknesses:

      Major:

      (1) The study clearly establishes a role of beta-pix in glial cells, which regulates the length of the central artery and keeps the hemorrhages under control. Nevertheless, it is not clear how this is accomplished.

      a. Is this phenotype (hemorrhage) a result of the direct interaction of glial cells and the adjacent endothelial cells? If direct, is the communication established through junctions or through secreted molecules?

      b. The authors do not exclude the possibility that the effects observed on endothelial cells (quantified as length of central artery) could be secondary to the phenotype observed with deletion of glial beta-pix. For instance, can glial beta-pix regulate angiogenic factors secreted by peri-vascular cells, which consequently regulate the length of the central artery or vascular integrity?

      c. The pictorial summary of the findings (Figure 7) does not include Zfhx or Vegfa. The data do not provide clarity on how these molecules contribute (directly or indirectly) to endothelial cell integrity. Vegfaa is expressed in the central artery, but the expression of the receptor in these endothelial cells is not shown. Similarly, all other experimental analyses for Zfhx and Vegfa expression were performed in glial cells. More experimental evidence is necessary to show the regulation of angiogenesis (of endothelial cells) by glial beta-pix. Is the Vegfaa receptor present on central arteries, and how does glial depletion of beta-pix affect its expression or response of central artery endothelial cells (both pertaining to angiogenesis and vascular integrity).

      (2) Microtubule stabilization via glial beta-pix, claimed in Figure 5M, is unclear. Magnified images for h-betapix OE and h-stmn-1 glial cells are absent. Is this migration regulated by beta-pix through its GEF activity for Cdc42/Rac?

      (3) Hemorrhages are caused by compromised vascular integrity, which was not measured (either qualitatively or quantitatively) throughout the manuscript. The authors do measure the length of the central artery in several gene deletion models (2I, 3C, 5F/J, 6G/K), which is indicative of artery growth/angiogenesis. How (if at all) defects in angiogenesis are an indication of hemorrhage should be explained or established. Do these angiogenic growth defects translate into junctional defects at later developmental timepoints? Formation and maintenance of endothelial cell junctions within the hemorrhaging arteries should be assessed in fish with deleted beta-pix from astrocytes.

      (4) More information is required about the quality control steps for 10X sequencing (Figure 4, number of cells, reads, etc.). What steps were taken to validate the data quality? The EC groups, 1 and 2-days post-KO are not visible in 4C. One appreciates that the progenitor group is affected the most 2 days post-KO. But since the effects are expected to be on the endothelial cell group as well (which is shown in in vivo data), an extensive analysis should be done on the EC group (like markers for junctional integrity, angiogenesis, mesenchymal interaction, etc.). Are Stathmins limited to glial cells? Are there indicators for angiogenic responses in endothelial cells?

    1. eLife Assessment

      This useful study provides a spatial transcriptomic analysis of the mouse adrenal gland that could have implications for future research and applications. The authors present solid results that allow the dissection of the cell signalling pathways and cellular composition of different zones of the adrenal glands in the mouse model; they propose new zone-specific gene markers and specific intra- and inter-zonal signaling pathways based on receptor-ligand expression patterns. Their web tool is user-friendly and will be helpful for adrenal scientists; however, the validation of crucial results of the large dataset is necessary. There are also several contradictory results/interpretations, and the study misses the opportunity to dissect the sexually dimorphic gene expression pattern and mouse-human interspecies differences.

    2. Reviewer #1 (Public review):

      Summary:

      This study employs spatial transcriptomics to explore the molecular architecture of the adult mouse adrenal gland and the adjacent adipose tissue. The research aimed to identify zonation-specific genetic markers, elucidate cellular differentiation patterns, and investigate inter- and intra-zone communication within the adrenal gland. The findings support the centripetal differentiation model, highlighting the transition of cell populations across different cortical zones. The study also integrates ligand-receptor interaction analysis to uncover the adrenal gland's role in endocrine and neuroendocrine signaling, particularly in stress response. This high-resolution spatial transcriptomic map provides novel insights into adrenal gland biology and is a resource for further investigations.

      Strengths:

      The study, using the latest technologies and methods such as Visium CytAssist technology, UMAP & Seurat analysis, Gene Ontology (GO) & KEGG pathway enrichment analysis, Monocle3, and CellChat analysis, performed three-dimensional analysis, which has been challenging to achieve using the two-dimensional transcriptomics that have been commonly used up until now.

      Spatial transcriptomics demonstrated unique gene expression patterns for each adrenal zone (ZG, ZF, ZX, medulla). The centripetal differentiation model shows the migration of the progenitor cells from the adrenal capsule towards the inner cortex. Key genetic markers were identified in each adrenal zone and adjacent adipose tissues. In addition, CellChat analysis identified major signaling pathways, including Wnt signaling, Hedgehog signaling, IGF2-IGF2R interactions, and Neuropeptide Y (NPY) signaling in the medulla. All these results offer a valuable dataset for future adrenal biology research, with potential applications in disease modeling and therapeutic target identification.

      The results, high-resolution mapping of adrenal gland zonation, validation of the centripetal differentiation model, perspective on cell-cell communication, and potential translational impact on human adrenal gland function and disorders, are quite notable.

      Weaknesses:

      The reviewer requests that the following issues be addressed in the text:

      (1) The study focuses only on adult male mice, which limits insights into developmental and sex-specific differences. What do the authors predict about sex and age differences?

      (2) Despite advanced methodologies, single-cell heterogeneity may not be fully captured, as Visium technology has limited spatial resolution.

      (3) While the study suggests that ZX might have a role in androgen synthesis, further functional validation is required.

      (4) The study is primarily descriptive, lacking in-depth mechanistic experiments to validate cell-cell communication interactions. It is quite interesting to suggest cell-cell communication, but the authors are still required to provide some evidence to support it.

      (5) The data supports the conclusions, particularly in validating the centripetal differentiation model using Monocle3 trajectory analysis. However, functional validation experiments (e.g., gene knockout studies) would strengthen the findings, especially regarding ZX function and ligand-receptor interactions.

    3. Reviewer #2 (Public review):

      This study by M. Blatkiewicz et al. seeks to define the spatial gene expression pattern of the adult male mouse adrenal gland using current spatial transcriptomic techniques. They propose new zone-specific gene markers and specific intra- and inter-zonal signaling pathways based on receptor-ligand expression patterns. Their web tool is user-friendly and will be helpful for adrenal scientists. The manuscript is easy to follow, but validation of crucial results of the large dataset is missing. There are also several contradictory results/interpretations, and the study misses the opportunity to dissect the sexually dimorphic gene expression pattern and mouse-human interspecies differences.

      (1) The authors used 10-week-old CD1 male mouse adrenal glands to assess the spatial transcriptomics of the adrenal gland. As they also mentioned, male mice typically lose their zone-X after puberty (around 6-8 weeks of age). However, their analysis in 10-week-old mice suggests that zone-X covers most of the adrenal cortex. As shown in Figure 3A, the dots between the zona glomerulosa and the medulla are mostly positive for zone-X, which would suggest that the zona fasciculata represents a relative minority of the overall adult adrenal cortex. Is this correct? Is the presence of zone-X in sexually mature adult male mice unique to the CD1 strain? Providing histology data in support of this conclusion, using zone-specific markers combined with RNA in situ hybridization or immunofluorescence techniques in the CD1 male adrenal gland, would help to interpret these data further. Given the relatively low resolution of their gene expression profiles, it is possible there is overlap between the zona fasciculata and the zone-X.

      (2) The pseudotime trajectory analysis confirms prior reports in the literature showing zonal transdifferentiation but does not provide novel insight. It would be nice to know what gene expression patterns correlate (positively or negatively) based on an unbiased analysis.

      (3) The authors suggest that they identified new zonal markers, but it would be nice to see confirmation of some of these markers (e.g., Frmpd4, Oca2, Sphkap for the ZG or Cited1, Nat8f5 for the ZF, etc. ) with in situ or immunofluorescence combined with known markers such as Dab2, Cyp11b2, or Cyp11b1.

      (4) The authors mention a gradual transition between the zones. It would be interesting to know whether transition zones exist between the zona glomerulosa and the zona fasciculata or the zona fasciculata and the zone-X.

      (5) The authors note using Visium CytAssist, but they do not discuss the advantages of this system compared to other systems. Explanation of the approximate resolution of their analysis (e.g., how many cells were pooled in the wells) would help readers to interpret their data. It would also be nice to compare it to other spatial transcriptomic analyses of human adrenals, given the differences between the zonation of human and mouse adrenals.

      (6) Interestingly, CellChat analysis suggests possible communication between the medulla and the zona fasciculata and zona glomerulosa. How do the authors explain the transfer of these molecules from the medulla to the outer zones given centripetal blood flow in the adrenal? Also, how does the fact that Igf2 has been shown to be expressed in the capsule (PMID: 22266195) affect the interpretation of their data?

      (7) The study misses the opportunity to dissect sexually dimorphic gene expression patterns in the mouse adrenal. For example, the authors could have focused on the role of stem cells between male and female mouse adrenals, which have been reported to differ (PMID: 31104943). In addition, the authors could have focused on the sexually dimorphic zone-X and its regulation by sex hormone signaling.

      (8) The capsule is classified as a connective tissue, which may be misleading given its important role as a signaling center in the adrenal. Genes enriched in typical connective tissues do not include many of the genes that seem to define the adrenal capsule. Also, some of the capsule markers appear to be found in the zona glomerulosa. Is this a result of not being able to fully resolve the small layer of zG cells and the even smaller layer of capsular cells? Guided reclustering of the cells based on known markers and separation of capsule and connective tissue might help to present their data on adrenal zonation more clearly.

    4. Reviewer #3 (Public review):

      Summary:

      In summary, the scientists used Visium spatial transcriptomics technology to create a thorough spatial transcriptomic atlas of the adult male mouse adrenal gland and the adipose tissues that surround it. Their primary goals were to map the cell communication network, determine the differentiation direction of various cell types, and find marker genes for various adrenal zones.

      Strengths:

      (1) Undoubtedly, one of the biggest strengths of the manuscript is a spatial transcriptomic analysis of mouse adrenal gland tissue, which, to my knowledge, has not been done before.

      (2) Comprehensive Zonal Characterization: Seven distinct clusters were identified, corresponding to known anatomical and functional regions (ZG, ZF, ZX, medulla, connective tissue, brown and white adipose tissue), each with robust marker gene sets.

      (3) The authors manage to integrate advanced bioinformatic tools such as CellChatDB, Monocle3, and CARD to study the relationship between cell types and differentiation of the tissue.

      (4) The authors manage to identify novel marker genes for some adrenal zones.

      Weaknesses:

      (1) The study focused only on one adult male CD1 IGS mouse, which is a limiting factor for other strains, ages, or females, especially given the sexual dimorphism of the ZX. Although the authors claim that four slices of the adrenal gland have been processed on Visium and sequenced, for "clarity," they show only one, which might bias the results.

      (2) Lack of detailed QC analysis of the Visium slide.

      (3) The study misses the functional validation of the novel marker genes - this needs to be addressed.

      (4) What worries me a lot is the fact that, actually, there might be more than one cell present within a Visium spot, so the only way to define zones is by anatomical observation rather than cellular composition.

      (5) In the CellChat analysis, the authors show the strength of the interactions, but omit the number of interactions.

      Conclusions:

      The authors' stated goals were mostly accomplished:

      By mapping the mouse adrenal gland's molecular landscape, they were able to clearly establish unique molecular signatures for every anatomical zone.

      Pseudotime study of the cell progression from the capsule through ZG, ZF, and ZX demonstrates that the data strongly support the centripetal differentiation concept. Conclusions on the functional importance of newly discovered marker genes are conjectural and need additional experimental support.

      Nevertheless, several findings are still tentative and will need more experimental support, especially when it comes to the significance of ZX persistence and the functional involvement of recently discovered marker genes.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study makes use of the step selection functions using telemetry data, which is a method to estimate how likely individuals move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, hence ensuring more targeted interventions.

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

      Weaknesses:

      (1) The sample size for the study was not calculated, although it was a nested cohort study.

      We thank Reviewer #1 for highlighting this weakness. We will make sure that this is explained in the next version of the manuscript. At the time of recruiting participants, we found no literature on how to perform a sample size calculation for movement studies involving GPS loggers and associated methods of analysis. Therefore, we aimed to recruit as many individuals as possible within the resource constraints of the study.

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study.

      We agree with Reviewer #1 that this model may fail to capture the full breadth of human decision-making when it comes to moving through local environments. We included a section discussing the aspect of violence and how this influences residents’ choices, along with some possibilities on how to record and account for this. Although it is outside of the scope of this study, we believe that coupling these quantitative methods with qualitative studies would provide a comprehensive understanding of movement in these areas.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions.

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power.

      We agree that telemetry data has inherent inaccuracies, which we have tried to account for by using only those data points within the study areas. We would like to clarify that there is no self-reported movement data used in this study. All movement data was collected using GPS loggers.

      (5) Some participants with less than 50 relocations within the study area were excluded without clear justification, see line 149.

      We found that the SSF models would not run properly if there weren’t enough relocations. Therefore, we decided to remove these individuals from the analysis. They are also removed from any descriptive statistics presented.

      (6) Some figures are not clear (see Figure 4 A & B).

      We will be trying to improve the quality of this image in the next version of the manuscript.

      (7) No statement on conflict of interest was included, considering sponsorship of the study.

      The conflict-of-interest forms for each author were sent to eLife separately. I believe these should be made available upon publication, but please reach out if these need to be re-sent.

      Reviewer #2 (Public review):

      Summary:

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status.

      Strengths:

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses:

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed.

      The environmental factors used in the study required research teams to visit the sites and map the locations. Given that individuals travelled throughout the city of Salvador, performing this task at a large scale would be unachievable. Therefore, we limited the data to only those points within the study area boundaries to avoid any biases from interactions with unrecorded environmental factors. We will be including a more explicit discussion of the limitations of SSF in urban environmental settings with human participants in the next version of the manuscript.

    1. eLife Assessment

      This valuable study asks how the neural representation of individual finger movements changes during the early periods of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide solid evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The authors also show that offline contextualization during short rest periods is the basis for improved performance. Further confirmation of these results on multiple movement sequences would further strengthen the key claims.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements, and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning.

      Strengths:

      The work follows from a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (i.e., speed of finger movements) occur in these so-called micro-offline rest periods.

      The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

    3. Reviewer #2 (Public review):

      Summary:

      The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%).

      In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked how neural representation of individual finger movement that is embedded in a sequence, changes during a very early period of skill learning and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond.

      Strengths:

      The use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. The finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea.

      Weaknesses:

      One potential weakness, in terms of generality, is that the study assessed a single sequence, "41324", across all participants. Future confirmation using different sequences would be important.

    4. Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training, and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design, and which are described below, question the neurobiological implications proposed by the authors, and offer a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence casts doubt on this assumption.

      Specifically:

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence, and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 3 - supplement 5 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least ±100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution. The authors also reported that there was only a weak relation between inter-press intervals and "online contextualization" (Figure 5 - figure supplement 6). However, their analysis surprisingly includes a keypress transition that is shared between OP1 and OP5 ("4-4"), rather than focusing solely on the two distinctive transitions ("2-4" and "4-1").

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time, and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. During the review process, the authors pointed to the absence of evidence of a relation between tapping speed and "ordinal coding" (Figure 5 - figure supplement 7). However, a rigorous test of the idea that the mental representation of context changes would require a task design in which the physical context remains constant.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence, but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses.

      A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. While the authors report the surprising finding that their eye-tracking data could not predict asterisk position on the task display above chance level, the mean gaze position seemed to vary systematically as a function of ordinal position of a movement - see Figure 4 - figure supplement 3.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, to reach the conclusion that "the degree of representational differentiation -particularly prominent over rest intervals - correlated with skill gains.", the critical question is rather whether "offline differentiation" correlates with micro-offline gains (not with cumulative micro-offline gains). That is, does the degree to which representations differentiate "during" a given rest period correlate with the degree to which performance improves from before to after the same rest period (not: does "offline differentiation" in a given rest period correlate with the degree to which performance has improved "during" all rest periods up to the current rest period - but this is what Figure 5 - figure supplements 1 and 4 show).

      The authors follow the assumption that micro-offline gains reflect offline learning. However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024), and that they instead reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks; these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains.

      Along these lines, the authors argue that their practice schedule "minimizes reactive inhibition effects", in particular their short practice periods of 10 seconds each. However, 10 seconds are sufficient to result in motor slowing, as reported in Bächinger et al., eLife 2019, or Rodrigues et al., Exp Brain Res 2009.

      An important conceptual problem with the current study is that the authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods. However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition).

      The authors' conclusion that "low-frequency oscillations (LFOs) result in higher decoding accuracy compared to other narrow-band activity" should be taken with caution, given that the critical decoding analysis for this conclusion was based on data averaged across a time window of 200 ms (Figure 2), essentially smoothing out higher frequency components.
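
      The smoothing effect of window averaging can be illustrated with a minimal sketch. It assumes a plain, unweighted mean over the 200 ms window, which may differ from the feature extraction actually used in the manuscript; the function name `boxcar_gain` and the example frequencies are hypothetical and chosen only for illustration. For a sinusoid of frequency f averaged over a window of length T, the amplitude is scaled by |sin(πfT)/(πfT)|, so slow components survive largely intact while faster components are strongly attenuated.

```python
import math


def boxcar_gain(freq_hz: float, window_s: float = 0.2) -> float:
    """Amplitude gain of an unweighted mean over `window_s` seconds
    for a sinusoid at `freq_hz` (magnitude of the boxcar frequency response)."""
    x = math.pi * freq_hz * window_s
    if x == 0.0:
        # A constant (0 Hz) signal passes through an average unchanged.
        return 1.0
    return abs(math.sin(x) / x)


if __name__ == "__main__":
    # Illustrative frequencies only: one low-frequency value and several faster ones.
    for f in (2, 7, 12, 22, 47):
        print(f"{f:>3} Hz: gain {boxcar_gain(f):.3f}")
```

      Under this assumption, a 2 Hz component retains most of its amplitude, whereas components above roughly 10 Hz retain only a small fraction, which is why a comparison across frequency bands based on window-averaged data should be read with caution.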

  2. Jun 2025
    1. eLife Assessment

      The microbiome field is constantly providing insight on various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions, as well as properties with therapeutic implications, will likely remain a fruitful field for decades to come. In this valuable study, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from Salmonella enterica infection. The authors provide compelling evidence identifying gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

    2. Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al. use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, the authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      Weaknesses:

      No major weaknesses noted.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays were carried out to test the hypothesis.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al. use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, the authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      No major weaknesses noted.

      We gratefully appreciate your positive comments.

      Reviewer #2 (Public review):

      Summary:

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays were carried out to test the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      I have mainly two questions for this work.

      Main point-1:

      The authors provided the information below about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What were the criteria for choosing these samples? Where did these samples originate? How many strains of bacteria were obtained from which types of samples?

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      We apologize for the ambiguous and limited information provided previously. More details have been added to the Materials and Methods section of the revised manuscript (see Lines 482-493; the manuscript with marked changes is available under “Related Manuscript File” in the submission system). We gratefully appreciate your professional comments.

      Line 482-493: “Lactic acid bacteria (LAB) and Enterococcus strains were isolated from 39 samples: 33 fermented yoghurts samples (collected from families in multiple cities of China, including Lanzhou, Urumqi, Guangzhou, Shenzhen, Shanghai, Hohhot, Nanjing, Yangling, Dali, Zhengzhou, Shangqiu, Harbin, Kunming, Puer), and 6 healthy piglet rectal content samples without pathogen infection and diarrhea in pig farm of Zhejiang province (Table 1). Ten isolates were randomly selected from each sample. De Man-Rogosa-Sharpe (MRS) with 2.0% CaCO<sub>3</sub> (is a selective culture medium to favor the luxuriant cultivation of Lactobacilli) and Brain heart infusion (BHI) broths (Huankai Microbial, Guangzhou, China) were used for bacteria isolation and cultivation. Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS, Bruker Daltonik GmbH, Bremen, Germany) method was employed to identify of bacterial species with a confidence level ≥ 90% (He et al., 2022).”

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consists of 63 Streptococcus strains, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus strains and 69 Enterococcus strains.

      We apologize for the ambiguous information. We have carefully revised this section and added more details (see Lines 129-133). We gratefully appreciate your professional comments.

      Line 129-133: “After identified by MALDI-TOF MS, a total of 290 bacterial isolates were isolated and identified from 33 fermented yoghurts samples and 6 healthy piglet rectal content samples. Those isolates consist of 63 Streptococcus isolates, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus isolates, and 69 Enterococcus isolates (Figure 1A, Table 1).”

      Main-point-2:

      As a probiotic, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products in which Lacticaseibacillus rhamnosus is the main bacterium. There are also ATCC strains, such as 53103.

      I am sure the authors are also interested to know if P118 is better as a probiotic candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus strains that are currently used commercially, or that are available from the ATCC. Then, the authors could compare the efficacy and antibacterial mechanisms of their P118 with other strains. This would open the window for future work.

      We gratefully appreciate your professional comments and valuable suggestions. We fully agree that it would be better to include well-known, recognized, or commercial probiotics as a positive control to comprehensively evaluate the isolated P118 strain as a probiotic candidate, particularly in comparison to other well-established probiotics. This would also help assess whether the mechanisms described for P118 are applicable to other L. rhamnosus strains or lactic acid bacteria in general. These issues will be fully taken into consideration and included in future work. Nonetheless, the door has been left open for future research in the Conclusion section (see Lines 477-479): “Further investigations are needed to assess whether the mechanisms observed in P118 are strain-specific or broadly applicable to other L. rhamnosus strains, or LAB species in general.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      This reviewer appreciates the efforts from the authors to provide the details related to this work. At the same time, the manuscript should be written in a way that is easy for readers to follow.

      We have tried our best to revise and improve the whole manuscript to make it easy for readers to follow (e.g., see Line 27-30, Line 115-120, Line 129-133, Line 140-143, Line 325-328, Line 482-493, Line 501-502, Line 663-667, Line 709-710, Line 1003-1143). We gratefully appreciate your valuable suggestions.

      For example, under the sections of Materials and Methods, there are 19 sub-titles. The authors could consider combining some sections, and/or cite other references for the standard procedures.

      We gratefully appreciate your professional comments and valuable suggestions. Some sections have been combined according to the reviewer’s suggestions (see Line 501-710).

      Another example: the figures have great resolution, but they are way too busy. The figures 1 and 2 have 14-18 panels. Figure 5 has 21 panels. Please consider separating into more figures, or condensing some panels.

      We agree that some of the submitted figures are too busy, but it is not easy for us to move results into the supplementary information, because all of them are essential for fully supporting our hypothesis and conclusions. Nonetheless, some panels have been combined or condensed according to the reviewer’s suggestions (see Line 1003-1024, Line 1056-1075). We gratefully appreciate your professional comments and valuable suggestions.

      More minor comments:

      line 30: spell out "C." please.

      Done as requested (see Line 29, Line 31). We gratefully appreciate your valuable suggestions.

    1. eLife Assessment

      This valuable study identifies a novel bacteriophage that can use the exopolysaccharide Psl of Pseudomonas aeruginosa to infect and disrupt biofilms. The work is convincing and suggests a novel approach to control biofilms that is relevant to researchers working on biofilms, specifically in Pseudomonas, on phage physiology and discovery, and on alternatives to controlling bacterial pathogens.

    2. Reviewer #1 (Public review):

      Summary:

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a double ∆fliF ∆pilA mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6 and -10, that were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role of the MS-protein in the flagellum complex. Through further mutational analysis of the flagellum apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase rendered the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able not only to reaffirm the involvement of c-di-GMP in phage infection but also to identify the exopolysaccharide PSL as a downstream target for CLEW-1. C-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage - PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test if CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters.

      Strengths:

      The experiments carried out in this manuscript are thoughtful and rational, and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly support their conclusions, and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures.

      Weaknesses:

      The authors did not include host-range testing or resistance development in this study, which would have strengthened the paper. Additionally, further characterisation of the CLEW-1 interaction with PSL at the molecular level would also have been welcomed. However, this will likely be the subject of future studies.

    3. Reviewer #2 (Public review):

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria.

      Comments on revised version:

      The authors have generally responded well to the reviewers' comments. This has served to improve this manuscript that has identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa as a receptor.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a double ∆fliF ∆pilA mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6, and -10, which were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role of the MS-protein in the flagellum complex. Through further mutational analysis of the flagellum apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase rendered the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able not only to reaffirm the involvement of c-di-GMP in phage infection but also to identify the exopolysaccharide PSL as a downstream target for CLEW-1. C-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage-PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test if CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters.

      Strengths: 

      The experiments carried out in this manuscript are thoughtful and rational, and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly support their conclusions, and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures.

      Weaknesses: 

      While the paper is strong, I do feel that further discussions could have gone into the decision to focus on CLEW-1 for the majority of the paper. The paper also doesn't provide any detailed information on the genetic composition of the phages. It is unclear if the phages isolated are temperate or virulent. Many temperate phages enter the lytic cycle in response to QS signalling, and while the data as it is doesn't suggest that is the case, perhaps the paper would be strengthened by further elimination of this possibility. At the very least it might be worth mentioning in the discussion section. 

      Thank you for your review. The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]. It turns out that the Clew phage are highly related, which is highlighted by the genomic comparison in the supplementary figure S1. It therefore made sense to focus our in-depth analysis on one of the phage. We have included a supplementary figure (S1A), demonstrating that the other Clew phage also require an intact psl locus for infection, to make that logic clearer. The phage are virulent (there is apparently a bit of a debate about this with regard to Bruynogheviruses, but we have not been able to isolate lysogens). This is now mentioned in the discussion.  

      Reviewer #2 (Public review): 

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria. While an interesting suggestion, the manner in which this paper is written makes it difficult to draw this conclusion. Also, some of the results do not directly follow from the data as presented and some relevant controls seem to be missing. 

      Thank you for your review. We would argue that the combination of demonstrating Psl-dependent binding of Clew-1 to P. aeruginosa, as well as demonstration of direct binding of Clew-1 to affinity-purified Psl, indicates that the phage binds directly to Psl and uses it as a receptor. In looking at the recommendations, it appears that the remark about controls refers to not using the ∆pslC mutant alone (as opposed to the ∆fliF2 ∆pslC double mutant) as a control for some of the binding experiments. However, since the ∆fliF2 mutant is more permissive for phage infection, analyzing the effect of deleting pslC in the context of the ∆fliF2 mutant background is the more stringent test. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      First off, I would like to congratulate the authors on this study and manuscript. It is very well executed and the writing and flow of the paper are excellent. The findings are intriguing and I believe the paper will be very well received by both the phage, Pseudomonas, and biofilm communities. 

      Thank you for your kind review of our work!

      I have very little to critique about the paper but I have listed a few suggestions that I believe could strengthen the paper if corrected: 

      Comments and suggestions: 

      (1) The paper initially describes 4 isolated phages but no rationale is given for why they chose to continue with CLEW-1, as opposed to CLEW-3, -6, and -10. The paper would benefit from going into more detail with phage genomics and perhaps characterize the phage receptor binding to PSL. 

      Clew-1, -3, -6, and -10 are actually quite similar to one another. The genomes are now uploaded to Genbank [accession# PQ790658.1, PQ790659.1, PQ790660.1, and PQ790661.1]. They all require an intact Psl locus for infection; we have updated Fig. S1 to show this for the remaining Clew phage. In the end, it made sense to focus on one of these related phage and characterize it in depth.

      (2) PA14 was used in some experiments but not listed in the strain table. 

      Thank you, this has been added in the resubmission.

      (3) Would have been good to see more strains/isolates used.

      We are currently characterizing the host range of Clew-1. It appears to be pretty limited, but this will likely be included in another paper that will focus on host range, not only of Clew-1, but other biofilm-tropic phage that we have isolated since then.

      (4) Could purified PSL be added to make non-PSL strain (like PA14) susceptible? 

      We have tried adding purified Psl to a psl mutant strain, but this does not result in phage sensitivity. Further characterization of the Psl receptor is something we are currently working on, but it will likely be a much bigger story than can be easily accommodated in a revised manuscript.

      (5) No data on resistance development. 

      We have not done this as yet.

      (6) Alternative biofilm models. Both in vitro and in vivo. 

      We agree that exploring the interaction of Clew-1 with biofilms in greater detail is a logical next step. The revised manuscript does have data on the viability of P. aeruginosa biofilm bacteria after Clew-1 infection using either a bead biofilm model or LIVE/DEAD staining of static biofilms. However, expanding on this further (setting up flow-cell biofilms, developing reporters to monitor phage infection, etc.) is beyond the scope of this initial report and characterization of Clew-1.

      (7) There is a mistake in at least one reference. An unknown author is listed in reference 48. DA Garsin is not part of the paper. Might be worth looking into further mistakes in the reference list as I suspect this might be an issue related to the citation software.

      Thank you. Yes, odd how that extra author got snuck in. This has been corrected.

      (8) I don't seem to be able to locate a Genbank file or accession number. If it wasn't performed how was evolutionary relatedness data generated?

      The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]

      (9) No genomic information about the isolated phages. Are they temperate or virulent? This would be important information as only strictly lytic phages are currently deemed appropriate for phage therapy. 

      These phage are virulent. We have only been able to isolate resistant bacteria from plaques, but they do not harbor the phage (as detected by PCR). This matches what other researchers have found for Bruynogheviruses.

      Reviewer #2 (Recommendations for the authors): 

      Others have used different PA mutants lacking known phage receptors to pan for new phages. However, it is not totally clear how the screen here was selected for the Psl-specific phage. The authors used flagella and pili mutants and found Clew-1, -3, -6, and -10. These were all Bruynogheviruses. They also isolated a phage that uses the O antigen as a receptor. The family of this latter phage and how it is known to use this as a receptor is not described. 

      Phage Ocp-2 is a Pbunavirus. We added new supplementary figure S3, addressing the O-antigen receptor.

      The authors focused on Clew-1, but the receptor for these other Clew phages is not presented. For Clew-1 the phage could plaque on the fliF deletion mutant but not the wild-type strain. The reason for this never appears to be addressed. The authors leap to consider the involvement of c-di-GMP, but how this relates to fliF appears to be lacking. 

      We have included a supplementary figure demonstrating that all the Clew phage require Psl for infection (Fig. S1A). As noted above, we have uploaded the genomic data that underpins the comparison in our supplementary figure. The phage are all closely related. It therefore made sense to focus on one of the phage for the analysis.  

      It is particularly unclear why this phage doesn't plaque on PAO1 as this strain does make Psl. Related to this, it actually looks like something is happening to PAO1 in Figure S4 (although what units are on the x-axis is not entirely clear).

      We hypothesize that the fraction of susceptible cells in the population dictates whether the phage can make overt plaques. The supplementary figure S4 indicates that a subpopulation of the wild-type culture is susceptible and this is borne out by the fraction of wild type cells that the phage can bind to (~50%). The fliF mutation increases this frequency of susceptible cells to 80-90% (Fig. 3).

      The Tnseq screen to identify receptors is clever and identifies additional phosphodiesterase genes, the deletion of which makes PAO1 susceptible. And the screen to find resistant fliF mutants identified genes involved in Psl. However, the link between the phosphodiesterase mutants and the amount of Psl produced never appears to be established. And the statement that Psl is required for infection (line 130) is never actually tested.

      The link between c-di-GMP and Psl production is well-established in the literature. I think the requirement for Psl in infection is demonstrated in multiple ways, including lack of plaque formation on psl mutant strains, lack of phage binding to strains that do not produce Psl, and direct binding of the phage to affinity-purified Psl.

      Figure 2C describes using a ∆fliF2 strain but how this is different (or if it is different) from ∆fliF described in the text is never explained.

      The difference in the deletions is explained in table S1, in the description for the deletion constructs used in their construction, pEXG2-∆fliF and pEXG2-∆fliF2 (∆fliF2 is smaller than ∆fliF and can be complemented completely with our complementing plasmid, pP37-fliF, which is the reason why we used the ∆fliF2 mutation going forward, rather than the ∆fliF mutation on which the phage was originally isolated).

      Similarly, there is a sentence (line 138) that "Attachment of Clew-1 is Psl-dependent" but this would appear to have no context.

      The relevant figure, Fig. 3, is cited in the next sentence and is the subject of the remaining paragraphs in this section of the manuscript.

      For Figure 3B, why wasn't the single ∆pslC mutant visualized in this analysis? Similar questions relate to the data in Figure 4.

      Analyzing the effect of the pslC deletion in the context of the ∆fliF2 mutant background, which is more permissive for phage infection, is the more stringent test.  

      The efficacy of Clew-1 in the mouse keratitis model is intriguing but it is unclear why the CFU/eye are so variable. The description of how the experiment was actually carried out is not clear. Was only one eye scratched or both? Were controls included with a scratch and no bacteria (± phage)?

      One eye was infected. We did not conduct a no-bacteria control (just scratching the cornea is not sufficient to cause disease). The revised manuscript has an updated animal experiment in which we carried the infection forward to 72h with two phage treatments. Following this regimen, there is a significant decrease in CFU, as well as corneal opacity (disease). Variability of the data is a fairly common feature in animal experiments. Several factors could explain this variability, such as whether the mouse blinks and removes some of the inoculum shortly after deposition of the bacteria, or some of the phage after each treatment.

    1. eLife Assessment

      This useful study analyzed 335 Mycobacterium tuberculosis Complex genomes and found that MTBC has a closed pangenome with few accessory genes. The research provides solid evidence for gene presence-absence patterns which support the appending conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 339 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a pangenome graph based on whole genomes in order to investigate structural variants in non-coding regions. The comparison of the two approaches is informative and shows that much is missed when focussing only on genes. The two main biological results of the study are that 1) the MTBC has a small pangenome with few accessory genes, and that 2) pangenome evolution is driven by genome reduction. In the revised article, the description of the data set and the methods is much improved, and the comparison of the two pangenome approaches is more consistent. I still think, however, that the discussion of genome reduction suffers from a basic flaw, namely the failure to distinguish clearly between orthologs and homologs/paralogs.

      Strengths:

      The authors put together the so-far largest data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, and covering a large geographic area. They sequenced and assembled genomes for strains of M. pinnipedii, L9, and La2, for which no high-quality assemblies were available previously. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes.

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a check of sanity, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l.184 ): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.

    3. Reviewer #2 (Public review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the MTBC pangenome size was reported to vary in size by previous reports. This study provides strong evidence that the MTBC pangenome is closed and that genome reduction is the main driver of this species evolution.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions which was a focus of previous papers and analyses which investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole genomes sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. Lastly, ample statistical support is provided, in the form of Heaps' law and genome fluidity calculations for each pangenome, to demonstrate that they are indeed closed.

      Weaknesses:

      There are no major weaknesses in the revised version of this manuscript.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a check of sanity, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l.184 ): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.

      Within the analysis we undertook we did look at paralogous blocks in pangraph, based on copy number per genome. However, this could have been clearer in the text and we will rectify this. We also focussed on duplicated/deleted blocks that were present in two or more sub-lineages. This is noted in the figure 4 legend but we will make this clearer in other sections of the manuscript.

      We agree that indeed the way paralogs are handled could still be optimised, and that gene duplicates of some genes could have biological importance. The reviewer is suggesting that a synteny analysis between genomes would be best for finding specific regions that are duplicated/deleted within a genome, and if those sections are duplicated/deleted in the same regions of the genome. Since Pangraph does not give such information readily, a larger amount of analysis would be required to confirm such genome position-specific duplications. While this is indeed important, we deem this to be out of scope for the current publication, but will note this as a limitation in the discussion. However, this does not fundamentally change the main conclusions of our analysis.
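
      The distinction at issue here, presence of a homologous sequence anywhere in the genome versus presence of the same copy at the same position, can be illustrated with a minimal sketch. It assumes each copy of an element (for example IS6110) is described by the pair of genomic blocks flanking it; this representation and the function names are hypothetical and are not taken from the Pangraph output or the authors' pipeline.

```python
# Hypothetical representation: for each genome, the set of loci at which
# an element occurs, where a locus is the (left, right) pair of flanking blocks.

def homolog_presence(copies_by_genome):
    """Element counted as present if any copy exists anywhere in the genome."""
    return {genome: bool(loci) for genome, loci in copies_by_genome.items()}


def ortholog_presence(copies_by_genome, locus):
    """Element counted as present only if a copy sits at the given locus."""
    return {genome: locus in loci for genome, loci in copies_by_genome.items()}


# Toy example: genomes A and B both carry the element, but at different loci.
copies = {
    "A": {("block1", "block2")},
    "B": {("block5", "block6")},
    "C": set(),
}
by_homology = homolog_presence(copies)
by_orthology = ortholog_presence(copies, ("block1", "block2"))
```

      In this toy case a homology-level comparison reports the element in both A and B, while the position-aware comparison reports the copy at that locus only in A, which is the kind of sublineage-specific insertion the reviewer expects to see.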


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a more general pangenome graph approach to investigate structural variants also in non-coding regions. The two main results of the study are that (1) the MTBC has a small pangenome with few accessory genes, and that (2) pangenome evolution is driven by deletions in sublineage-specific regions of difference. Combining the gene-based approach with a pangenome graph is innovative, and the former analysis is largely sound apart from a lack of information about the data set used. The graph part, however, requires more work and currently fails to support the second main result. Problems include the omission of important information and the confusing analysis of structural variants in terms of "regions of difference", which unnecessarily introduces reference bias. Overall, I very much like the direction taken in this article, but think that it needs more work: on the one hand by simply telling the reader what exactly was done, on the other by taking advantage of the information contained in the pangenome graph.

      Strengths:

      The authors put together a large data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, covering a large geographic area. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes in pangenome analysis.

      Weaknesses:

      The study does not quite live up to the expectations raised in the introduction. Firstly, while the importance of using a curated data set is emphasized, little information is given about the data set apart from the geographic origin of the samples (Figure 1). A BUSCO analysis is conducted to filter for assembly quality, but no results are reported. It is also not clear whether the authors assembled genomes themselves in the cases where, according to Supplementary Table 1, only the reads were published but not the assemblies. In the end, we simply have to trust that single-contig assemblies based on long-reads are reliable.

      We have now added a robust overview of the dataset to supplementary file 1. This is split into 3 sections: public genomes, which were assembled by others; sequenced genomes, which were created and assembled by us; the BUSCO information for all the genomes together. We did not assemble any public data ourselves but retrieved these from elsewhere. We have modified the text to be more specific on this (Line 114 onwards) and the supplementary file is updated to better outline the data.

      One issue with long read assemblies could be that high rates of sequencing errors result in artificial indels when coverage is low, which in turn could affect gene annotation and pangenome inference (e.g. Watson & Warr 2019, https://doi.org/10.1038/s41587-018-0004-z). Some of the older long-read data used by the authors could well be problematic (PacBio RSII), but also their own Nanopore assemblies, six of which have a mean coverage below 50 (Wick et al. 2023 recommend 200x for ONT, https://doi.org/10.1371/journal.pcbi.1010905). Could the results be affected by such assembly errors? Are there lineages, for example, for which there is an increased proportion of RSII data? Given the large heterogeneity in data quality on the NCBI, I think more information about the reads and the assemblies should be provided.

      We have now included an analysis where we looked to see if the sequencing platform influenced the resulting accessory genome size and the pseudogene count. The details of this are included in lines 207-219, and the results are outlined in lines 251-258. Essentially, we found no correlation between sequencing platform and genome characteristics, although less stringent cut-offs did suggest that PacBio SMRT-only assembled genomes may have larger accessory genomes. We do not believe this is enough to influence our larger inferences from this data. It should be noted that complete genomes, in general, give a better indication of pangenome size compared to draft genomes, as has been shown previously (e.g. Marin et al., 2024). Even with some small potential bias, this makes our analysis more robust than any previously published.

      In relation to the sequencing depth of our own data, all genomes had coverage above 30x, which Sanderson et al. (2024) have shown to be sufficient for highly accurate sequence recovery. We fixed an issue with the L9 isolate from the previous submission, which resulted in a better BUSCO score and overall quality of that isolate and the overall dataset.

      The part of the paper I struggled most with is the pangenome graph analysis and the interpretation of structural variants in terms of "regions of difference". To start with, the method section states that "multiple whole genomes were aligned into a graph using PanGraph" (l.159/160), without stating which genomes were for what reason. From Figure 5 I understand that you included all genomes, and that Figure 6 summarizes the information at the sublineage level. This should be stated clearly, at present the reader has to figure out what was done. It was also not clear to me why the authors focus on the sublineage level: a minority of accessory genes (107 of 506) are "specific to certain lineages or sublineages" (l. 240), so why conclude that the pangenome is "driven by sublineage-specific regions of difference", as the title states? What does "driven by" mean? Instead of cutting the phylogeny arbitrarily at the sublineage level, polymorphisms could be described more generally by their frequencies.

      We apologise for the ambiguity in the methodology. All the isolates were inputted to Pangraph to create the pangenome using this method. This is now made clearer in lines 175-177. Standard pangenome statistics (size, genome fluidity, etc.) derived from this Pangraph output are now present in the results section as well (lines 301-320).

      We then only looked at regions of difference at the sub-lineage level, meaning we grouped genomes by sub-lineage within the resulting graph and looked for blocks common between isolates of the same sub-lineage but absent from one or more other sub-lineages. We did this from both the Panaroo output and the Pangraph output and then retained only blocks found by both. The results of this are now outlined in lines 351-383.
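
      The sub-lineage-level filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: block presence is given as a mapping from block ID to the set of genomes carrying it, a block is kept when it is present in every genome of at least one sub-lineage and absent from every genome of at least one other, and all names are hypothetical.

```python
from collections import defaultdict


def sublineage_specific_blocks(block_presence, sublineage_of):
    """Return blocks fixed in >=1 sub-lineage and fully absent from >=1 other.

    block_presence: dict mapping block ID -> set of genome IDs carrying it.
    sublineage_of:  dict mapping genome ID -> sub-lineage label.
    """
    # Group genomes by their sub-lineage label.
    genomes_by_sub = defaultdict(set)
    for genome, sub in sublineage_of.items():
        genomes_by_sub[sub].add(genome)

    result = {}
    for block, carriers in block_presence.items():
        # Sub-lineages in which every genome carries the block.
        fixed_in = sorted(s for s, g in genomes_by_sub.items() if g <= carriers)
        # Sub-lineages in which no genome carries the block.
        absent_from = sorted(s for s, g in genomes_by_sub.items() if not (g & carriers))
        if fixed_in and absent_from:
            result[block] = (fixed_in, absent_from)
    return result


def found_by_both(panaroo_hits, pangraph_hits):
    """Keep only regions reported by both approaches (set intersection)."""
    return set(panaroo_hits) & set(pangraph_hits)
```

      Requiring a block to be shared by all genomes of a sub-lineage is what filters out changes seen in a single genome, in line with the stated aim of not letting individual mis-annotations influence the result.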

      We focussed on these sub-lineage-specific regions to focus on long-term evolution patterns and not be influenced by single-genome short-term changes. We do not have enough genomes of closely related isolates to truly look at very recent evolution, although the small accessory genome indicates this is not substantial in terms of gene presence/absence. We also did not want potential mis-annotations in a single genome to heavily influence our findings due to the potential issues pointed out by the reviewer above. We state this more clearly in the introduction (lines 106-108), methods (lines 184-186) and results (lines 345-347), and we indicate the limitations in the Discussion, lines 452-457 and 471-473. We also changed the title to ‘shaped’ instead of ‘driven by’.

      I fully agree that pangenome graphs are the way to go and that the non-coding part of the genome deserves as much attention as the coding part, as stated in the introduction. Here, however, the analysis of the pangenome graph consists of extracting variants from the graph and blasting them against the reference genome H37Rv in order to identify genes and "regions of difference" (RDs) that are variable. It is not clear what the authors do with structural variants that yield no blast hit against H37Rv. Are they ignored? Are they included as new "regions of difference"? How many of them are there? etc. The key advantage of pangenome graphs is that they allow a reference-free, full representation of genetic variation in a sample. Here reference bias is reintroduced in the first analysis step.

      We apologise for the confusion here as indeed the RDs terminology is very MTBC-specific. Current RDs are always relevant to H37Rv, as that is how original discovery of these regions was done and that is how RDScan works. We clarify this in the introduction (lines 67-68). If we found a large sequence polymorphism (e.g. by Pangraph) and searched for known RDs using RDScan, we then assigned a current RD name to this LSP. This uses H37Rv as a reference. If we did not find a known RD, we then classified the LSP as a new RD if it is present in H37Rv, or left the designation as an LSP if not in H37Rv, thus expanding the analysis beyond the H37Rv-centric approaches used by others previously. This is hopefully now made clearer in the methods, lines 187-194.
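
      The naming decision described above can be written out as a short sketch. It assumes the result of the RDScan lookup and the H37Rv presence check are already available as inputs; the function, its parameters and the label formats are hypothetical and only illustrate the order of the decisions.

```python
def name_variant(known_rd_name, present_in_h37rv, lineage_label):
    """Assign a category and name to a large sequence polymorphism (LSP).

    known_rd_name:    name of a matching known RD, or None if no match was found.
    present_in_h37rv: True if the sequence is present in the H37Rv reference.
    lineage_label:    lineage-related label used for regions without an RD name.
    """
    if known_rd_name is not None:
        # A known region of difference: keep its existing name.
        return ("known RD", known_rd_name)
    if present_in_h37rv:
        # Not previously described, but defined relative to H37Rv.
        return ("new RD", f"RD_{lineage_label}")  # label format is hypothetical
    # Absent from H37Rv: keep the reference-independent designation.
    return ("LSP", f"LSP_{lineage_label}")  # label format is hypothetical
```

      The last branch is the one that extends the analysis beyond H37Rv: a variant with no counterpart in the reference is still retained, just not called an RD.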

      Along similar lines, I find the interpretation of structural variants in terms of "regions of difference" confusing, and probably many people outside the TB field will do so. For one thing, it is not clear where these RDs and their names come from. Did the authors use an annotation of RDs in the reference genome H37Rv from previously published work (e.g. Bespiatykh et al. 2021)? This is important basic information, its lack makes it difficult to judge the validity of the results. The Bespiatykh et al. study uses a large short-read data set (721 strains) to characterize diversity in RDs and specifically focuses on the sublineage-specific variants. While the authors cite the paper, it would be relevant to compare the results of the two studies in more detail.

      We have amended the introduction to explain this terminology better (lines 67-68). Naming of the RDs here came from using RDScan to assign current names to any accessory regions we found and if such a region was not a known RD, we gave it a lineage-related name, allowing for proper RD naming later (lines 187-194). Because the Bespiatykh paper is the basis for RDScan, our work implicitly compares to this throughout, as any RDs we find which were not picked up by RDScan are thus novel compared to that paper.

      As far as I understand, "regions of difference" have been used in the tuberculosis field to describe structural variants relative to the reference genome H37Rv. Colloquially, regions present in H37Rv but absent in another strain have been called "deletions". Whether these polymorphisms have indeed originated through deletion or through insertion in H37Rv or its ancestors requires a comparison with additional strains. While the pangenome graph does contain this information, the authors do not attempt to categorize structural variants into insertions and deletions but simply seem to assume that "regions of difference" are deletions. This, as well as the neglect of paralogs in the "classical" pangenome analysis, puts a question mark behind their conclusion that deletion drives pangenome evolution in the MTBC.

      We have now amended the analysis to specifically designate a structural variant as a deletion if present in the majority of strains and absent in a minority, or an insertion/duplication if present in a minority and absent in a majority (lines 191-192). We also ran Panaroo without merging paralogs to examine duplication in this output; Pangraph implicitly includes paralogs already.
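
      The majority/minority rule can be made concrete with a small sketch. It uses the 50% threshold and the 192 sub-lineages quoted from the revised methods; the function name is hypothetical, and the handling of a region present in exactly half of the sub-lineages is an assumption, since the quoted rule does not cover that case.

```python
def classify_accessory_region(n_present, n_total=192):
    """Classify an accessory region by the share of sub-lineages carrying it."""
    if not 0 < n_present < n_total:
        # Present in all or in none: not an accessory region.
        raise ValueError("an accessory region must be present in some but not all sub-lineages")
    fraction = n_present / n_total
    if fraction > 0.5:
        # Present in the majority, so the minority is assumed to have lost it.
        return "deletion"
    if fraction < 0.5:
        # Present in a minority, so it is assumed to have been gained there.
        return "insertion/duplication"
    # Exactly 50%: not covered by the stated rule (assumption of this sketch).
    return "unclassified"
```

      Note that the rule infers direction from frequency alone, so a region is counted only by whether its sequence is present, not by where in the genome it sits.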

      From all these analyses we did not find any structural variants classed as insertions/duplications and did not find paralogs to be a major feature at the sub-lineage level (lines 377-383). While these features could be important on shorter timescales, we do not have enough closed genomes to confidently state this (limitation outlined in lines 452-457). Therefore, our assertion that deletions are a primary force shaping the long-term evolution in this group still holds.

      Reviewer #2 (Public Review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the MTBC pangenome size was reported to vary in size by previous reports.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions which was a focus of previous papers and analyses which investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole genomes sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated the limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed.

      Weaknesses:

      The only major weakness was the limited number of isolates from certain lineages and the over-representation of others, which was also acknowledged by the authors. However, since the case is made that the MTBC has a closed pangenome, the inclusion of additional genomes would not result in the identification of any new genes. This is a strong statement without an illustration/statistical analysis to support this.

      We have included a Heaps' law and genome fluidity calculation for each pangenome estimation to demonstrate that the pangenome is closed. This is detailed in lines 225-228 with results shown in lines 274-278 and 316-320 and Supplementary Figure 2. We agree that more closely related genomes would benefit a future version of this analysis and we indicate the limitations in the Discussion, lines 452-457 and 471-473.
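
      For readers unfamiliar with these two statistics, a minimal sketch follows. It is not the authors' implementation, and the tool they used is not stated in this response. It assumes genome fluidity is the mean, over all genome pairs, of the genes unique to either genome divided by the total genes in both, and that Heaps' law is fitted as n = κN^(-α), where n is the number of new genes found when the N-th genome is added and α > 1 is conventionally read as a closed pangenome.

```python
from itertools import combinations
import math


def genome_fluidity(gene_sets):
    """Mean over genome pairs of (genes unique to either) / (genes in both summed)."""
    pairs = list(combinations(gene_sets, 2))
    if not pairs:
        raise ValueError("at least two genomes are required")
    # len(a ^ b) counts genes found in exactly one genome of the pair.
    ratios = [len(a ^ b) / (len(a) + len(b)) for a, b in pairs]
    return sum(ratios) / len(ratios)


def heaps_alpha(n_genomes, new_genes):
    """Estimate alpha in n = kappa * N**(-alpha) by log-log least squares.

    n_genomes: genome counts N (all > 0).
    new_genes: number of new genes observed at each N (all > 0).
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(g) for g in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The fitted slope is -alpha.
    return -slope
```

      A fluidity near zero means that two randomly chosen genomes share almost all of their genes, which is the pattern expected for a small accessory genome.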

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract

      l. 24, "with distinct genomic features". I'm not sure what you are referring to here.

      We refer to the differences in accessory genome and related functional profiles but did not want to bloat the abstract with such additional details.

      Introduction

      l. 40, "L1 to L9". A lineage 10 has been described recently: https://doi.org/10.3201/eid3003.231466.

      We have updated the text and the reference. Unfortunately, no closed genome for this lineage exists so we have not included it in the analyses. We note this in the results, line 232.

      l.62/3, "caused by the absence of horizontal gene transfer, plasmids, and recombination". Recombination is not absent in the MTBC, only horizontal gene transfer seems to be, which is what the cited studies show. Indeed a few sentences later homologous recombination is mentioned as a cause of deletions.

      This has now been removed from the introduction.

      l. 67, "within lineage diversity is thought to be mostly driven by SNPs". Again I'm not sure what is meant here with "driven by". Point mutations are probably the most common mutational events, but duplications, insertions, deletions, and gene conversion also occur and can affect large regions and possibly important genes, as shown in a recent preprint (https://doi.org/10.1101/2024.03.08.584093).

      We have changed the text to say ‘mostly composed of’. While indeed other SNVs may be contributing, the prevailing thought at lineage level is that SNPs are the primary source of diversity. The linked pre-print is looking at within transmission clusters and this has not been described at the lineage level, which could be done in a future work.

      l. 100/1. "that can account for variations in virulence, metabolism, and antibiotic resistance". I would phrase this conservatively since the functional inferences in this study are speculative.

      This has now been tempered to be less specific.

      Methods

      l. 108. That an assembly has a single contig does not mean that it is "closed". Many single contig assemblies on NCBI are reference-guided short-read assemblies, that is, fragments patched together rather than closed assemblies. The same could be true for long-read assemblies.

      We specifically chose those listed as closed on NCBI so rely on their checks to ensure this is true. We have stated this better in the paper, line 117.

      l. 111. From Supplementary Table 1 understand that for many genomes only the reads were available (no ASM number). Did you assemble these genomes? If yes, how? The assembly method is not indicated in the supplement, contrary to what is written here.

      All public genomes were downloaded in their assembled forms from the various sources. This is specified better in the text (line 118) and the supplementary table 1 now lists the accessions for all the assemblies.

      l. 113. How many assemblies passed this threshold? And is BUSCO actually useful to assess assembly quality in the MTBC? I assume the dynamic, repetitive gene families that cause problems for assembly and mapping in TB (PE, PPE, ESX) do not figure in the BUSCO list of single-copy orthologs.

      All assemblies passed the BUSCO thresholds for high-quality genomes as laid out in Supplementary Table 1. While indeed this does not include multi-copy genes such as PE/PPE we focussed on regions of difference at the sub-lineage level where two or more genomes represent that sub-lineage. This means any assembly issues in a single genome would need to be exactly the same in another of the same sub-lineage to be included in our results. Through this, we aimed to buffer out issues in individual assemblies.

      l. 147: Why is Panaroo used with -merge-paralogs? I understand that near-identical genes may not be too interesting from a functional perspective, but if the aim of the analysis is to make broad claims about processes driving genome evolution, paralogs should be considered.

      We chose to do so with merged paralogs to look for larger patterns of diversity beyond within-genome paralogs. Additionally, this was required to build the core phylogenetic tree. However, as the reviewer points out, this may bias our findings towards deletions and away from duplications as a primary evolutionary force.

      We repeated this without the merged paralogs option and indeed found a larger pangenome, as outlined in Table 1. However, at the sub-lineage level, this did not result in any new presence/absence patterns (lines 381-383). This means the paralogs tended to be in single genomes only. This still indicates that deletions are the primary force in the longer-term evolution of the complex but indeed on shorter spans this may be different.

      l. 153: remove the comment in brackets.

      This has been fixed and the proper URL placed in instead.

      l. 159: which genomes, and why those?

      This is now clarified to state all genomes were used for this analysis.

      l. 161, "gene blocks": since this analysis is introduced as capturing the non-coding part of the genome, maybe just call them "blocks"?

      All references to gene blocks are now changed to genomic blocks to be more specific.

      l. 162: what happens with blocks that yield no hits against RvD1, TbD1, and H37Rv?

      We named these with lineage-specific names (supplementary table 4) but did not assign RD names specifically.

      l. 164: where does the information about the regions of difference come from? How exactly were these regions determined?

      We have expanded this section to be more specific on the use of RDScan and the new naming, along with how we determine whether something is an RD/LSP.

      Results

      l. 185ff: This paragraph gives many details about the geographic origin of the samples, but what I'd expect here is a short description of assembly qualities, for example, the results of the BUSCO analysis, a description of your own Nanopore assemblies, or a small analysis of the number of indels/pseudogenes relative to sequencing technology or coverage (see comment in the public review).

      This section (lines 231-258) has been expanded considerably to give a better overview of the dataset and any potential biases. Supplementary table 1 has also been expanded to include more information on each strain.

      l. 187, "324 genomes published previously": 322 according to the methods section.

      The number has been fixed throughout to the proper total of public genomes (329).

      l. 201: define the soft core, shell, and cloud genes.

      This is now defined on line 262.

      l. 228, "defined primarily by RD105 and RD207 deletions": this claim seems to come from the analysis of variable importance (Factoextra), which should be made clear here.

      This has been clarified on line 333.

      l. 237, "L8, serving as the ancestor of the MTBC": this is incorrect, equivalent to saying that the Chimpanzee is the ancestor of Homo sapiens.

      We have changed this to basal to align with how it is described in the original paper.

      l. 239, "The accessory genome of the MTBC". It is a bit confusing that the same term, 'accessory genome', is used here for the graph-based analysis, which is presented as a way to look at the non-coding part of the genome.

      We have clarified the terminology on line 347 and improved consistency throughout.

      l. 240/1, "specific to certain lineages and sublineages". What exactly do you mean by "specific" to? Present only in members of a certain lineage/sublineage? In all members of a certain lineage/sublineage? Maybe an additional panel in Figure 5, showing examples of lineage- and sublineage-specific variants, would help the reader grasp this key concept.

      We have clarified this on line 349 and the legend of what is now figure 4.

      l. 241/2, "82 lineage and sublineage-specific genomic regions ranging from 270 bp to 9.8 kb". Were "gene blocks" filtered for a minimum size, or why are there no variants smaller than 270 bp? A short description of all the blocks identified in the graph could be informative (their sizes, frequencies ...).

      Yes, a minimum of 250bp was set for the blocks to only look at larger polymorphisms. This is clarified on line 177 and 304.
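
      As a small illustration of the size threshold mentioned above, the sketch below keeps only blocks of at least 250 bp. The block names and lengths are invented for the example, and treating the threshold as inclusive is an assumption; the response does not state how the filter was implemented.

```python
MIN_BLOCK_LENGTH_BP = 250  # minimum block size stated in the response


def filter_blocks(block_lengths, min_length=MIN_BLOCK_LENGTH_BP):
    """Keep blocks whose length in bp is at least min_length (inclusive, by assumption)."""
    return {name: length for name, length in block_lengths.items() if length >= min_length}


# Hypothetical block names and lengths in bp.
blocks = {"block_a": 120, "block_b": 270, "block_c": 9800, "block_d": 249}
kept = filter_blocks(blocks)
print(kept)
```

      With such a filter in place, no reported region can be shorter than the threshold, which is consistent with the smallest reported region (270 bp) lying just above it.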

      A second point: It is not entirely clear to me what Figure 6 is showing. Are you showing here a single representative strain per sublineage? Or have you somehow summarized the regions of difference shown in Figure 5 at the sublineage level? What is the tree on the left? This should be made clear in the legend and maybe also in the methods/results.

      In figure 4 (which was figure 6), because each RD is common to all members of the same sub-lineage, we have placed a single branch for each sub-lineage. This has been clarified in the legend.

      l. 254, "this gene was classified as being in the core genome": why should a partially deleted gene not be in the core genome?

      You are correct, we have removed that statement.

      l. 258/259, "The Pangraph alignment approach identified partial gene deletion and non-coding regions of the DNA that were impacted by genomic deletion". I do not understand how you classify a structural variant identified in the pangenome graph as a deletion or an insertion.

      This has been clarified as relative to H37Rv, as this is standard practice for RDs and general evolutionary analyses in MTBC, as outlined above.

      l. 262/263 , "the accessory genome of the MTBC is small and is acquired vertically from a common ancestor within the lineage". If deletion is the main process involved here, "acquired" seems a bit strange.

      We agree and have changed the header to better reflect the discussion on mis-annotation issues.

      Figure 1: Good to know, but not directly relevant for the rest of the paper. Maybe move it to the supplement?

      This has been moved to Supplementary figure 1.

      Figure 2: the y-axis is labeled 'Variable genome size', but from the text and the legend I figure it should be 'Number of accessory genes'?

      This has been changed to ‘accessory genes’ in Figure 1 (which was figure 2 in previous version).

      Figure 4: too small.

      We will endeavour to ensure this is as large as possible in the final version.

      Discussion

      l. 271, "MTBC accessory genome is ... acquired vertically". See above.

      Changed, as outlined above.

      l. 292, "appeared to be fragmented genes caused by misassemblies". Is there a way to distinguish "true" pseudogenes from misassemblies? This could be a relevant issue for low-coverage long-read assemblies (see public review).

      Not that we are currently aware of, but we do know other groups which are working on this issue.

      l. 300/1, "the whole-genome approach could capture higher genetic variations". Do you mean the graph approach? I'm not sure that comparing the two approaches here makes sense, as they serve different purposes. A pangenome graph is a summary of all genetic variation, while the purpose of Panaroo is to study gene absence/presence. So by definition, the graph should capture more genetic variation.

      This statement was specifically to state that much genetic variation in MTBC is outside the coding genes and so traditional ‘pangenome’ analyses are actually not looking at the full genomic variation.

      l. 302/3, "this method identified non-coding regions of the genome that were affected by genomic deletions". See the comments above regarding deletions versus insertions. I'd say this method identifies coding and non-coding regions that were affected by genomic deletions and insertions.

      We have undertaken additional analyses to be sure these are likely deletions, as outlined above.

      l. 305: what are "lineage-independent deletions"?

      We labelled these as convergent evolution, now clarified on line 443.

      l. 329: How is RD105 "caused" by the insertion of IS6110? I did not find RD105 mentioned in the Alonso et al. paper. Similarly below, l. 331, how is RD207 "linked" to IS6110?

      The RD105 connection was misattributed as IS6110 insertion is related to RD152, not RD105. This has now been removed.

      RD207 is linked to IS6110 as its deletion is due to recombination between two such elements. This is now clarified on line 486.

      l. 345, "the growth advantage gene group": not quite sure what this is.

      We have fixed this on line 499 to state they are genes which confer growth advantages.

      l. 373ff: The role of genetic drift in the evolution of the MTBC is an open question, other studies have come to different conclusions than Hershberg et al. (this has been recently reviewed: https://doi.org/10.24072/pcjournal.322).

      We have outlined this debate better in lines 527-531.

      l. 375/6, "Gene loss, driven by genetic drift, is likely to be a key contributor to the observed genetic diversity within the MTBC." This sentence would need some elaboration to be intelligible. How does genetic drift drive gene loss?

      We have removed this.

      l. 395/6, "... predominantly driven by genome reduction. This observation underlines the importance of genomic deletions in the evolution of the MTBC." See comments above regarding deletions. I'm not convinced that your study really shows this, as it completely ignores paralogs and the processes counteracting reductive genome evolution: duplication and gene amplification.

      As outlined above, we have undertaken additional analyses to more strongly support this statement.

      l. 399, "the accessory genome of MTBC is a product of gene deletions, which can be classified into lineage-specific and independent deletions". Again, I'm not sure what is meant by lineage-independent deletions.

      We have better defined this in the text, line 443, to be related to convergent evolution.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      In lines 120-121, it is mentioned that TB-profiler v4.4.2 was used for lineage classification, but this version was released in February 2023. As I understand there have been some changes (inclusion/exclusion) of certain lineage markers. Would it not be appropriate to repeat lineage classification with a more recent version? This would of course require extensive re-analysis, so could the lineage marker database perhaps also be cited.

      We have rerun all the genomes through TB-Profiler v6.5 and updated the text to state this; the exact database used is also now stated.

      Could the authors perhaps include the sequencing summary or quality of the nanopore sequences? The L9 (Mtb8) sample had a relatively lower depth and resulted in two contigs. Yet one contig was the initial inclusion criteria. It is unclear whether these samples were excluded from some of the analyses. Mtb6 also has relatively low coverage. Was the sequencing quality adequate to accurately identify all the lineage markers, in particular those with a lower depth of coverage? Could a hybrid approach be an inexpensive way to polish these assemblies?

      We reanalysed the L9 sample and, with some better cleaning, got it to a single contig with better depth and overall score. This is outlined in the Supplementary table 1 sheets. While depth is average, it is still above the recommended 30x, which is needed for good sequence recovery (Sanderson et al., 2024). We did indeed recover all lineage markers from these assemblies.
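
      The depth criterion mentioned above can be checked with a simple calculation: mean depth is the total number of sequenced bases divided by the genome length. The sketch below is illustrative only. The read totals are hypothetical, the helper names are invented, and the H37Rv reference length is used as a stand-in for the assembly length.

```python
H37RV_GENOME_LENGTH_BP = 4_411_532  # length of the H37Rv reference chromosome
RECOMMENDED_MIN_DEPTH = 30.0        # threshold cited in the response


def mean_depth(total_sequenced_bases, genome_length_bp=H37RV_GENOME_LENGTH_BP):
    """Mean depth of coverage = sequenced bases / genome length."""
    return total_sequenced_bases / genome_length_bp


def passes_depth_threshold(total_sequenced_bases, min_depth=RECOMMENDED_MIN_DEPTH):
    """True if the mean depth reaches the recommended minimum."""
    return mean_depth(total_sequenced_bases) >= min_depth


# Hypothetical yields for two samples, in bases.
high_yield_bases = 150_000_000
low_yield_bases = 100_000_000
print(round(mean_depth(high_yield_bases), 1), passes_depth_threshold(high_yield_bases))
print(round(mean_depth(low_yield_bases), 1), passes_depth_threshold(low_yield_bases))
```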

      Recommendations for improving the writing and presentation.

      The introduction is well-written and recent MTBC pangenomic studies have been incorporated, but I am curious as to why this paper was not referred to: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922483/ I believe this was the first attempt to study the pangenome, albeit with a different research question. Nearly all previous analyses largely focused on utilizing the pangenome to investigate transmission.

      Indeed this study did look at a pangenome of sorts, but specifically SNPs and not genes or regions. Since the latter is the main basis for pangenome work these days, we chose not to include this paper.

      Minor corrections to the text and figures.

      In line 129, it is explained that DNA was extracted to be suitable for PacBio sequencing, but ONT sequencing was used for the 11 new sequences. Is this a minor oversight or do the authors feel that DNA extracted for PacBio would be suitable for ONT sequencing? It is a fair assumption.

      We apologise, this is a long-read extraction approach and not specific to PacBio. We have amended the text to state this.

      In line 153, this should be removed: (Conor, could you please add the script to your GitHub page?).

      This has been fixed now.

    1. eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including the robustness of the CF marking and manipulation approach and the unclear efficacy of longer-duration climbing fiber activity suppression.

    2. Reviewer #2 (Public review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      Comments on revisions:

      In this revision, the authors provide new data regarding the effect of eNpHR on CF-evoked complex spiking in vivo but fail to address the overall concern of demonstrating the functional effect that would explain their causal results. Additionally, the paper has a narrow "CF-or-nothing" framing that leaves unanswered the central question of which signal instructs consolidation if CFs do not. Substantial new experiments and tighter logic are required before the work can serve as a definitive test of CF involvement in different memory processes.

    3. Reviewer #3 (Public review):

      Summary:

      The authors attempted to study connections from the inferior olive to the cerebellar cortex and to analyze their impact on the optokinetic reflex, using optogenetics to perturb the pathway. This is a commendable effort, as these methods are very challenging due to the location of the inferior olive and the recording methods required.

      Strengths:

      The authors have shown that climbing fiber activity was altered due to the optogenetic perturbation. They have added an additional figure to show that complex spikes disappear with inhibitory optogenetics and the impacts on behavior are interesting.

      Weaknesses:

      The images provided to show injection region are difficult to see and specific cell types are not co-labeled. The data and strength of the results would benefit from high-resolution images demonstrating selectivity and expression, in particular for Figure 2A and 3A. In addition, while the processed recording data looks very striking, including the raw data, as done in Figure 2, would again support the conclusions.

      One major concern is that the viruses chosen are non-specific to the cell targets and a cre-based approach is lacking to draw conclusions on only the targeted pathway of interest. It is unclear based on the figures provided if the AAVs labeled only the pathway of interest. It would be interesting to know if typical memory acquisition returns in the same animals if inhibition stops and if animal movement was impacted by the perturbation.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term activity manipulation of CF activity is effective, thus undermining the confidence of the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding regarding prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulations (especially when using the yellow laser, as is done in this work; see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript of phase-specific involvement of CF activity in OKR learning cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We thank the editors and reviewers for their constructive feedback and careful consideration of our manuscript. We appreciate their acknowledgment of the potential of our study to yield valuable insights into the role of CF activity in cerebellar learning and its phase-specific involvement, and we have addressed the methodological concerns raised by providing additional clarifications and explanations in this letter.

      In response to concerns regarding the efficacy of long-term optogenetic inhibition, we conducted additional in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase (Figure 2, lines 112-139). Although stable single-unit recording beyond 40 minutes was not feasible due to technical challenges, the robust suppression of CF-evoked complex spikes we observed during this period (Figure 2, lines 112–139) provides strong evidence that halorhodopsin-mediated inhibition persists over the longer irradiation intervals employed in our behavioral assays.

      Moreover, given that there is a concern regarding the CaMKII promoter also inducing expression in neighboring mossy fibers, potentially affecting simple spike activity, we have presented data in Figure 2C, which illustrates that PC simple spike firing rates remain unchanged during prolonged illumination. This finding confirms that our optogenetic manipulation selectively disrupts CF-mediated complex spikes without influencing mossy fiber to PC transmission. We have elucidated these results further in lines 128 to 136.

      Lastly, we have broadened our Discussion to consider alternative mechanisms of CF involvement in cerebellar learning, including the modulation of molecular layer interneurons (Rowan et al., 2018) and direct CF interactions with vestibular nuclear neurons (Balaban et al., 1981), thereby offering a more comprehensive perspective on the multifaceted role of CF signaling. Specific clarifications regarding these points are given in lines 222 to 242 and 243 to 254 of the manuscript. We are confident that these revisions adequately address the reviewers' concerns and further substantiate the specificity and significance of our study findings.

      (1) Rowan, Matthew JM, et al. "Graded control of climbing-fiber-mediated plasticity and learning by inhibition in the cerebellum." Neuron 99.5 (2018): 999-1015.

      (2) Balaban, Carey D., Yasuo Kawaguchi, and Eiju Watanabe. "Evidence of a collateralized climbing fiber projection from the inferior olive to the flocculus and vestibular nuclei in rabbits." Neuroscience letters 22.1 (1981): 23-29.

    1. eLife Assessment

      The study introduces new tools for measuring the intracellular calcium concentration close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. This approach yields important new information about the spatial and temporal profile of calcium concentrations near the site of entry at the plasma membrane. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in calcium domains. Some of the conclusions are strongly supported by the data, but a few gaps in the data presented mean that the evidence for other conclusions is incomplete.

    2. Reviewer #1 (Public review):

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Strengths

      • The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      • The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results.

      Weaknesses

      • Multiple key points of the paper lack a statistical test or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well.

      • The rise time measurements in Figure 2 are very different for low and high affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different with the two indicators. That might suggest that the high affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements.
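
      To illustrate the kinetic measure discussed in the first point above, the following sketch computes the time to half-maximal fluorescence from a sampled trace using linear interpolation. It is a hypothetical illustration, not the authors' analysis: the function name, the sampling times, and the trace values are invented. It also shows why traces that are scaled versions of each other cannot differ in this measure.

```python
def time_to_half_max(times_ms, fluorescence, baseline=0.0):
    """Time at which the trace first reaches half of its peak above baseline.

    Linear interpolation is used between the two samples that bracket
    the half-maximal level.
    """
    peak = max(fluorescence)
    half = baseline + 0.5 * (peak - baseline)
    for i, value in enumerate(fluorescence):
        if value >= half:
            if i == 0:
                return times_ms[0]
            t0, t1 = times_ms[i - 1], times_ms[i]
            f0, f1 = fluorescence[i - 1], value
            # f1 > f0 here because the previous sample was below the half level.
            return t0 + (half - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("trace never reaches its half-maximal level")


# Hypothetical traces sampled every 2 ms.
times_ms = [0.0, 2.0, 4.0, 6.0, 8.0]
proximal = [0.0, 0.6, 1.0, 0.9, 0.8]
scaled_copy = [0.5 * v for v in proximal]  # same shape, half the amplitude
slower = [0.0, 0.1, 0.3, 0.5, 0.4]         # genuinely slower rise

t_proximal = time_to_half_max(times_ms, proximal)
t_scaled = time_to_half_max(times_ms, scaled_copy)
t_slower = time_to_half_max(times_ms, slower)
print(t_proximal, t_scaled, t_slower)
```

      A scaled copy of a trace yields the same half-rise time as the original, so a kinetic difference between proximal and distal signals would have to appear as a difference in this value across a population of cells rather than in amplitude alone.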

    3. Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Comments on revisions:

      Specific minor comments:

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2+ ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) to acknowledge that some are achieving nanoscopic resolution (50 to 100nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here rod Cav channels are shown to be expressed on both sides of the ribbon, at its base, and they are within nanometers of other AZ proteins. This agrees with the conclusions from your imaging work.

      (4) In the Discussion, add a little more context to what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6-OnBCs vs KARs-Off-BCs, vs. AMPAR-HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer JPhysio study, which you cite, I think of NMDAR vs AMPAR, and uncertainty in what type of postsynaptic cell was patched (GC or AC....). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices......to better understand how the presynapse shapes signaling......".

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Strengths:

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging.

      Weaknesses:

      Peptides may not be entirely specific, and a genetic approach that tags particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Although the authors acknowledge this and the peptide approach is generally used for ribbon synapses, they should keep this limitation in mind when interpreting the results.

    1. eLife Assessment

      The authors take a synthetic approach by introducing synaptic ribbon proteins into HEK cells to analyze how these assemblies cluster calcium channels at the active zone. Using a synapse-naive heterologous expression system and overexpression-based strategy is valuable, as it establishes a promising model for studying molecular interactions at the active zone. The study is built on a solid combination of super-resolution microscopy and electrophysiology, though it currently falls short of replicating the full functional properties of native ribbon synapses and instead resembles a multiprotein complex that partially mimics ribbon-type active zones.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors attempt to reconstitute some active zone properties by introducing synaptic ribbon proteins into HEK cells. This "ground-up" approach can be valuable for assessing the necessity of specific proteins in synaptic function. Here, the authors co-transfect a membrane-targeted bassoon, RBP2, calcium channel subunits and Ribeye to generate what they call "synthetic ribbons". The resultant structures show an ability to cluster calcium channels (Figure 4B) and a modest ability to concentrate calcium entry locations (Figure 7J). At the light level, the Ribeye aggregates look spherical and localize to the membrane through their interaction with the membrane-targeted bassoon, and at the EM level the structures resemble those observed when Ribeye is overexpressed alone. It is a nice proof-of-principle in establishing a useful experimental system for studying calcium channel localization and, with expression of other proteins, perhaps a means to understanding the structure and function of the ribbon. The paper does establish that previously described protein interactions can be reconstituted in a heterologous system and that the addition of Ribeye can increase the size of calcium channel patches via indirect interactions.

      Strengths:

      (1) The authors establish a new experimental system for the study of calcium channel localization to active zones.<br /> (2) The clustering of calcium channels to bassoon via RBP2 is a nice confirmation of a previously described interaction between bassoon and calcium channels in a cell-based system.<br /> (3) The "ground-up" approach is an attractive one and theoretically allows one to learn a lot about the essential interactions for building a ribbon structure.<br /> (4) The finding that introducing Ribeye can enhance the size of calcium channel patches is novel and interesting.

      Weaknesses:

      (1) The addition of EM is welcome, but the structures seem to resemble those created by overexpression of Ribeye alone, albeit at the membrane. It is unclear to me whether the interaction with Bsn or indirect interactions with other proteins has any effect on these structures. Also, while the abstract mentions that the size and shape are similar to ribbons, the EM seems to show that the size and shape are quite variable.<br /> (2) The clustering of channels is accomplished by taking advantage of previously described interactions between RBP2, Ca channels and bassoon. While it is nice to see that it can be reconstituted in a naive cell, the interactions were previously described. The localization of Ribeye to bassoon takes advantage of a previously described interaction between the two and the membrane localization of the complexes required introduction of a membrane-anchoring motif. These factors limit the novelty of the findings.<br /> (3) The difference in Ca imaging between SyRibbons and other locations is subtle. While there are reasonable explanations for why this could be the case, it may limit the utility of this system for studying Ca-channel-ribbon dynamics moving forward.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that co-expression of bassoon, RIBEYE, Cav1.3-alpha1, Cav-beta3, Cav-alpha2delta1, and RBP2 in a heterologous system (HEK293 cells) is sufficient to generate a protein complex resembling a presynaptic ribbon-type active zone both in morphology and in function (in clustering voltage-gated Ca channels and creating sites for localized Ca2+ entry). If the 3 separate Cav gene products are taken as a single protein (i.e. a Ca channel), the conclusion is that the core of a ribbon synapse comprises 4 proteins: bassoon holds the RIBEYE-containing ribbon to the plasma membrane, and RBP2 binds to bassoon and Ca channels, tethering the Ca channels to the presynaptic active zone.

      Strengths:

      (1) Good use of a heterologous system with generally appropriate controls provides convincing evidence that a presynaptic ribbon-type active zone (without the ability to support exocytosis), with the ability to support localized Ca2+ entry (a key feature of ribbon-type pre-synapses) can be assembled from a few proteins.<br /> (2) In the revised manuscript, the authors do a good job of addressing the limitations of their cultured cell-system.

      Weaknesses:

      (1) Relies on over-expression, which almost certainly diminishes the experimentally-measured parameters (e.g. pre-synapse clustering, localization of Ca2+ entry).<br /> (2) Are HEK cells the best model? HEK cells secrete substances and have a well-studied endocytic pathway, but they do not create neurosecretory vesicles. Initially, I asked why the authors did not try to reconstitute a ribbon synapse in a cell that makes neurosecretory vesicles, like a PC12 cell, and the authors addressed this question in their revision.<br /> (3) Related to 1 and 2: the Ca channel localization observed is significant but not so striking given the presence of Cav protein and measurements of Ca2+ influx distributed across the membrane. Presumably, this is the result of overexpression and an absence of pathways for pre-synaptic targeting of Ca channels. But, still, it was surprising that Ca channel localization was so diffuse. I suppose that the authors tried to reduce the effect of over-expression by using an inducible Cav1.3? Even so, the accessory subunits were constitutively over-expressed.

    4. Reviewer #3 (Public review):

      Summary:

      Ribbon synapses are complex molecular assemblies responsible for synaptic vesicle trafficking in sensory cells of the eye and the inner ear. The Ca2+-dependent exocytosis occurs at the active zone (AZ); however, the molecular mechanisms orchestrating the structure and function of the AZs of ribbon synapses are not well understood. To advance the understanding of those mechanisms, the authors present a novel and interesting experimental strategy pursuing the reconstitution of a minimal active zone of a ribbon synapse within a synapse-naïve cell line: HEK293 cells. The authors have used stably transfected HEK293 cells that express voltage-gated Ca2+ channel subunits (constitutive -CaV beta3 and CaV alpha2 delta1- and inducible CaV1.3 alpha1). They have expressed in those cells several proteins of the ribbon synapse active zone: (1) RIBEYE, (2) a modified version of Bassoon that binds to the plasma membrane through artificial palmitoylation (Palm-Bassoon) and (3) RIM-binding protein 2 (RBP2) to induce the formation of a minimal active zone that they called SyRibbons. The formation of such structures is convincing; however, the evidence of such structures having a functional impact (for example enhancing Ca2+-currents), as the authors claim, is weak. In conclusion, the novel approach shows that expression of a multiprotein complex partially reproduces properties, especially structural properties, of ribbon-type active zones in a heterologous system. Although the approach opens interesting possibilities for further experiments, the evidence supporting the functional properties of the so-called "synthetic ribbon synapses" is incomplete.

      Strengths of the study:

      (1) The study is carefully carried out using a remarkable combination of (1) superresolution, correlative light microscopy and cryo-electron tomography, to analyze the formation and subcellular distribution of molecular assemblies and (2) functional assessment of voltage-gated Ca2+ channels using patch-clamp recording of Ca2+-currents and fluorometry to correlate Ca2+ influx with the molecular assemblies formed by AZ proteins. The results are of high quality and are in general accompanied by the required control experiments.<br /> (2) The method opens new opportunities to further investigate the minimal and basic properties of AZ proteins that are difficult to study using in vivo systems. The cells that operate through ribbon synapses (e.g. photoreceptors and hair cells) are particularly difficult to manipulate, so setting up and validating the use of a heterologous system more suitable for molecular manipulations is highly valuable.<br /> (3) The structures formed by RIBEYE and Palm-Bassoon in HEK293 cells identified by STED nanoscopy and cryo-electron microscopy share relevant similarities with the AZs of ribbon synapses found in rat inner hair cells.

      Weaknesses of the study:

      (1) The functional properties of the "synthetic ribbon-type active zones" have only been assessed through their effect on the modulation of Ca2+-channel function, and that effect is rather weak. The authors provide reasonable explanations for such a weak effect, but it remains difficult to conclude that the "synthetic ribbon-type active zones" are indeed bona fide functional multiprotein complexes.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors use a synthetic approach to introduce synaptic ribbon proteins into HEK cells and analyze the ability of the resulting assemblies to cluster calcium channels at the active zone. The use of this ground-up approach is valuable as it establishes a system to study molecular interactions at the active zone. The work relies on a solid combination of super-resolution microscopy and electrophysiology, but would benefit from: (i) additional ultrastructural analysis to establish ribbon formation (in the absence of which the claim of these being synthetic ribbons might not be supported); (ii) data quantification (to confirm colocalization of different proteins); (iii) stronger validation of impact on Ca2+ function; (iv) in-depth discussion of problems derived from the use of an over-expression approach.

      We thank the editors and the reviewers for the constructive comments and appreciation of our work. Please find a detailed point-to-point response below. In response to the critique received, we have now (i) included an ultrastructural analysis of the SyRibbons using correlative light microscopy and cryo-electron tomography, (ii) performed quantifications to confirm the colocalisation of the various proteins, (iii) discussed and carefully rephrased our interpretation of the role of the ribbon in modulating Ca<sup>2+</sup> channel function and (iv) discussed concerns regarding the use of an overexpression system. 

      Public Reviews:

      Reviewer #1 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript. We have completely overhauled the manuscript taking the suggestions of the reviewer into account.

      (1) Are these truly "synthetic ribbons". The ribbon synapse is traditionally defined by its morphology at the EM level. To what extent these structures recapitulate ribbons is not shown. It has been previously shown that Ribeye forms aggregates on its own. Do these structures look any more ribbonlike than ribeye aggregates in the absence of its binding partners?

      We thank reviewer 1 for their constructive feedback and critique of the work. 

      We agree that traditionally, ribbon synapses have always been defined by the distinct morphology observed at the EM level. However, since the discovery of the core components of ribbons (RIBEYE and Piccolino), confocal and super-resolution imaging of immunofluorescently labelled ribbons has gained importance for analysing ribbon synapses. A correspondence of RIBEYE immunofluorescent structures at the active zone to electron microscopy observations of ribbons has been established in numerous studies (Wong et al, 2014; Michanski et al, 2019, 2023; Maxeiner et al, 2016; Jean et al, 2018), even though direct correlative approaches have yet to be performed to our knowledge. We have now analysed SyRibbons using cryo-correlative electron-light microscopy. We observe that GFP-positive RIBEYE spots corresponded well with electron-dense structures, as is characteristic for synaptic ribbons (Robertis & Franchi, 1956; Smith & Sjöstrand, 1961; Matthews & Fuchs, 2010). We could also observe SyRibbons within 100 nm of the plasma membrane (see Fig. 3). We have now added this qualitative ultrastructural analysis of SyRibbons in the main manuscript (lines 272 - 294, Fig. 3 and Supplementary Fig. 3).

      (2) No new biology is discovered here. The clustering of channels is accomplished by taking advantage of previously described interactions between RBP2, Ca channels and bassoon. The localization of Ribeye to bassoon takes advantage of a previously described interaction between the two. Even the membrane localization of the complexes required the introduction of a membrane-anchoring motif.

      We respectfully disagree with the overall assessment. Our study emphasizes the synthetic establishment of protein assemblies that mimic key aspects of ribbon-type active zones, defining minimum molecular requirements. Numerous previous studies have described the role of the synaptic ribbon in organising the spatial arrangement of Ca<sup>2+</sup> channels, regulating their abundance and possibly also modulating their physiological properties (Maxeiner et al, 2016; Frank et al, 2010; Jean et al, 2018; Wong et al, 2014; Grabner & Moser, 2021; Lv et al, 2016). We would like to highlight that there remain major gaps between existing in vitro and in vivo data; for instance, no evidence for direct or indirect interactions between Ca<sup>2+</sup> channels and RIBEYE has been demonstrated so far. While we do indeed take advantage of previously known interactions between RIBEYE and Bassoon (tom Dieck et al, 2005); between Bassoon, RBP2 and P/Q-type Ca<sup>2+</sup> channels (Davydova et al, 2014); and between RBP2 and L-type Ca<sup>2+</sup> channels (Hibino et al, 2002), our study tries to bridge these gaps by establishing the indirect link between the synaptic ribbon (RIBEYE) and L-type CaV1.3 Ca<sup>2+</sup> channels using a bottom-up approach, a link which has previously been only speculative. Our data show how even in a synapse-naive heterologous expression system, ribbon synapse components assemble Ca<sup>2+</sup> channel clusters and even show a partial localisation of Ca<sup>2+</sup> signal. Moreover, we argue that the established reconstitution approach provides other interesting insights, such as ground-up evidence supporting the anchoring of the synaptic ribbon by Bassoon.
      Finally, we expect that the established system will serve future studies aimed at deciphering the role of putative CaV1.3 or CaV1.4 interacting proteins in regulating Ca<sup>2+</sup> channels of ribbon synapses by providing a more realistic Ca<sup>2+</sup> channel assembly than has been available in the heterologous expression systems used so far. In response to the reviewer's comment we have augmented the discussion accordingly.

      (3) The only thing ribbon-specific about these "syn-ribbons" is the expression of ribeye and ribeye does not seem to participate in the localization of other proteins in these complexes. Bsn, Cav1.3 and RBP2 can be found in other neurons.

      The synaptic ribbon made of RIBEYE is the key molecular difference in the molecular AZ ultrastructure of ribbon synapses in the eye and the ear. We hypothesize the ribbon to act as a superscaffold that enables AZs with large Ca<sup>2+</sup> channel assemblies and readily releasable pools. In further support of this hypothesis, the present study on synthetic ribbons shows that CaV1.3 Ca<sup>2+</sup> channel clusters are larger in the presence of SyRibbons compared to SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in tetra-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, and RIBEYE, Fig. 6). In response to the reviewer's comment we have now added an analysis of triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon), in which CaV1.3 Ca<sup>2+</sup> channel clusters again are significantly smaller than at the SyRibbons and indistinguishable from SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters (Fig. 6E, F).

      (4) As the authors point out, RBP2 is not necessary for some Ca channel clustering in hair cells, yet seems to be essential for clustering to bassoon here.

      Here we would like to clarify that RBP2 is indeed important in inner hair cells for promoting a larger complement of CaV1.3: RBP2 KO mice show smaller CaV1.3 channel clusters and reduced whole-cell and single-AZ Ca<sup>2+</sup> influx amplitudes (Krinner et al, 2017). However, a key point of difference we emphasize is that even though CaV1.3 clusters appeared smaller, they did not appear broken or fragmented as they do upon genetic perturbation of Bassoon (Frank et al, 2010), RIBEYE (Jean et al, 2018) or Piccolino (Michanski et al, 2023). This highlights how there may be a hierarchy in the spatial assembly of CaV1.3 channels at the inner hair cell ribbon synapse (also described in the discussion section “insights into presynaptic Ca<sup>2+</sup> channel clustering and function”), with proteins like RBP2 regulating the abundance of CaV1.3 channels at the synapse and organising them into smaller clusters – what we have termed “nanoclustering” – while Bassoon and RIBEYE may serve as super-scaffolds further organizing these CaV1.3 nanoclusters into “microclusters”. Observations of fragmented Ca<sup>2+</sup> channel clusters and broader spread of Ca<sup>2+</sup> signal seen upon Ca<sup>2+</sup> imaging in RIBEYE and Bassoon mutants (Jean et al, 2018; Frank et al, 2010; Neef et al, 2018), and the absence of such a phenotype in RBP2 mutants (Krinner et al, 2017), may be explained by such a differential role of these proteins in organising Ca<sup>2+</sup> channel spatial assembly. The data of the present study on reconstituted ribbon-containing AZs are in line with these observations in inner hair cells: RBP2 appears important to tether Ca<sup>2+</sup> channels to Bassoon, and these AZ-like assemblies are organised to their full extent by the presence of RIBEYE.
      As mentioned in the response to point 3 of the reviewer, we have now further strengthened this point by adding the analysis of SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, Fig. 6E, F). Moreover, we have revised the discussion accordingly.

      (5) The difference in Ca imaging between SyRibbons and other locations is extremely subtle.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of  SyRibbons and provide the following reasoning for this observation: 

      (i) It is plausible that due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerably high expression throughout the membrane even in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons (for an opposing scenario, please see the cell in Fig. 6B upper panel with very localised CaV1.3 distribution underneath SyRibbons). This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a remarkably big difference in Ca<sup>2+</sup> influx due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). However, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wild-type controls.

      So, in agreement with these published observations – it appears that presence of SyRibbons helps in spatially confining the Ca<sup>2+</sup> signal by super scaffolding nanoclusters into microclusters (see also our response to points 3 and 4 of the reviewer): this is evident from seeing some spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane as a result of overexpression in HEK cells. 

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      (6) The effect of the expression of palm-Bsn, RBP2 and the combination of the two on Ca-current is ambiguous. It appears that while the combination is larger than the control, it probably isn't significantly different from either of the other two alone (Fig 5). Moreover, expression of Ribeye + the other two showed no effect on Ca current (Figure 7). Also, why is the IV curve right shifted in Figure 7 vs Figure 5?

      We agree with the reviewer that co-expression of palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. As the reviewer also correctly pointed out, while the expression of the combination of palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our MS. Moreover, we would like to thank reviewer 1 for pointing out the right shift in the IV curve which was due to an error in the values plotted on the x-axis. This has been corrected in the updated version of the manuscript. 

      (7) While some of the IHC is quantified, some of it is simply shown as single images. EV2, EV3 and Figure 4a in particular (4b looks convincing enough on its own, but could also benefit from a larger sample size and quantification)

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      Reviewer #2 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) Relies on over-expression, which almost certainly diminishes the experimentally-measured parameters (e.g. pre-synapse clustering, localization of Ca2+ entry).

      We acknowledge this limitation highlighted by the reviewer arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. 

      (2) Are HEK cells the best model? HEK cells secrete substances and have a well-studied endocytic pathway, but they do not create neurosecretory vesicles. Why didn't the authors try to reconstitute a ribbon synapse in a cell that makes neurosecretory vesicles like a PC12 cell?

      This is a valid point for discussion that we also had here extensively. We indeed did consider pheochromocytoma cells (PC12 cells) for reconstitution of ribbon-type AZs and also performed initial experiments with these in the initial stages of the project. PC12 cells offer the advantage of providing synaptic-like microvesicles and also endogenously express several components of the presynaptic machinery such as Bassoon, RIM2, ELKS etc (Inoue et al, 2006) such that overexpression of exogenous AZ proteins would have to be limited to RIBEYE only. 

      However, a major drawback of PC12 cells as a model is the complex molecular background of these cells. We have also briefly described this in the discussion section (line 615 – 619). Naïve, undifferentiated PC12 cells show highly heterogeneous expression of various CaV channel types (Janigro et al, 1989); however, CaV1.3, the predominant type in ribbon synapses of the ear, does not seem to be expressed in these cells (Liu et al, 1996). Furthermore, our attempts at performing immunostainings against CaV1.3 and at overexpressing CaV1.3 in PC12 cells did not prove successful, and we decided to refrain from pursuing this further (data not shown).

      In contrast, HEK293 cells, being “synapse-naïve”, provide the advantage of serving as a “blank canvas” for performing such reconstitutions, e.g. they lack voltage-gated Ca<sup>2+</sup> channels and multidomain proteins of the active zone. Moreover, an important practical aspect for our choice was the availability of the HEK293 cell line with stable (and inducible) expression of the CaV1.3 Ca<sup>2+</sup> channel complex. Finally, as described in lines 613 – 614 of the discussion section, even though HEK293 cells lack SVs and the molecular machinery required for their release, our work paves the way for future studies which could employ delivery of SV machinery via co-expression (Park et al, 2021), which could then be analyzed by the correlative light and electron microscopy workflow we worked out and added during revision.

      (3) Related to 1 and 2: the Ca channel localization observed is significant but not so striking given the presence of Cav protein and measurements of Ca2+ influx distributed across the membrane. Presumably, this is the result of overexpression and an absence of pathways for pre-synaptic targeting of Ca channels. But, still, it was surprising that Ca channel localization was so diffuse. I suppose that the authors tried to reduce the effect of over-expression by using an inducible Cav1.3? Even so, the accessory subunits were constitutively over-expressed.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of SyRibbons. Yes, we employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation also in regions without SyRibbons which likely reduced the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerable expression along the membrane also in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations – it appears that presence of SyRibbons helps in spatially confining the Ca<sup>2+</sup> signal by super scaffolding nanoclusters into microclusters: this is evident from seeing some spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane as a result of overexpression in HEK cells. 

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      Reviewer #3 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) The results obtained in a heterologous system (HEK293 cells) need to be interpreted with caution. They will importantly speed the generation of models and hypothesis that will, however, require in vivo validation.

      We acknowledge this limitation highlighted by Reviewer 3 arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. We employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation, even in regions without SyRibbons and this could reduce the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerable expression along the membrane also in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations, the presence of SyRibbons appears to help spatially confine the Ca<sup>2+</sup> signal by super-scaffolding nanoclusters into microclusters. This is evident from the partial spatial confinement of Ca<sup>2+</sup> signals near SyRibbons, superimposed on the diffuse Ca<sup>2+</sup> signal across the rest of the membrane that results from overexpression in HEK cells.

      (2) The authors analyzed the distribution of RIBEYE clusters in different membrane compartments and correctly conclude that RIBEYE clusters are not trapped in any of those compartments, but are soluble instead. The authors, however, did not carry out a similar analysis for Palm-Bassoon. It is therefore unknown if Palm-Bassoon binds to other membrane compartments besides the plasma membrane. That could occur because in non-neuronal cells GAP43 has been described to be in internal membrane compartments. This should be investigated to document the existence of ectopic internal SyRibbons beyond the plasma membrane because it might have implications for interpreting functional data in case Ca2+-channels become part of those internal SyRibbons.

      In response to this valid concern, we have now included the suggested experiment in Supplementary Figure 1. We investigated the subcellular localisation of Palm-Bassoon and did not find Palm-Bassoon puncta colocalising with ER, Golgi, or lysosomal markers, which argues against binding to intracellular membrane compartments. We have added the following sentence in the results section, line 145: “Palm-Bassoon does not appear to localize in the ER, Golgi apparatus or lysosomes (Supplementary Fig 1 D, E and F).”

      (3) The co-expression of RBP2 and Palm-Bassoon induces a rather minor but significant increase in Ca2+-currents (Figure 5). Such an increase does not occur upon expression of (1) Palm-Bassoon alone, (2) RBP2 alone or (3) RIBEYE alone (Figure 5). Intriguingly, the concomitant expression of Palm-Bassoon, RBP2 and RIBEYE does not translate into an increase of Ca2+-currents either (Figure 7).

      We agree with the reviewer that co-expression of Palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. We also highlight that, while the expression of the combination of Palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our manuscript.

      (4) The authors claim that Ca2+-imaging reveals increased Ca2+-signal intensity at synthetic ribbon-type AZs. That claim is a subject of concern because the increase is rather small and it does not correlate with an increase in Ca2+-currents.

      Thanks for the comment: please see our response to your first comment and the lines 585 – 610 in the discussion section.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors should have a better discussion of problems derived from over-expression.

      Done. Please see above. 

      (2) Ideally, the authors would repeat the study using a secretory cell line, but this is of course not possible. The idea could be brought forth, though.

      As described above in our response to the public review of reviewer 2, we have discussed this idea in the discussion section (refer to lines 615 – 619), emphasizing both the advantages and the limitations of using a secretory cell line (e.g. PC12 cells) instead of HEK293 cells as a model for performing such reconstitutions.

      Reviewer #3 (Recommendations For The Authors):

      (1) There are several figures in which colocalization between different proteins is studied only displaying images but without any quantitative data. This should be corrected by providing such a quantitative analysis.

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      (2) The small increase in Ca2+-currents and Ca2+-influx associated with the clustering of Ca2+-channels at SyRibbons is a concern. The authors should discuss whether such a minor increase (found only when Palm-Bassoon and RBP2 are co-expressed) would have physiological consequences in an actual synapse. They might compare those results with results obtained in genetically modified mice in which Ca2+-currents are affected upon the removal of AZ proteins. On the other hand, they should explain why Ca2+-currents do not increase when the SyRibbons are formed by RIBEYE, Palm-Bassoon and RBP2.

      Done. Please see above. 

      (3) The description of the patch-clamp experiments should be enriched by including representative currents. Did the authors measure tail currents?

      We thank the reviewer for the valuable suggestion and have now added representative currents to the figures (see Supplementary Figure 5B). We agree that further characterizing the Ca<sup>2+</sup> currents in the presence and absence of SyRibbons is important, for example by analysing tail currents to count the number of Ca<sup>2+</sup> channels by non-stationary fluctuation analysis. However, we consider this beyond the scope of the current study and an objective for future studies.

      (4) The current displayed in Figure 7 E should be explained better.

      Previous studies have shown that Ca<sup>2+</sup>-binding proteins (CaBPs) compete with Calmodulin to reduce Ca<sup>2+</sup>-dependent inactivation (CDI) and promote sustained Ca<sup>2+</sup> influx in Inner Hair Cells (Cui et al, 2007; Picher et al, 2017). In the absence of CaBPs, CaV1.3-mediated Ca<sup>2+</sup> currents show more rapid CDI, as is the case here upon heterologous expression in HEK cells (Koschak et al, 2001; see also Picher et al, 2017, where co-expression of CaBP2 with CaV1.3 inhibits CDI in HEK293 cells). The inactivation kinetics of CaV1.3 are also regulated by the subunit composition (Cui et al, 2007) and modulated by interaction partners; given the reconstitution here, we do not find the currents very surprising.

      (5) Is the difference in Ca2+-influx still significantly higher upon the removal of the maximum value measured in positive SyRibbons spots (Figure 7, panel K)?

      Yes, on removing the maximum value, the P value increases from 0.01 to 0.03 but remains statistically significant. 
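      The kind of check described in this response can be sketched as follows. This is a minimal illustration only: it assumes an exact permutation test on the difference in means as a stand-in, since the response does not state which test produced the reported P values, and the numbers below are placeholders rather than data from the study. The helper names `exact_permutation_p` and `drop_max` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(a, b):
    """Two-sided exact permutation p-value for a difference in group means.

    Stand-in test for illustration; not necessarily the test used in the study.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    n_a, n = len(a), len(pooled)
    hits = count = 0
    # Enumerate every way of assigning n_a of the pooled values to group a.
    for idx in combinations(range(n), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / (n - n_a))
        # Small tolerance guards against floating-point rounding.
        hits += diff >= observed - 1e-12
        count += 1
    return hits / count


def drop_max(values):
    """Return a copy of values without its single largest element."""
    out = list(values)
    out.remove(max(out))
    return out


# Placeholder numbers only, NOT measurements from the study.
with_ribbon = [1.9, 2.4, 2.1, 2.8, 4.6, 2.3]
without_ribbon = [1.2, 1.6, 1.4, 1.8, 1.5, 1.7]

p_all = exact_permutation_p(with_ribbon, without_ribbon)
p_trimmed = exact_permutation_p(drop_max(with_ribbon), without_ribbon)
print(f"all points: p = {p_all:.4f}; max removed: p = {p_trimmed:.4f}")
```

      Re-running the same test with and without the largest value shows whether a single extreme point drives the result, which is the question raised by the reviewer.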

      (6) In summary, although the approach pioneered by the authors is exciting and provides relevant results, there is a major concern regarding the interpretation of the modulation of Ca2+ channels.

      We have now carefully rephrased our interpretation on the modulation of Ca<sup>2+</sup> channels.  

      References

      Brandt A (2005) Few CaV1.3 Channels Regulate the Exocytosis of a Synaptic Vesicle at the Hair Cell Ribbon Synapse. Journal of Neuroscience 25: 11577–11585

      Cui G, Meyer AC, Calin-Jageman I, Neef J, Haeseleer F, Moser T & Lee A (2007) Ca2+-binding proteins tune Ca2+-feedback to Cav1.3 channels in mouse auditory hair cells. The Journal of Physiology 585: 791–803

      Davydova D, Marini C, King C, Klueva J, Bischof F, Romorini S, Montenegro-Venegas C, Heine M, Schneider R, Schröder MS, et al (2014) Bassoon specifically controls presynaptic P/Q-type Ca(2+) channels via RIM-binding protein. Neuron 82: 181–194

      tom Dieck S, Altrock WD, Kessels MM, Qualmann B, Regus H, Brauner D, Fejtová A, Bracko O, Gundelfinger ED & Brandstätter JH (2005) Molecular dissection of the photoreceptor ribbon synapse: physical interaction of Bassoon and RIBEYE is essential for the assembly of the ribbon complex. J Cell Biol 168: 825–836

      Frank T, Rutherford MA, Strenzke N, Neef A, Pangršič T, Khimich D, Fejtova A, Gundelfinger ED, Liberman MC, Harke B, et al (2010) Bassoon and the synaptic ribbon organize Ca2+ channels and vesicles to add release sites and promote refilling. Neuron 68: 724–738

      Grabner CP & Moser T (2021) The mammalian rod synaptic ribbon is essential for Cav channel facilitation and ultrafast synaptic vesicle fusion. eLife 10: e63844

      Hibino H, Pironkova R, Onwumere O, Vologodskaia M, Hudspeth AJ & Lesage F (2002) RIM-binding proteins (RBPs) couple Rab3-interacting molecules (RIMs) to voltage-gated Ca2+ channels. Neuron 34: 411–423

      Inoue E, Deguchi-Tawarada M, Takao-Rikitsu E, Inoue M, Kitajima I, Ohtsuka T & Takai Y (2006) ELKS, a protein structurally related to the active zone protein CAST, is involved in Ca2+-dependent exocytosis from PC12 cells. Genes to Cells 11: 659–672

      Janigro D, Maccaferri G & Meldolesi J (1989) Calcium channels in undifferentiated PC12 rat pheochromocytoma cells. FEBS Letters 255: 398–400

      Jean P, Morena DL de la, Michanski S, Tobón LMJ, Chakrabarti R, Picher MM, Neef J, Jung S, Gültas M, Maxeiner S, et al (2018) The synaptic ribbon is critical for sound encoding at high rates and with temporal precision. eLife 7: e29275

      Koschak A, Reimer D, Huber I, Grabner M, Glossmann H, Engel J & Striessnig J (2001) alpha 1D (Cav1.3) subunits can form l-type Ca2+ channels activating at negative voltages. J Biol Chem 276: 22100–22106

      Krinner S, Butola T, Jung S, Wichmann C & Moser T (2017) RIM-Binding Protein 2 Promotes a Large Number of CaV1.3 Ca2+-Channels and Contributes to Fast Synaptic Vesicle Replenishment at Hair Cell Active Zones. Front Cell Neurosci 11: 334

      Liu H, Felix R, Gurnett CA, De Waard M, Witcher DR & Campbell KP (1996) Expression and Subunit Interaction of Voltage-Dependent Ca2+ Channels in PC12 Cells. J Neurosci 16: 7557–7565

      Lv C, Stewart WJ, Akanyeti O, Frederick C, Zhu J, Santos-Sacchi J, Sheets L, Liao JC & Zenisek D (2016) Synaptic Ribbons Require Ribeye for Electron Density, Proper Synaptic Localization, and Recruitment of Calcium Channels. Cell Reports 15: 2784–2795

      Matthews G & Fuchs P (2010) The diverse roles of ribbon synapses in sensory neurotransmission. Nat Rev Neurosci 11: 812–822

      Maxeiner S, Luo F, Tan A, Schmitz F & Südhof TC (2016) How to make a synaptic ribbon: RIBEYE deletion abolishes ribbons in retinal synapses and disrupts neurotransmitter release. The EMBO Journal 35: 1098–1114

      Michanski S, Kapoor R, Steyer AM, Möbius W, Früholz I, Ackermann F, Gültas M, Garner CC, Hamra FK, Neef J, et al (2023) Piccolino is required for ribbon architecture at cochlear inner hair cell synapses and for hearing. EMBO Rep 24: e56702

      Michanski S, Smaluch K, Steyer AM, Chakrabarti R, Setz C, Oestreicher D, Fischer C, Möbius W, Moser T, Vogl C, et al (2019) Mapping developmental maturation of inner hair cell ribbon synapses in the apical mouse cochlea. PNAS 116: 6415–6424

      Neef J, Urban NT, Ohn T-L, Frank T, Jean P, Hell SW, Willig KI & Moser T (2018) Quantitative optical nanophysiology of Ca2+ signaling at inner hair cell active zones. Nat Commun 9: 290

      Park D, Wu Y, Lee S-E, Kim G, Jeong S, Milovanovic D, Camilli PD & Chang S (2021) Cooperative function of synaptophysin and synapsin in the generation of synaptic vesicle-like clusters in non-neuronal cells. Nat Commun 12

      Picher MM, Gehrt A, Meese S, Ivanovic A, Predoehl F, Jung S, Schrauwen I, Dragonetti AG, Colombo R, Camp GV, et al (2017) Ca2+-binding protein 2 inhibits Ca2+-channel inactivation in mouse inner hair cells. PNAS 114: E1717–E1726

      Robertis ED & Franchi CM (1956) Electron Microscope Observations on Synaptic Vesicles in Synapses of the Retinal Rods and Cones. J Biophys Biochem Cytol 2: 307–318

      Roberts WM, Jacobs RA & Hudspeth AJ (1990) Colocalization of ion channels involved in frequency selectivity and synaptic transmission at presynaptic active zones of hair cells. J Neurosci 10: 3664–3684

      Smith CA & Sjöstrand FS (1961) A synaptic structure in the hair cells of the guinea pig cochlea. Journal of Ultrastructure Research 5: 184–192

      Wong AB, Rutherford MA, Gabrielaitis M, Pangršič T, Göttfert F, Frank T, Michanski S, Hell S, Wolf F, Wichmann C, et al (2014) Developmental refinement of hair cell synapses tightens the coupling of Ca2+ influx to exocytosis. EMBO J 33: 247–264

      Zampini V, Johnson SL, Franz C, Lawrence ND, Münkner S, Engel J, Knipper M, Magistretti J, Masetto S & Marcotti W (2010) Elementary properties of CaV1.3 Ca(2+) channels expressed in mouse cochlear inner hair cells. J Physiol 588: 187–199

    1. eLife Assessment

      The reported cryo-EM imaging of a pentameric ligand-gated ion channel in liposomes as opposed to nanodiscs has both broad implications and contributes valuable methodological advances to the structural investigation of membrane receptors. The comparison of structures assigned to distinct functional states in liposomes versus nanodiscs is convincing and will aid membrane protein structural biologists in selection of functionally relevant membrane reconstitution environments.

    2. Reviewer #1 (Public review):

      Summary:

      The authors, Dalal et al., determined cryo-EM structures of open, closed, and desensitized states of the pentameric ligand-gated ion channel ELIC reconstituted in liposomes, and compared them to structures determined in varying nanodisc diameters. They argue that the liposomal reconstitution method is more representative of functional ELIC channels, as they were able to test and recapitulate channel kinetics through a stopped-flow thallium flux liposomal assay. The authors and others have described channel interactions with membrane scaffold proteins (MSP), initially thought to be in a size-dependent manner. However, the authors reported that their cryo-EM ELIC structure interacts with the large nanodisc spNW25, contrary to their original hypotheses. This suggests that the channel's interactions with MSPs might alter its structure, possibly influencing the functional states of the channel. Thus, the authors argue that reconstitution in liposomes is more representative of the native structure and can recapitulate all channel states.

      Strengths:

      Cryo-EM structural determination from proteoliposomes is a promising methodology within the ion channel field due to their large surface area and lack of MSP or other membrane mimetics that could alter channel structure. The authors succeeded in comparing structures determined in liposomes to those in a wide range of nanodisc diameters. This comparison gives rise to important discussions for other membrane protein structural studies when deciding the best method for individual circumstances.

      Weaknesses:

      The overarching goal of the study was to determine structural differences of ELIC in detergent nanodiscs and liposomes. The authors stated they determined open, closed, and desensitized states of ELIC reconstituted in liposomes and suggest the desensitization gate is at the 9' region of the pore. However, limited functional data were provided when determining the functional states of the channel, with most of the evidence deriving from structures, which only provide snapshots of channels.

    3. Reviewer #2 (Public review):

      Summary

      The report by Dalal and colleagues introduces a significant novelty in the field of pentameric ligand-gated ion channels (pLGICs). Within this family of receptors, numerous structures are available, but a widely recognised problem remains in assigning structures to functional states observed in biological membranes. Here, the authors obtain both structural and functional information of a pLGIC in a liposome environment. The model receptor ELIC is captured in the resting, desensitised and open states. Structures in large nanodiscs, possibly biased by receptor-scaffold protein interactions, are also reported. Altogether these results set the stage for the adoption of liposomes as a proxy for the biological membranes, for cryoEM studies of pLGICs and membrane proteins in general.

      Strengths

      The structural data is comprehensive, with structures in liposomes in the 3 main states (and for each, both inward-facing and outward-facing), and an agonist-bound structure in the large spNW25 nanodisc (and a retreatment of previous data obtained in a smaller disc). It adds up to a series of work from the same team that constitutes a much-needed exploration of various types of environment for the transmembrane domain of pLGICs. The structural analysis is thorough.

      The tone of the report is particularly pleasant, in the sense that the authors' claims are not inflated. For instance, a sentence such as "By performing structural and functional characterization under the same reconstitution conditions, we increase our confidence in the functional annotation of these structures." is exemplary.

      Weakness

      All the details necessary to reproduce the work are present in the Methods. Nevertheless, the biochemistry might have been shown and discussed in greater detail. While I do believe that liposomes will be in most cases better than, say, nanodiscs, the process that leads from the protein in its membrane down to the liposome will play a big role in preserving the native structure.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors, Dalal et al., determined cryo-EM structures of open, closed, and desensitized states of the pentameric ligand-gated ion channel ELIC reconstituted in liposomes, and compared them to structures determined in varying nanodisc diameters. They argue that the liposomal reconstitution method is more representative of functional ELIC channels, as they were able to test and recapitulate channel kinetics through a stopped-flow thallium flux liposomal assay. The authors and others have described channel interactions with membrane scaffold proteins (MSP), initially thought to be in a size-dependent manner. However, the authors reported that their cryo-EM ELIC structure interacts with the large nanodisc spNW25, contrary to their original hypotheses. This suggests that the channel's interactions with MSPs might alter its structure, possibly not accurately representing/reflecting functional states of the channel.

      Strengths:

      Cryo-EM structural determination from proteoliposomes is a promising methodology within the ion channel field due to their large surface area and lack of MSP or other membrane mimetics that could alter channel structure. Comparing liposomal ELIC to structures in various-sized nanodiscs gives rise to important discussions for other membrane protein structural studies when deciding the best method for individual circumstances.

      Weaknesses:

      The overarching goal of the study was to determine structural differences of ELIC in detergent nanodiscs and liposomes. Including comparisons of the results to the native bacterial lipid environment would provide a more encompassing discussion of how the determined liposome structures might or might not relate to the native receptor in its native environment. The authors stated they determined open, closed, and desensitized states of ELIC reconstituted in liposomes and suggest the desensitization gate is at the 9' region of the pore. However, no functional studies were performed to validate this statement.

      The goal of this study was to determine structures of ELIC in the same lipid environment in which its function is characterized. However, it is also worth noting that phosphatidylethanolamine and phosphatidylglycerol, two lipids used for the liposome formation, are necessary for ELIC function (PMID 36385237) and are principal lipid components of the gram-negative bacterial membranes in which ELIC is expressed.

      The desensitized structure of ELIC in liposomes shows a pore diameter at the hydrophobic L240 (9’) residue of 3.3 Å, which is anticipated to pose a large energetic barrier to the passage of ions due to the hydrophobic effect. We have included a graphical representation of pore diameters from the HOLE analysis for all liposome structures in Supplementary Figure 6B. While we have not tested the role of L240 in desensitization with functional experiments, it was shown by Gonzalez-Gutierrez and colleagues (PMID 22474383) that the L240A mutation apparently eliminates desensitization in ELIC. This finding is consistent with L240 (9’) being the desensitization gate of ELIC. We have referenced this study when discussing the desensitization gate in the Results.

      Reviewer #2 (Public review):

      Summary

      The report by Dalal and colleagues introduces a significant novelty in the field of pentameric ligand-gated ion channels (pLGICs). Within this family of receptors, numerous structures are available, but a widely recognised problem remains in assigning structures to functional states observed in biological membranes. Here, the authors obtain both structural and functional information of a pLGIC in a liposome environment. The model receptor ELIC is captured in the resting, desensitized, and open states. Structures in large nanodiscs, possibly biased by receptor-scaffold protein interactions, are also reported. Altogether, these results set the stage for the adoption of liposomes as a proxy for the biological membranes, for cryoEM studies of pLGICs and membrane proteins in general.

      Strengths

      The structural data is comprehensive, with structures in liposomes in the 3 main states (and for each, both inward-facing and outward-facing), and an agonist-bound structure in the large spNW25 nanodisc (and a retreatment of previous data obtained in a smaller disc). It adds up to a series of work from the same team that constitutes a much-needed exploration of various types of environment for the transmembrane domain of pLGICs. The structural analysis is thorough.

      The tone of the report is particularly pleasant, in the sense that the authors' claims are not inflated. For instance, a sentence such as "By performing structural and functional characterization under the same reconstitution conditions, we increase our confidence in the functional annotation of these structures." is exemplary.

      Weaknesses

      Core parts of the method are not described and/or discussed in enough detail. While I do believe that liposomes will be, in most cases, better than, say, nanodiscs, the process that leads from the protein in its membrane down to the liposome will play a big role in preserving the native structure, and should be an integral part of the report. Therefore, I strongly felt that biochemistry should be better described and discussed. The results section starts with "Optimal reconstitution of ELIC in liposomes [...] was achieved by dialysis". There is no information on why dialysis is optimal, what it was compared to, the distribution of liposome sizes using different preparation techniques, etc... Reading the title, I would have expected a couple of paragraphs and figure panels on liposome reconstitution. Similarly, potential biochemical challenges are not discussed. The methods section mentions that the sample was "dialyzed [...] over 5-7 days". In such a time window, most of the members of this protein family would aggregate, and it is therefore a protocol that can not be directly generalised. This has to be mentioned explicitly, and a discussion on why this can't be done in two days, what else the authors tested (biobeads? ... ?) would strengthen the manuscript.

      To a lesser extent, the relative lack of both technical details and of a broad discussion also pertains to the cryoEM and thallium flux results. Regarding the cryoEM part, the authors focus their analysis on reconstructions from outward-facing particles on the basis of their better resolutions, yet there was little discussion about it. Is it common for liposome-based structures? Are inward-facing reconstructions worse because of the increased background due to electrons going through two membranes? Are there often impurities inside the liposomes (we see some in the figures)? The influence of the membrane mimetics on conformation could be discussed by referring to other families of proteins where it has been explored (for instance, ABC transporters, but I'm sure there are many other examples). If there are studies in other families of channels in liposomes that were inspirational, those could be mentioned. Regarding thallium flux assays, one argument is that they give access to kinetics and set the stage for time-resolved cryoEM, but if I did not miss it, no comparison of kinetics with other techniques, such as electrophysiology, nor references to eventual pioneer time-resolved studies are provided.

      Altogether, in my view, an updated version would benefit from insisting on every aspect of the methodological development. I may well be wrong, but I see this paper more like a milestone on sample prep for cryoEM imaging than being about the details of the ELIC conformations.

      Additions have been made to the Results and Discussion sections elaborating on the following points: 1) reconstitution of ELIC in liposomes using dialysis, the advantage of this over other methods such as biobeads, and whether the dialysis protocol can be shortened for other less stable proteins; 2) the issue of separating outward- and inward-facing channels; 3) referencing the effect of nanodiscs on ABC transporters, structures of membrane proteins in liposomes, and pioneering time-resolved cryo-EM studies; and 4) comparison of ELIC gating kinetics with electrophysiology measurements. With regard to the first point, it should be noted that all necessary details are provided in the Methods to reproduce the experiments, including the reconstitution and stopped-flow thallium flux assay. It is also important to note that the same preparation for making proteoliposomes was used for assessing function using the stopped-flow thallium flux assay and for determining the structure by cryo-EM. This is now stated in the Results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major revisions:

      (1) The authors suggest that the desensitization gate is located at the 9' region within the pore. However, as stated by the authors, the 2' residues function as the desensitization gate in related channels. In a few of their HOLE analyzed structures (e.g. Figure 2B and 4B), there seems to be a constriction also at 2', but this finding is not discussed in the context of desensitization. Further functional testing of mutated 9' and/or 2' gates would bolster the argument for the location of the desensitization gate.

      As stated above, we have included HOLE plots of pore radius in Supplementary Fig. 6B and referenced the study showing that the L240A mutation (9’) in ELIC (PMID 22474383) appears to eliminate desensitization. This result along with the narrow pore diameter at 9’ in the desensitized structure suggests that 9’ is likely a desensitization gate in ELIC. In contrast, mutation of Q233 (2’) to a cysteine in a previous study produced a channel that still desensitizes (PMID 25960405). Since Q233 is a hydrophilic residue in contrast to L240, Q233 probably does not pose the same energetic barrier to ion translocation as L240 based on the structure.

      (2) In discussing functional states of ELIC and ELIC5 in different reconstitution methods, the authors reference constriction sites determined by HOLE analysis software. These constriction sites were key evidence for the authors to determine functional state, however, it is difficult to discern pore sizes based on the figures. Pore diameters and clear color designation (ie, green vs orange) with the figures would greatly aid their discussions.

      HOLE plots are displayed in Supplementary Fig. 6B and pore diameters are now provided in the text.

      (3) The authors had an intriguing finding that ELIC dimers are found in spNW25 scaffolds. Is there any functional evidence to suggest they could be functioning as dimers?

      There is no evidence that the function of ELIC or other pLGICs is altered by the formation of dimers of pentamers. Therefore, while this result is intriguing and likely facilitated by concentrating multiple ELIC pentamers within the nanodisc, it is not clear if these interactions have any functional importance. We have stated this in the Results.

      (4) Thallium flux assay to validate channel function within proteoliposomes. Proteoliposomes are known to be generally very leaky membranes; it would be good to have controls without ELIC added to determine baseline changes in fluorescence.

      We have established from multiple previous studies that liposomes composed of 2:1:1 POPC:POPE:POPG (PMID 36385237 and 31724949) do not show significant thallium flux as measured by the stopped-flow assay (PMID 29058195) in the absence of ELIC activity. Furthermore, in the present study, the data in Fig. 1A of WT ELIC shows a low thallium flux rate 60 seconds after exposure to agonist when the ion channel has mostly desensitized. Therefore, this data serves also as a control indicating that the high thallium flux rates in response to agonist (at earlier delay times) are not due to leak, but rather due to ELIC channel activity.

      Minor revisions:

      (1) Abstract and introduction. 'Liganded' should be ligand

      We removed this word and changed it to “agonist-bound” for consistency throughout the manuscript.

      (2) Inconsistent formatting of FSC graphs in Supplemental Figure 4

      The difference is a consequence of the different formatting between cryoSPARC and Relion FSC graphs.

      Reviewer #2 (Recommendations for the authors):

      Minor writing remarks:

      The present report builds on previous work from the same team, and to my eye it would be a plus if this were conveyed more explicitly. I see it as a strength to explore various developments in several papers that complement each other. E.g., in the introduction when citing reference 12 (Dalal 2024), and later when introducing ref 15 (Petroff 2022), I wish I was reminded of the main findings and how they fit with the new results.

      We have expanded on the Results and Discussion detailing key findings from these studies that are relevant to the current study.

      Suggestions for analysis:

      Data treatment. Maybe I missed it, but I wondered if C1 vs C5 treatment of the liposome data showed any interesting differences? When I think about the biological membrane, I picture it as a very crowded place with lots of neighbouring proteins. I would not be surprised if, similarly to what they do in discs, the receptor would tend to stick to, or bump into, anything present also in liposomes (a neighboring liposome, some undefined density inside the liposome).

      We attempted to perform C1 heterogeneous refinement jobs in cryoSPARC and C1 3D classification in Relion5. For the WT datasets, these did not produce 3D reconstructions that were of sufficient quality for further refinement. For ELIC5 with agonist, the C1 reconstructions were not different from the C5 reconstructions. Furthermore, there was no evidence of dimers of pentamers from the 2D or 3D treatments, unlike what was observed in the spNW25 nanodiscs. This is likely because the density of ELIC pentamers in the liposomes was too low to capture these transient interactions. We have included this information in the Methods.

      In data treatment, we sometimes find only what we're looking for. I wondered if the authors tried to find, for instance, the open and D conformations in the resting dataset during classifications.

      This is an interesting question since some population of ELIC channels could visit a desensitized conformation in the absence of agonist and this would not be detected in our flux assay. After extensive heterogeneous refinement jobs in cryoSPARC and 3D classification jobs in Relion5, we did not detect any unexpected structures such as open/desensitized conformations in the apo dataset.

      In the analysis of the M4 motions, is there info to be gained by looking at how it interacts with the rest of the TMD? For instance, I wondered if the buried surface area between M4 and the rest was changed. One could also imagine looking at M4 separately in outward-facing and inward-facing conformations (because the tension due to the bilayer will not be the same in the outer layer in both orientations; intuitively, I'd expect different levels of M4 motion).

      We have expanded our analysis of the structures as recommended. We determined the buried surface area between M4 and the rest of the channel in the liganded WT and ELIC5 structures in liposomes and nanodiscs, as well as the area between the TMD interfaces for these structures. There appears to be a pattern where liposome structures show less buried surface area between M4 and the rest of the channel, and less area at the TMD interfaces. Overall, this suggests that the liposome structures of ELIC in the open-channel or desensitized conformations are more loosely packed in the TMD compared to the nanodisc structures.

      We have also further discussed the issue of separating outward- and inward-facing conformations in the Results. The problem with classifying outward- and inward-facing orientations is that top/down or tilted views of the particles cannot be easily distinguished as coming from channels in one orientation or the other, unless there are conformational differences between outward- and inward-facing channels that would allow for their separation during 3D heterogeneous refinement or 3D classification. Furthermore, since the inward-facing reconstructions are of much lower resolution than the outward-facing reconstructions, we suspect that these particles are more heterogeneous possibly containing junk, multiple conformations, or particles that are both inward- and outward-facing. On the other hand, the outward-facing structures are of good quality, and therefore we are more confident that these come from a more homogeneous set of particles that are likely outward-facing (Note that most particles are outward facing based on side views of the 2D class averages). That said, when examining the conformation of M4 in outward- and inward-facing structures, we do not see any significant differences with the caveat that the inward-facing structures are of poor quality and that inward- and outward-facing particles may not have been well-separated.

    1. eLife Assessment

      This study makes the valuable claim that people track, specifically, the elasticity of control (that is, the degree to which outcome depends on how many resources - such as money - are invested), and that control elasticity is impaired in certain types of psychopathology. A novel task is introduced that provides solid evidence that this learning process occurs and that human behavior is sensitive to changes in the elasticity of control. Evidence that elasticity inference is distinct from more general learning mechanisms and is related to psychopathology remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the elasticity of controllability by developing a task that manipulates the probability of achieving a goal with a baseline investment (which they refer to as inelastic controllability) and the probability that additional investment would increase the probability of achieving a goal (which they refer to as elastic controllability). They found that a computational model representing the controllability and elasticity of the environment accounted better for the data than a model representing only the controllability. They also found that prior biases about the controllability and elasticity of the environment were associated with a composite psychopathology score. The authors conclude that elasticity inference and bias guide resource allocation.

      Strengths:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      Weaknesses:

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources to success probability. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors test whether controllability beliefs and associated actions/resource allocation are modulated by things like time, effort, and monetary costs (what they call "elastic" as opposed to "inelastic" controllability). Using a novel behavioral task and computational modeling, they find that participants do indeed modulate their resources depending on whether they are in an "elastic," "inelastic," or "low controllability" environment. The authors also find evidence that psychopathology is related to specific biases in controllability.

      Strengths:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability.

      Weaknesses:

      The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

    4. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific. Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology.

      Starting with claim 1, there are three subclaims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not strongly supported.

      (1B) The experiment cannot support the claim that people represent or track elasticity because effort is the only dimension over which participants can engage in any meaningful decision-making. The other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies. Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort.

      Notes on rebuttal: The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success, e.g., you're more likely to get a heads in two flips than one. I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      Notes on rebuttal: The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct (the authors claim otherwise, but see Fig 6C). However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency (SOA) and the elasticity bias---this result is consistent with any possible relationship (even a negative one). As it turns out, Figure S3 shows that there is effectively no relationship (r=0.03).

      Notes on rebuttal: The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. In the original submission, the authors stated that the study was designed to be "especially sensitive to overestimation of elasticity". A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias.

      When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      Notes on rebuttal: I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity and eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one; the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical; this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.
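      The distinction between correlation and bias in parameter recovery can be illustrated with a minimal sketch. It uses purely synthetic numbers, not the authors' model or data: the "recovered" values are assumed to be the true values shifted upward by a hypothetical constant of 0.5 plus small noise. Under that assumption the recovery correlation stays high even though every estimate is biased.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mean_bias(true_vals, recovered_vals):
    """Average signed error (recovered minus true)."""
    return sum(r - t for t, r in zip(true_vals, recovered_vals)) / len(true_vals)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical "true" parameters spread over a full plausible range.
true_params = [rng.uniform(-1.0, 1.0) for _ in range(500)]

# Assumed recovery process: a constant upward shift plus small noise.
recovered = [t + 0.5 + rng.gauss(0.0, 0.1) for t in true_params]

r = pearson(true_params, recovered)
b = mean_bias(true_params, recovered)
print(f"recovery correlation = {r:.3f}, mean bias = {b:.3f}")
```

      A recovery report that lists only the correlation would look reassuring here, which is why the signed error needs to be reported alongside it.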

      Minor comments:

      Below are things to keep in mind.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is that buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.
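      The gap between the two success rules described above can be made concrete with a short sketch. The function names are illustrative, not taken from the authors' code. The sequential rule treats each extra ticket as an independent attempt taken only after a failure; the additive rule reproduces the "exactly double" structure described for the task.

```python
def p_success_sequential(p: float, n_attempts: int) -> float:
    """Chance that at least one of n independent attempts succeeds.

    For n = 2 this equals 2p - p^2, the value implied by the task framing.
    """
    return 1.0 - (1.0 - p) ** n_attempts


def p_success_additive(p: float, n_attempts: int) -> float:
    """Linear rule: each extra ticket adds p (capped at 1)."""
    return min(1.0, n_attempts * p)


for p in (0.1, 0.2, 0.4):
    seq = p_success_sequential(p, 2)
    add = p_success_additive(p, 2)
    # The difference equals p^2 whenever 2p <= 1.
    print(f"p={p:.1f}: sequential={seq:.2f}, additive={add:.2f}, gap={add - seq:.2f}")
```

      Because the gap grows as p^2, a participant reasoning with the sequential rule would expect less benefit from a second ticket than the task actually delivers, and the mismatch is largest when single-attempt success is already likely.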

      The model is heuristically defined and does not reflect Bayesian updating. For example, it over-estimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

    1. eLife Assessment

      This study establishes bathy phytochromes, a unique class of bacterial photoreceptors that respond to near-infrared light (NIR), as important tools for bacterial optogenetics. NIR light is a key control signal in optogenetics due to its deep tissue penetration and the ability to combine with existing red- and blue-light sensitive systems, but thus far, NIR-activated proteins have been poorly characterized. The strength of the evidence is solid overall, with comprehensive in vitro characterization, modular design strategies, and validation across different hosts. There are some questions that remain, such as the rationale for linker choices, characterization of growth and performance relative to controls, and the physiological significance of color blind effects at alkaline pH, but overall, this study should advance the fields of optogenetics and photobiology and inspire future work.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e., those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light-responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light-responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light-sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating the generality of their protein engineering approach more broadly across bacterial two-component systems.

      This is an exciting result that should inspire future bacterial sensor design. They go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.

      Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate that their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.

      Strengths:

      (1) The experiments are well-founded, well-executed, and rigorous.

      (2) The manuscript is clearly written.

      (3) The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.

      (4) This study is a valuable contribution to photobiology and optogenetics.

      Weaknesses:

      (1) As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g., in vivo).

      (2) Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g., blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.

      Strengths:

      (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.

      (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.

      (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.

      (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.

      (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.

      Weaknesses:

      (1) The expression of multi-gene operons and fluorescent reporters could impose a metabolic burden. The authors should present data comparing optical density for growth curves of engineered strains versus the corresponding empty-vector control to provide insight into the burden and overall impact of the system on host viability and growth.

      (2) The manuscript consistently presents normalized fluorescence values, but the method of normalization is not clear (Figure 2 caption describes normalizing to the maximal fluorescence, but the maximum fluorescence of what?). The authors should provide a more detailed explanation of how the raw fluorescence data were processed. In addition, or potentially in exchange for the current presentation, the authors should include the raw fluorescence values in supplementary materials to help readers assess the actual magnitude of the reported responses.

      (3) Related to the prior point, it would be useful to have a positive control for fluorescence that could be used to compare results across different figure panels.

      (4) Real-time gene expression data are not presented in the current manuscript, but it would be helpful to include a time-course for some of the key designs to help readers assess the speed of response to NIR light.

    4. Reviewer #3 (Public review):

      Summary:

      This paper by Meier et al introduces a new optogenetic module for the regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.

      Strengths:

      The paper is important from the perspective of fundamental protein characterization, since bathy-BphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light-activated systems. The experiments are performed carefully, and the manuscript is well written.

      Weaknesses:

      My major criticism is that some information is difficult to obtain, and some data is presented with limited interpretation, making it difficult to obtain intuition for why certain responses are observed. For example, the changes in red/infrared responses across different figures and cellular contexts are reported but not rationalized. Extensive experiments with variable linker sequences were performed, but the rationale for linker choices was not clearly explained. These are minor weaknesses in an overall very strong paper.

    1. eLife Assessment

      This work models reinforcement-learning experiments using a recurrent neural network. It examines if the detailed credit assignment necessary for back-propagation through time can be replaced with random feedback. In this important study the authors show that it yields a satisfactory approximation and the evidence to support that it holds within relatively simple tasks is solid.

    2. Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local because the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value; in addition, these weights themselves change over learning. The main idea in this paper is to examine if replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.
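      The contrast between the two credit-assignment schemes can be written out in a simplified form. This is a sketch under stated assumptions, not the authors' exact equations. The symbols are illustrative: recurrent weights $A$, input weights $B$, value weights $w$, fixed random feedback $c$, learning rate $\alpha$, discount $\gamma$, observation $o_t$ and activation function $f$. The gradient is truncated to one time step, and the next-step value is treated as a fixed target (a semi-gradient of $\tfrac{1}{2}\delta_t^2$).

```latex
\begin{align}
x_t &= f(u_t), \qquad u_t = A\,x_{t-1} + B\,o_t, \qquad v_t = w^\top x_t \\
\delta_t &= r_t + \gamma\, v_{t+1} - v_t \\
\Delta A_{ij}^{\text{bp}} &= \alpha\, \delta_t\, w_i\, f'(u_{i,t})\, x_{j,t-1} \\
\Delta A_{ij}^{\text{rf}} &= \alpha\, \delta_t\, c_i\, f'(u_{i,t})\, x_{j,t-1}
\end{align}
```

      The only difference between the last two lines is the factor $w_i$ versus $c_i$: the back-prop update needs the current, changing output weight of unit $i$, whereas the random-feedback update uses a fixed random number in its place.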

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are consistent with previous results on random feedback.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      • The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. They assume that each time step is on the order of hundreds of ms. They justify this by pointing to some slow intrinsic mechanisms, but they do not implement these slow mechanisms in a network with short time steps; instead they assume without demonstration that these could work as suggested. This is a reasonable first approximation, but its validity should be explicitly tested.

      • As the delay between cue and reward increases, the performance decreases. This is not surprising given the proposed mechanism, but is still a limitation, especially given that we do not really know what a reasonable value of a single time step is.

    3. Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to larger networks or more complicated tasks with long temporal delays (>100 timesteps), so it remains unclear to what degree these methods can scale or can be used more generally.

      Comments on revisions: I would still want to see how well the network learns tasks with longer time delays (on the order of 100 or even 1000 timesteps). Previous work has shown that random feedback struggles to encode longer timescales (see Murray 2019, Figure 2), so I would be interested to see how that translates to the RL context in your model.

    4. Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) the value weights (readout) are non-negative, (2) the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) the feedback weights are non-negative, (4) the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of these four biological features appears to completely impair learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE, which is therefore interpretable in terms of quantities measurable in the wet-lab.

      (2) I find that non-negative FA (FA with non-negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant, since it connects with experimental settings of reward conditioning with possible plasticity measurements.
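
      The form of the learning rule noted in (1) can be illustrated with a minimal sketch. This is not the authors' code: the function name `local_update`, the argument names, and the use of the raw post-synaptic activity (rather than, for example, a sigmoid derivative) are assumptions made only for illustration.

```python
import numpy as np

def local_update(W, x_pre, x_post, c, rpe, lr=0.1):
    """Hypothetical local rule: dW[i, j] = lr * RPE * c[i] * post[i] * pre[j].

    W      : (n_post, n_pre) weight matrix
    x_pre  : (n_pre,) pre-synaptic activities
    x_post : (n_post,) post-synaptic activities
    c      : (n_post,) fixed feedback weights (random in FA)
    rpe    : scalar reward prediction error
    """
    # Every factor is available at the synapse or broadcast globally (the RPE).
    post_factor = c * x_post
    return W + lr * rpe * np.outer(post_factor, x_pre)
```

      Under these assumptions, setting `c` to a vector of ones gives a rule with constant global feedback, which is one possible reading of a delta-rule-like control.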

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule such as a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check that the task is not too trivial, I suggest adding a control where the vector c is constant, c_i=1.

      (5) Related to point 3), the main strength of this paper is to draw potential connections with experimental data. It would be good to highlight more concretely the predictions of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback)?)

      (6a) Random feedback with RNNs in RL has been studied in the past, so it is maybe worth giving some insight into how the results and the analyses compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? Or benefits from this simpler toy task?

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      (7) I find that the writing could be improved; it mostly feels more technical and difficult than it should. Here are some recommendations:

      7a) For instance, the technical description of the task (CSC) is not fully described and requires background knowledge from other papers, which is not desirable.

      7b) Also, the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      7c) In the technical description of the results I find that the text dives into descriptive comments of the figures, but high-level take-home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      (8) Related to the writing issue and 5), I wish that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what is interesting about this positivity, and why? Is there an expected finding with non-negative FA in terms of model capability? Or perhaps a simpler and crisper take-home message communicating the experimental predictions to the community would be useful.

      [1] https://www.nature.com/articles/s41467-020-17236-y

      Comments on revisions:

      Thank you for addressing all my comments in your reply.

    5. Author response:

      The following is the authors’ response to the original reviews

      Summary of our revisions

      (1) We have explained why the model with untrained RNN and readout (value-weight) learning only could not learn the simple task well: we trained the models continuously across trials with random inter-trial intervals rather than separately for each episodic trial, so it was not trivial for the models to recognize that cue presentation in different trials constitutes the same single state, since the activities of the untrained RNN upon cue presentation should differ from trial to trial (Line 177-185).

      (2) We have shown that dimensionality was higher in the value-RNNs than in the untrained RNN (Fig. 2K,6H).

      (3) We have shown that even when a distractor cue was introduced, the value-RNNs could learn the task (Fig. 10).

      (4) We have shown that extended value-RNNs incorporating excitatory and inhibitory units and conforming to Dale's law could still learn the tasks (Fig. 9,10-right column).

      (5) In the original manuscript, the non-negatively constrained value-RNN showed loose alignment of value-weight and random feedback from the beginning but did not show further alignment over trials. We have clarified the reason for this and found a way, introducing a slight decay (forgetting), to make further alignment occur (Fig. 8E,F).

      (6) We have shown that the value-RNNs could learn the tasks with longer cue-reward delay (Fig. 2M,6J) or action selection (Fig. 11), and found cases where random feedback performed worse than symmetric feedback.

      (7) We compared our value-RNNs with e-prop (Bellec et al., 2020, Nat Commun). While e-prop incorporates the effects of changes in RNN weights across distant times through an "eligibility trace", our value-RNNs do not. We consider that our models can still learn the tasks with a cue-reward delay because they use TD error, and TD learning itself, even TD(0) without an eligibility trace, is a solution for temporal credit assignment. In fact, TD error-based e-prop was also examined in that study, but results were shown only with symmetric feedback, not with random feedback (their Fig. 4,5), whereas for another setup, reward-based e-prop without TD error, results with random feedback were shown (their SuppFig. 5). We have noted these points in Line 695-711 (and also partly in Line 96-99).

      (8) In the original manuscript, we emphasized only the spatial locality (random rather than symmetric feedback) of our learning rule. But we have now also emphasized the temporal locality (online learning) as it is also crucial for bio-plausibility and critically different from the original value-RNN with BPTT. We also changed the title.

      (9) We have realized that our estimation of true state values was invalid (as detailed in page 34 of this document). Effects of this error on performance comparisons were small, but we apologize for this error.

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. In addition, these weights themselves change over learning. The main idea in this paper is to examine whether replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that this random feedback in credit assignment performs well, though not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      *please note that we numbered your public review comments and recommendations for the authors as Pub1 and Rec1 etc so that we can refer to them in our replies to other comments.

      Pub1. The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained.

      These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have a high dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g., Rajan & Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNNs.

      We have added an explanation of untrained RNN in Line 144-147:

      “As a negative control, we also conducted simulations in which these connections were not updated from initial values, referring to as the case with "untrained (fixed) RNN". Notably, the value weights w (i.e., connection weights from the RNN to the striatal value unit) were still trained in the models with untrained RNN.”

      We have also analyzed the dimensionality of network dynamics by calculating the contribution ratios of each principal component of the trajectory of RNN activities. It was revealed that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN. We have added these results in Fig. 2K and Line 210-220 (for our original models without non-negative constraint):

      “In order to examine the dimensionality of RNN dynamics, we conducted principal component analysis (PCA) of the time series (for 1000 trials) of RNN activities and calculated the contribution ratios of PCs in the cases of oVRNNbp, oVRNNrf, and untrained RNN with 20 RNN units. Figure 2K shows a log of contribution ratios of 20 PCs in each case. Compared with the case of untrained RNN, in oVRNNbp and oVRNNrf, initial component(s) had smaller contributions (PC1 (t-test p = 0.00018 in oVRNNbp; p = 0.0058 in oVRNNrf) and PC2 (p = 0.080 in oVRNNbp; p = 0.0026 in oVRNNrf)) while later components had larger contributions (PC3~10,15~20 p < 0.041 in oVRNNbp; PC5~20 p < 0.0017 in oVRNNrf) on average, and this is considered to underlie their superior learning performance. We noticed that late components had larger contributions in oVRNNrf than in oVRNNbp, although these two models with 20 RNN units were comparable in terms of cue~reward state values (Fig. 2J-left).”

      and Fig. 6H and Line 412-416 (for our extended models with non-negative constraint):

      “Figure 6H shows contribution ratios of PCs of the time series of RNN activities in each model with 20 RNN units. Compared with the cases with naive/shuffled untrained RNN, in oVRNNbp-rev and oVRNNrf-bio, later components had relatively high contributions (PC5~20 p < 1.4×10<sup>−6</sup> (t-test vs naive) or < 0.014 (vs shuffled) in oVRNNbp-rev; PC6~20 p < 2.0×10<sup>−7</sup> (vs naive) or PC7~20 p < 5.9×10<sup>−14</sup> (vs shuffled) in oVRNNrf-bio), explaining their superior value-learning performance.”
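
      For readers who wish to reproduce this kind of dimensionality analysis, the contribution ratios of principal components can be computed as in the sketch below. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `pc_contribution_ratios` and the layout of the `activity` array (time steps in rows, RNN units in columns) are hypothetical.

```python
import numpy as np

def pc_contribution_ratios(activity):
    """Fraction of total variance explained by each principal component.

    activity : (n_time_steps, n_units) array of RNN unit activities
               (assumed layout; assumes the activity is not constant).
    Returns ratios sorted from the first (largest) PC to the last.
    """
    # PCA operates on mean-centered data.
    centered = activity - activity.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the PC variances;
    # np.linalg.svd returns them in descending order.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()
```

      A flatter profile of these ratios (larger contributions from later PCs) corresponds to higher-dimensional dynamics in the sense used in the quoted passages.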

      Regarding the poor performance of the model with untrained RNN, we would like to add a note. It is certainly true that an untrained RNN with sufficient dimensions should be able to represent just <10 different states well, and state values should be learnable through TD learning regardless of which representation is used. However, a difficulty (nontriviality) lies in the fact that, because we modeled the tasks in a continuous way rather than in an episodic way, the activity of the untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentation in different trials, even after random lengths of inter-trial interval, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using backprop-through-time (BPTT) for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      Pub2. The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In the classical conditioning experiments typical delays are of the order of hundreds of milliseconds to seconds. Authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      In the revised manuscript, we examined the cases in which the cue-reward delay (originally 3 time steps) was elongated to 4, 5, or 6 time-steps. Our online value RNN models with random feedback could still achieve better performance (smaller squared value error) than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      Also, we have added the note about our assumption and consideration on the time-step that we described in our provisional reply in Line 136-142:

      “We assumed that a single RNN unit corresponds to a small population of neurons that intrinsically share inputs and outputs, for genetic or developmental reasons, and the activity of each unit represents the (relative) firing rate of the population. Cortical population activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics [46] such as short-term facilitation, whose time constant can be around 500 milliseconds [47]. Therefore, we assumed that single time-step of our rate-based (rather than spike-based) model corresponds to 500 milliseconds.”

      Pub3. In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      We examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units” and described the details of the extended models in Line 844-862.

      Pub4. Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      We examined the performance of the models in a task in which a distractor cue randomly appeared. As a result, our model with random feedback, as well as the model with backprop, could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Reviewer #1 (Recommendations for the authors):

      Detailed comments to authors

      Rec1. Are the untrained RNNs discussed in methods? It seems quite good in estimating value but has a strong dopamine response at time of reward. Is nothing trained in the untrained RNN or are the W values trained. Untrained RNN are not bad at estimating value, but not as good as the two other options. It would seem reasonable that an untrained RNN (if I understand what it is) will be sufficient for such simple Pavlovian conditioning paradigms. This is provided that the RNN generates a complete, or nearly complete basis. Random RNN's provided that the random weights are chosen properly can indeed generate a nearly complete basis. Once there is a nearly complete temporal basis, it seems that a powerful enough learning rule will be able to learn the very simple Pavlovian conditioning. Since there are only 3 time-steps from cue to reward, an RNN dimensionality of 3 would be sufficient. A failure to get a good approximation can also arise from the failure of the learning algorithm for the output weights (W).

      As we mentioned in our reply to your public comment Pub1 (page 3-5), we have added an explanation of "untrained RNN" (in which the value weights were still learnt) (Line 144-147). We also analyzed the dimensionality of network dynamics by calculating the contribution ratios of principal components of the trajectory of RNN activities, showing that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN (Fig. 2K/Line 210-220, Fig.6H/Line 412-416). Moreover, also as we mentioned in our reply to your public comment Pub1, we have added a note that even learning of a small number of states was not trivially easy because we considered continuous learning across trials rather than episodic learning of separate trials and thus it was not trivial for the model to know that cue presentation in different trials after random lengths of inter-trial interval should still be regarded as a same single state (Line 177-185).

      Rec2. For all cases, it will be useful to estimate the dimensionality of the RNN. Is the dimensionality of the untrained RNN smaller than in the trained cases? If this is the case, this might depend on the choice of the initial random (I assume) recurrent connectivity matrix.

      As mentioned above, we have analyzed the dimensionality of the network dynamics, and as you said, the dimensionality of the model with untrained RNN (which was indeed the initial random matrix as you said, as we mentioned above) was on average smaller than the trained value RNN models (Fig. 2K/Line 210-220, Fig.6H/Line 412-416).

      Rec3. It is surprising that the error starts increasing for more RNN units above ~15. See discussion. This might indicate a failure to adjust the learning parameters of the network rather than a true and interesting finding.

      Thank you very much for this insightful comment. In the original manuscript, we set the learning rate to a fixed value (0.1), without normalization by the squared norm of the feature vector (as we mentioned in Line 656-7 of the original manuscript), because we thought such a normalization could not be locally (biologically) implemented. However, we have realized that the lack of normalization resulted in an excessively large learning rate when the number of RNN units was large, which could cause instability and an increase in error, as you suggested. Therefore, in the revised manuscript, we have implemented a normalization of the learning rate (of value weights) that does not require non-local computations, specifically, division by the number of RNN units. As a result, the error now monotonically decreased as the number of RNN units increased in the non-negatively constrained models (Fig. 6E-left) and also largely in the unconstrained model with random feedback, although still not in the unconstrained model with backprop or untrained RNN (Fig. 2J-left).
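
      The effect of this normalization can be sketched as follows. The code is an illustration under assumptions rather than the implementation used in the manuscript: the function name `td0_value_weight_update`, the discount factor `gamma=0.8`, and the timing convention of the TD error are all hypothetical choices.

```python
import numpy as np

def td0_value_weight_update(w, x_t, x_next, reward, lr=0.1, gamma=0.8):
    """One TD(0) update of value weights, with the learning rate divided by n_units.

    w       : (n_units,) value (readout) weights
    x_t     : (n_units,) RNN activities at the current time step
    x_next  : (n_units,) RNN activities at the next time step
    reward  : scalar reward (timing convention is an assumption)
    """
    n_units = x_t.shape[0]
    # TD error: reward plus discounted next value minus current value.
    delta = reward + gamma * (w @ x_next) - (w @ x_t)
    # Dividing by n_units needs no knowledge of other units' activities.
    w_new = w + (lr / n_units) * delta * x_t
    return w_new, delta
```

      When each activity has magnitude at most 1, the squared norm of `x_t` is at most `n_units`, so the resulting change in the estimated value `w @ x_t` is bounded by `lr * delta` regardless of network size. Without the division, that change could grow in proportion to the number of units.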

      Rec4. Not numbering equations is a problem. For example, the explanations of feedback alignment (lines 194-206) rely on equations in the methods section which are not numbered. This makes it hard to read these explanations. Indeed, it will also be better to include a detailed derivation of the explanation in these lines in a mathematical appendix. Key equations should be numbered.

      We have added numbers to key equations in the Methods, and references to the numbers of corresponding equations in the main text. Detailed derivations are included in the Methods.

      Rec5. What is shown in Figure 3C? - an equation will help.

      We have added an explanation using equations in the main text (Line 256-259).

      Rec6. The explanation of why alignment occurs is not satisfactory, but neither is it in previous work on feedforward networks. The least that should be done though

      Regarding why alignment occurs, what remained mysterious (to us) was that in the case of the non-negatively constrained model, while the angle between the value weight vector (w) and the random feedback vector (c) was relatively small (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials, even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, introduction of slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added these in the revised manuscript (Line 463-477):

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
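
      The geometric intuition in this passage can be illustrated with a short sketch. It is not the authors' code: the function names, the multiplicative form of the decay, the decay rate, and the way the non-negative constraint is enforced (clipping at zero) are assumptions made for illustration.

```python
import numpy as np

def angle_deg(w, c):
    """Angle in degrees between vectors w and c."""
    cos = np.dot(w, c) / (np.linalg.norm(w) * np.linalg.norm(c))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def decayed_nonneg_update(w, dw, decay=0.001):
    """Apply a learning step dw with slight weight decay, keeping w non-negative.

    The multiplicative decay and the clipping at zero are assumed forms.
    """
    return np.maximum((1.0 - decay) * w + dw, 0.0)

# A non-negative w dominated by one element lies near an edge of the
# non-negative quadrant and is far from a generic non-negative vector c.
c = np.ones(4)
w_peaked = np.array([10.0, 0.1, 0.1, 0.1])
w_flat = np.array([1.0, 1.0, 1.0, 1.0])
print(angle_deg(w_peaked, c), angle_deg(w_flat, c))
```

      In this toy comparison the peaked vector makes a much larger angle with `c` than the flat one, which matches the conjecture that prominent growth of a few elements of w hinders further alignment.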

      Rec7. I don't understand the qualitative difference between 4G and 4H. The difference seems to be smaller but there is still an apparent difference. Can this be quantified?

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      Rec8. More biologically realistic constraints.

      Are the weights allowed to become negative? - No.

      Figure 6C - untrained RNN with non-negative x_i. Again - it was not explained what untrained RNN is. However, given my previous assumption, this is probably because the units developed in an untrained RNN are much further from representing a complete basis function. This cannot be done with only positive values. It would be useful to see network dynamics of units for untrained RNN. It might also be useful in all cases to estimate the dimensionality of the RNN. For 3 time-steps, it needs to be at least 3, and for more time steps as in Figure 4, larger.

      As we mentioned in our reply to your public comment Pub3 (page 6-8), in the revised manuscript we examined models that incorporated inhibitory and excitatory units and followed Dale's law, which could still learn the tasks (Fig. 9, Line 479-520). We have also analyzed the dimensionality of network dynamics as we mentioned in our replies to your public comment Pub1 and recommendations Rec1 and Rec2.

      Rec9. A new type of untrained RNN is introduced (Fig 6D). This is the first time an explanation of the untrained RNN is given. Indeed, the dimensionality of the second type of untrained RNN should be similar to the bioVRNNrf. The results are still not good.

      In the model with the new type of untrained RNN whose elements were shuffled from trained bioVRNNrf, contribution ratios of later principal components of the trajectory of RNN activities (Fig. 6H gray dotted line) were indeed larger than those in the model with naive untrained RNN (gray solid line) but still much smaller than those in the trained value RNN models with backprop (red line) or random feedback (blue line). It is considered that in value RNN, RNN connections were trained to realize high-dimensional trajectory, and shuffling did not generally preserve such an ability.

      Rec10. The discussion is too long and verbose. This is not a review paper.

      We have made the original discussion much more compact (from 1686 words to 940 words). We have added new discussion in response to the review comments, but the total length remains shorter than before (1589 words).

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent network in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain nonnegative constraints can enforce a "loose alignment" of feedback weights. The author's results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      We have examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units”.

      We have also examined the performance of the models in a task in which a distractor cue randomly appeared, finding that our models could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Regarding the depth, we continue to think about it but have not yet come up with concrete ideas.

      Reviewer #2 (Recommendations for the authors):

      (1) I think the work would greatly benefit from more proofreading. There are language errors/oddities throughout the paper, I will list just a few examples from the introduction:

      Thank you for pointing this out. We have made revisions throughout the paper.

      line 63: "simultaneously learnt in the downstream of RNN". Simultaneously learnt in networks downstream of the RNN? Simultaneously learnt in a downstream RNN? The meaning is not clear in the original sentence.

      We have revised it to "simultaneously learnt in connections downstream of the RNN" (Line 67-68).

      starting in line 65: "A major problem, among others.... value-encoding unit" is a run-on sentence and would be more readable if split into multiple sentences.

      We have extensively revised this part, which now consists of short sentences (Line 70-75).

      line 77: "in supervised learning of feed-forward network" should be either "in supervised learning of a feed-forward network" or "in supervised learning of feed-forward networks".

      We have changed "feed-forward network" to "feed-forward networks" (Line 83).

      (2) Under what conditions can you use an online learning rule which only considers the influence of the previous timestep? It's not clear to me how your networks solve the temporal credit assignment problem when the cue-reward delay in your tasks is 3-5ish time steps. How far can you stretch this delay before your networks stop learning correctly because of this one-step assumption? Further, how much does feedback alignment constrain your ability to learn long timescales, such as in Murray, J.M. (2019)?

      The reason why our models can solve the temporal credit assignment problem, at least to a certain extent, is considered to be that temporal-difference (TD) learning, which we adopted, itself has the power to resolve temporal credit assignment, as exemplified by the fact that TD(0) algorithms without eligibility trace can still learn the value of distant rewards. We have added a discussion on this in Line 702-705:

      “…our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]).”
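      The point that TD(0) alone can propagate value back from a delayed reward can be illustrated with a minimal tabular sketch. This is not the oVRNN model: it assumes a known, episodic four-state chain from cue to reward, and the function name `td0_chain`, the discount factor (0.8), and the learning rate (0.1) are illustrative assumptions rather than values taken from the manuscript.

```python
def td0_chain(n_states=4, gamma=0.8, alpha=0.1, n_trials=2000):
    """Tabular TD(0) on a deterministic cue -> ... -> reward chain.

    Assumed setup (hypothetical): state 0 is the cue, the last state
    delivers a reward of 1, and each trial ends after the reward.
    No eligibility trace is used; each update looks one step ahead.
    """
    values = [0.0] * n_states
    for _ in range(n_trials):
        for s in range(n_states):
            is_last = s == n_states - 1
            reward = 1.0 if is_last else 0.0
            next_value = 0.0 if is_last else values[s + 1]
            td_error = reward + gamma * next_value - values[s]  # TD-RPE
            values[s] += alpha * td_error
    return values


values = td0_chain()
print(values)
```

      Under these assumed settings, the learned value of a state k steps before the reward approaches gamma to the power k, so the cue state acquires value even though no single update spans the whole cue-reward delay.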

      We have also examined the cases in which the cue-reward delay (originally 3 time steps) was elongated to 4, 5, or 6 time-steps, and our models with random feedback could still achieve better performance than the models with untrained RNN although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      As for the difficulty due to random feedback compared to backprop, there appeared to be little difference in the models without non-negative constraint (Fig. 2M), whereas in the models with nonnegative constraint, when the cue-reward delay was elongated to 6 time-steps, the model with random feedback performed worse than the model with backprop (Fig. 6J bottom-left panel).

      (3) Line 150: Were the RNN methods trained with continuation between trials?

      Yes, we have added

      “The oVRNN models, and the model with untrained RNN, were continuously trained across trials in each task, because we considered that it was ecologically more plausible than episodic training of separate trials.” in Line 147-150. This is considered to make learning of even the simple cue-reward association task nontrivial, as we describe in our reply to your comment 9 below.

      (4) Figure 2I, J: indicate the statistical significance of the difference between the three methods for each of these measures.

      We have added statistical information for Fig. 2J (Line 198-203):

      “As shown in the left panel of Fig. 2J, on average across simulations, oVRNNbp and oVRNNrf exhibited largely comparable performance and always outperformed the untrained RNN (p < 0.00022 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units), although oVRNNbp somewhat outperformed or underperformed oVRNNrf when the number of RNN units was small (≤10 (p < 0.049)) or large (≥25 (p < 0.045)), respectively.”

      and also Fig. 6E (for non-negative models) (Line 385-390):

      “As shown in the left panel of Fig. 6E, oVRNNbp-rev and oVRNNrf-bio exhibited largely comparable performance and always outperformed the models with untrained RNN (p < 2.5×10<sup>−12</sup> in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units), although oVRNNbp-rev somewhat outperformed or underperformed oVRNNrf-bio when the number of RNN units was small (≤10 (p < 0.00029)) or large (≥25 (p < 3.7×10<sup>−6</sup>)), respectively…”

      Fig. 2I shows distributions, whose means are plotted in Fig. 2J, and we did not add statistics to Fig. 2I itself.

      (5) Line 178: Has learning reached a steady state after 1000 trials for each of these networks? Can you show a plot of error vs. trial number?

      We have added a plot of error vs trial number for original models (Fig. 2L, Line 221-223):

      “We examined how learning proceeded across trials in the models with 20 RNN units. As shown in Fig. 2L, learning became largely converged by 1000-th trial, although slight improvement continued afterward.”

      and non-negatively constrained models (Fig. 6I, Line 417-422):

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”

      As shown in these figures, learning became largely steady at 1000 trials, but still slightly continued, and we have added simulations with 3000 trials (Fig. 2M and Fig. 6J).

      (6) Line 191: Put these regression values in the figure caption, as well as on the plot in Figure 3B.

      We have added the regression values in Fig. 3B and its caption.

      (7) Line 199: This idea of being in the same quadrant is interesting, but I think the term "relatively close angle" is too vague. Is there another, more quantitative way to describe what you mean by this?

      We have revised this (Line 252-254) to “a vector that is in a relatively close angle with c, or more specifically, is in the same quadrant as (and thus within at maximum 90° from) c (for example, [c<sub>1</sub> c<sub>2</sub> c<sub>3</sub>]<sup>T</sup> and [0.5c<sub>1</sub> 1.2c<sub>2</sub> 0.8c<sub>3</sub>]<sup>T</sup>)”.
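      The 90° bound in this example follows because scaling each element of c by a positive factor k_i gives the dot product Σ k_i c_i², which is positive whenever c is nonzero. A minimal numerical check, in which the helper `angle_deg` and the example values of c are illustrative assumptions:

```python
import math


def angle_deg(u, v):
    """Angle between two nonzero vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cosine))


c = [0.3, -0.7, 0.5]                            # arbitrary example vector
scaled = [0.5 * c[0], 1.2 * c[1], 0.8 * c[2]]   # same signs, hence same quadrant
opposite = [-x for x in c]                      # every sign flipped

print(angle_deg(c, scaled))    # below 90 degrees
print(angle_deg(c, opposite))  # close to 180 degrees
```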

      (8) Line 275: I'd like to see this measure directly in a plot, along with the statistical significance.

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      (9) Line 280: Surely the untrained RNN should be able to solve the task if the reservoir is big enough, no? Maybe much bigger than 50 units, but still.

      We think this is not certain. A difficulty lies in that, because we modeled the tasks in a continuous way rather than in an episodic way (as we mentioned in our reply to your comment 3), the activity of untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for RNN to know that cue presentation in different trials, even after random lengths of inter-trial interval, should constitute the same single state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using BPTT for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      (10) It's a bit confusing to compare Figure 4C to Figure 4D-H because there are also many features of D-H which do not match those of C (response to cue, response to late reward in task 1). It would make sense to address this in some way. Is there another way to calculate the true values of the states (e.g., maybe you only start from the time of the cue) which better approximates what the networks are doing?

      As we mentioned in our replies to your comments 3 and 9, our models with RNN were trained continuously across trials rather than separately for each episodic trial, and whether the models could still learn the state representation is a key issue. Therefore, starting learning from the time of cue would not be an appropriate way to compare the models, and instead we have made statistical comparison regarding key features, specifically, TD-RPEs at early and late rewards, as indicated in Fig. 4D-H.

      (11) Line 309: Can you explain why this non-monotonic feature exists? Why do you believe it would be more biologically plausible to assume monotonic dependence? It doesn't seem so straightforward to me, I can imagine that competing LTP/LTD mechanisms may produce plasticity which would have a non-monotonic dependence on post-synaptic activity.

      Thank you for this insightful comment. As you suggested, non-monotonic dependence on the postsynaptic activity (BCM rule) has been proposed for unsupervised learning (cortical self-organization) (Bienenstock et al., 1982 J Neurosci), and there were suggestions that triplet-based STDP could be reduced to a BCM-like rule and additional components (Gjorgjieva et al., 2011 PNAS; Shouval, 2011 PNAS). However, the non-monotonicity that appeared in our model, derived from the backprop rule, is maximized at the middle and thus opposite to the BCM rule, which is minimized at the middle (i.e., initially decreases and thereafter increases). Therefore we consider that such an increase-then-decrease-type non-monotonicity would be less plausible than a monotonic increase, which could approximate an extreme case (with a minimum dip) of the BCM rule. We have added a note on this point in Line 355-358:

      “…the dependence on the post-synaptic activity was non-monotonic, maximized at the middle of the range of activity. It would be more biologically plausible to assume a monotonic increase (while an opposite shape of nonmonotonicity, once decrease and thereafter increase, called the BCM (Bienenstock-Cooper-Munro) rule has actually been suggested [56-58]).”

      (12) Line 363: This is the most exciting part of the paper (for me). I want to learn way more about this! Don't hide this in a few sentences. I want to know all about loose vs. feedback alignment. Show visualizations in 3D space of the idea of loose alignment (starting in the same quadrant), and compare it to how feedback alignment develops (ending in the same quadrant). Does this "loose" alignment idea give us an idea why the random feedback seems to settle at 45 degree angle? it just needs to get the signs right (same quadrant) for each element?

      In reply to this encouraging comment, we have made further analyses of the loose alignment. By the term "loose alignment", we meant that the value weight vector w and the feedback vector c are in the same (non-negative) quadrant, as you said. But what remained mysterious (to us) was that, while the angle between w and c was relatively small (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, namely the introduction of a slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
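      The geometric intuition in the quoted passage (non-negative vectors always lie within 90° of each other, but a vector dominated by one element sits near an edge of the non-negative quadrant and makes larger angles with other vectors) can be checked with a small Monte Carlo sketch. The dimension (12), the uniform sampling, the dominance factor (20), and all function names are illustrative assumptions, not the authors' simulation.

```python
import math
import random


def angle_deg(u, v):
    """Angle between two nonzero vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cosine))


def mean_angle(dim=12, dominance=1.0, n_samples=2000, seed=0):
    """Mean angle between random non-negative vectors w and c.

    The first element of w is multiplied by `dominance` to mimic one
    value weight growing prominently (a hypothetical stand-in for the
    effect described in the quoted text).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        c = [rng.random() for _ in range(dim)]
        w = [rng.random() for _ in range(dim)]
        w[0] *= dominance
        total += angle_deg(w, c)
    return total / n_samples


balanced = mean_angle(dominance=1.0)
dominated = mean_angle(dominance=20.0)
print(balanced, dominated)
```

      Under these assumptions the mean angle stays below 90° in both cases but is larger when one element of w dominates, which is consistent with the explanation that limiting such growth (for example by weight decay) leaves room for further alignment.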

      As for visualization, because the model's dimension was high (e.g., 12), we could not come up with better ways of visualization than the trial versus angle plot (Fig. 3A, 8A,F). Nevertheless, we would expect that the abovementioned additional analyses of loose alignment (with graphs) are useful to understand what is going on.

      (13) Line 426: how does this compare to some of the reward modulated Hebbian rules proposed in other RNNs? See Hoerzer, G. M., Legenstein, R., & Maass, W. (2014). Put another way, you arrived at this from a top-down approach (gradient descent->BP->approximated by RF->non-negativity constraint->leads to DA dependent modulation of Hebbian plasticity). How might this compare to a bottom up approach (i.e. starting from the principle of Hebbian learning, and adding in reward modulation)?

      The study of Hoerzer et al. 2014 used a stochastic perturbation, which we did not assume but can potentially be integrated. On the other hand, Hoerzer et al. trained the readout of untrained RNN, whereas we trained both RNN and its readout. We have added discussion to compare our model with Hoerzer et al. and other works that also used perturbation methods, as well as other top-down approximation method, in Line 685-711 (reference 128 is Hoerzer et al. 2014 Cereb Cortex):

      “As an alternative to backprop in hierarchical network, aside from feedback alignment [36], Associative Reward-Penalty (A<sub>R-P</sub>) algorithm has been proposed [124-126]. In A<sub>R-P</sub>, the hidden units behave stochastically, allowing the gradient to be estimated via stochastic sampling. Recent work [127] has proposed Phaseless Alignment Learning (PAL), in which high-frequency noise-induced learning of feedback projections proceeds simultaneously with learning of forward projections using the feedback in a lower frequency. Noise-induced learning of the weights on readout neurons from untrained RNN by reward-modulated Hebbian plasticity has also been demonstrated [128]. Such noise- or perturbation-based [40] mechanisms are biologically plausible because neurons and neural networks can exhibit noisy or chaotic behavior [129-131], and might improve the performance of value-RNN if implemented.

      Regarding learning of RNN, "e-prop" [35] was proposed as a locally learnable online approximation of BPTT [27], which was used in the original value RNN [26]. In e-prop, neuron-specific learning signal is combined with weight-specific locally-updatable "eligibility trace". Reward-based e-prop was also shown to work [35], both in a setup not introducing TD-RPE with symmetric or random feedback (their Supplementary Figure 5) and in another setup introducing TD-RPE with symmetric feedback (their Figure 4 and 5). Compared to these, our models differ in multiple ways.

      First, we have shown that alignment to random feedback occurs in the models driven by TD-RPE. Second, our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]). However, as mentioned before, single time-step in our models was assumed to correspond to hundreds of milliseconds, incorporating slow synaptic dynamics, whereas e-prop is an algorithm for spiking neuron models with a much finer time scale. From this aspect, our models could be seen as a coarse-time-scale approximation of e-prop. On top of these, our results point to a potential computational benefit of biological non-negative constraint, which could effectively limit the parameter space and promote learning.”

      Related to your latter point (and also replying to other reviewer's comment), we also examined the cases where the random feedback in our model was replaced with uniform feedback, which corresponds to a simple bottom-up reward-modulated triplet plasticity rule. As a result, the model with uniform feedback showed largely comparable, but somewhat worse, performance than the model with random feedback. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1)<sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      and also added a biological implication of the results in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, all four biological features do not appear to totally impair the learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE. Which is therefore interpretable in terms of measurable quantities in the wet-lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To control if the task is not too trivial, I suggest adding a control where the vector c is constant c_i=1.

      We have examined the cases where the feedback was uniform, i.e., in the direction of (1, 1, ..., 1) in both models without and with non-negative constraint. In both models, the models with uniform feedback performed somewhat worse than the original models with random feedback, but still better than the models with untrained RNN. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”
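      The relation between random and uniform feedback discussed here can be made concrete with a schematic update of the form (pre-synaptic activity) × (post-synaptic activity) × feedback × RPE, as the reviewer summarizes it. The sketch below is an illustration under stated assumptions: the function `update_recurrent_weights`, the learning rate, the example activities, and the plain product of the four factors are hypothetical simplifications and do not reproduce the exact post-synaptic term or the constraints of the manuscript's rule.

```python
import random


def update_recurrent_weights(weights, pre, post, feedback, rpe, lr=0.1):
    """One schematic update: dW[i][j] = lr * rpe * feedback[i] * post[i] * pre[j]."""
    return [
        [w_ij + lr * rpe * feedback[i] * post[i] * pre[j]
         for j, w_ij in enumerate(row)]
        for i, row in enumerate(weights)
    ]


n_units = 3
rng = random.Random(0)
weights = [[0.0] * n_units for _ in range(n_units)]
pre = [0.2, 0.5, 0.9]    # activities at the previous time step (example values)
post = [0.6, 0.1, 0.4]   # activities at the current time step (example values)
rpe = 0.5                # scalar TD-RPE broadcast to all units

random_feedback = [rng.random() for _ in range(n_units)]  # fixed non-negative c
uniform_feedback = [1.0] * n_units                        # c_i = 1 control

w_random = update_recurrent_weights(weights, pre, post, random_feedback, rpe)
w_uniform = update_recurrent_weights(weights, pre, post, uniform_feedback, rpe)
print(w_random)
print(w_uniform)
```

      In this simplified form the two variants differ only in the per-unit factor c_i, so the uniform case reduces to a reward-modulated product of pre- and post-synaptic activity.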

      We have also added a discussion on the biological implication of the model with uniform feedback mentioned in our provisional reply in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      In addition, while preparing the revised manuscript, we found a recent simulation study, which showed that uniform feedback coupled with positive forward weights was effective in supervised learning of one-dimensional output in feed-forward network (Konishi et al., 2023, Front Neurosci).

      We have briefly discussed this work in Line 653-655:

      “Notably, uniform feedback coupled with positive forward weights was shown to be effective also in supervised learning of one-dimensional output in feed-forward network [114], and we guess that loose alignment may underlie it.”

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      We have added a discussion on the prediction of our models, mentioned in our provisional reply, in Line 627-638:

      “oVRNNrf predicts that the feedback vector c and the value-weight vector w become gradually aligned, while oVRNNrf-bio predicts that c and w are loosely aligned from the beginning. Element of c could be measured as the magnitude of pyramidal cell's response to DA stimulation. Element of w corresponding to a given pyramidal cell could be measured, if striatal neuron that receives input from that pyramidal cell can be identified (although technically demanding), as the magnitude of response of the striatal neuron to activation of the pyramidal cell. Then, the abovementioned predictions could be tested by (i) identifying cortical, striatal, and VTA regions that are connected, (ii) identifying pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measuring the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) testing whether DA→pyramidal responses and pyramidal→striatal responses are associated across pyramidal cells, and whether such associations develop through learning.”

      Moreover, we have considered another (technically more doable) prediction of our model, and described it in Line 639-643:

      “Testing this prediction, however, would be technically quite demanding, as mentioned above. An alternative way of testing our model is to manipulate the cortical DA feedback and see if it will cause (re-)alignment of value weights (i.e., cortical striatal strengths). Specifically, our model predicts that if DA projection to a particular cortical locus is silenced, effect of the activity of that locus on the value-encoding striatal activity will become diminished.”

      (6a) Random feedback with RNN in RL has been studied in the past, so it is maybe worth giving some insights into how the results and the analyses compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? or benefits from this simpler toy task? [1] https://www.nature.com/articles/s41467-020-17236-y

      As for a specific feature of non-negative models, we did not describe (and in fact had not clearly recognized) an intriguing result: the non-negative random feedback model generally performed better than the models without the non-negative constraint, with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left (please mind the difference in the vertical scales)). This suggests that the non-negative constraint effectively limited the parameter space and thereby made learning more efficient. We have added this result in Line 392-395:

      “Remarkably, oVRNNrf-bio generally achieved better performance than both oVRNNbp and oVRNNrf, which did not have the non-negative constraint (Wilcoxon rank sum test, vs oVRNNbp: p < 7.8×10<sup>−6</sup> for 5 or ≥25 RNN units; vs oVRNNrf: p < 0.021 for ≤10 or ≥20 RNN units).”

      Also, in the models with the non-negative constraint, the model with random feedback learned more rapidly than the model with backprop, although they eventually reached a comparable level of errors, at least in the case with 20 RNN units. This is presumably because the value weights did not develop well in early trials and so the backprop-based feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked well from the beginning. We have added this result in Fig. 6I and Line 417-422:

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked well from the beginning.”

      We have also added a discussion on how our model can be positioned in relation to other models, including the study you mentioned (e-prop by Bellec, ..., Maass, 2020), in the subsection “Comparison to other algorithms” of the Discussion.

      Regarding the slightly better performance of the non-negative model with random feedback than that of the non-negative model with backprop when the number of RNN units was large (mentioned in our provisional reply), state values in the backprop model appeared less developed than those in the random feedback model. The slightly better performance of random feedback than backprop also held in our extended model incorporating excitatory and inhibitory units (Fig. 9B).

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well-chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      In the cue-reward association task with 3 time-steps delay, the non-negative model with random feedback performed largely comparably to the non-negative model with backprop, and this continued to hold in a task where a distractor cue, which was not associated with reward, appeared at random timings. We have added the results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      We have also examined the cases where the cue-reward delay was elongated. In the case of a longer cue-reward delay (6 time-steps), in the models without the non-negative constraint, the model with random feedback performed comparably to (and, when the number of RNN units was large, slightly better than) the model with backprop (Fig. 2M). In contrast, in the models with the non-negative constraint, the model with random feedback underperformed the model with backprop (Fig. 6J, left-bottom). This indicates a difference between the effect of non-negative random feedback and the effect of positive+negative random feedback.

      We have further examined the performance of the models in terms of action selection, by extending the models to incorporate an actor-critic algorithm. In a task with inter-temporal choice (i.e., immediate small reward vs delayed large reward), the non-negative model with random feedback performed worse than the non-negative model with backprop when the number of RNN units was small. When the number of RNN units increased, these models performed more comparably. These results are described in Fig. 11 and subsection “4.3 Incorporation of action selection”.

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      7a) for instance the technical description of the task (CSC) is not fully described and requires background knowledge from other papers, which is not desirable.

      7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      As for 7a), 'CSC (complete serial compound)' was actually not the name of the task but the name of the 'punctate' state representation, in which each state (timing from cue) is represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), ..., and (0, 0, ..., 1). As you pointed out, using the name 'CSC' would make the text appear more technical than it actually is, and so we have moved the reference to the name 'CSC' to the Methods (Line 903-907):

      “For the agents with punctate state representation, which is also referred to as the complete serial compound (CSC) representation [1, 48, 133], each timing from a cue in the tasks was represented by a 10-dimensional one-hot vector, starting from (1 0 0 ... 0)<sup>T</sup> for the cue state, with the next state (0 1 0 ... 0)<sup>T</sup> and so on.”

      and in the Results we have instead added a clearer explanation (Line 163-165):

      “First, for comparison, we examined a traditional TD-RL agent with punctate state representation (without using the RNN), in which each state (time-step from a cue) was represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), and so on.”
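
      The punctate (one-hot) state representation quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `punctate_state` is hypothetical, and the default of 10 states is taken from the 10-dimensional vectors mentioned in the Methods excerpt rather than from any code in the manuscript.

```python
def punctate_state(step, n_states=10):
    """Return a one-hot list for a time-step counted from the cue (0 = cue state).

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    if not 0 <= step < n_states:
        raise ValueError("step must lie in [0, n_states)")
    return [1 if i == step else 0 for i in range(n_states)]


cue_state = punctate_state(0)   # (1 0 0 ... 0): the cue state
next_state = punctate_state(1)  # (0 1 0 ... 0): one time-step after the cue
```

      Each state activates exactly one element, so no two time-steps share any feature; this is the sense in which the representation is 'punctate'.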

      As for 7b), we have added the rationale for our examination of the tasks with probabilistic structures (Line 282-294):

      “Previous work [54] examined the response of DA neurons in cue-reward association tasks in which reward timing was probabilistically determined (early in some trials but late in other trials). There were two tasks, which were largely similar but there was a key difference that reward was given in all the trials in one task whereas reward was omitted in some randomly determined trials in another task. Starkweather et al. [54] found that the DA response to later reward was smaller than the response to earlier reward in the former task, presumably reflecting the animal's belief that delayed reward will surely come, but the opposite was the case in the latter task, presumably because the animal suspected that reward was omitted in that trial. Starkweather et al. [54] then showed that such response patterns could be explained if DA encoded TD-RPE under particular state representations that incorporated the probabilistic structures of the task (called the 'belief state'). In that study, such state representations were 'handcrafted' by the authors, but the subsequent work [26] showed that the original value-RNN with backprop (BPTT) could develop similar representations and reproduce the experimentally observed DA patterns.”

      As for 7c), we have extensively revised the text of the results, adding high-level explanations while trying to reduce the lengthy low-level descriptions (e.g., Line 172-177 for Fig. 2E-G).

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      There is actually an unexpected finding with the non-negative model: the non-negative random feedback model generally performed better than the models without the non-negative constraint, with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left), presumably because the non-negative constraint effectively limited the parameter space and thereby made learning more efficient, as we mentioned in our reply to your point 6a above (we had not clearly recognized this at the time of original submission).

      Another potential merit of our present work is the simplicity of the model and the task. This simplicity enabled us to derive an intuitive explanation of why feedback alignment could occur. Such an intuitive explanation was lacking in previous studies, although more precise mathematical explanations did exist. Related to the mechanism of feedback alignment, one thing remained mysterious to us at the time of original submission. Specifically, in the non-negatively constrained random feedback model, while the angle between the value weight (w) and the random feedback (c) was relatively small (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without the non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, namely the introduction of slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby the angle between w and another vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (cf. [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
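
      The geometric argument in the passage above (non-negative vectors are always less than 90° apart, but a vector dominated by one element lies near an edge of the non-negative quadrant and is therefore far from most other non-negative vectors) can be checked with a small Python sketch. Everything here is an assumption made for illustration: the helper names, the 20 units, the uniform distribution used for the hypothetical feedback vector c, and the two example weight vectors are not taken from the manuscript or its simulations.

```python
import math
import random


def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cos))


def mean_angle_to_random_feedback(w, n_draws=1000, seed=0):
    """Mean angle between w and randomly drawn non-negative vectors c."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        c = [rng.random() for _ in w]  # hypothetical c: uniform on [0, 1)
        total += angle_deg(w, c)
    return total / n_draws


n_units = 20
w_even = [1.0] * n_units                   # weights spread evenly across units
w_peaked = [1.0] + [0.01] * (n_units - 1)  # one element dominates

mean_even = mean_angle_to_random_feedback(w_even)
mean_peaked = mean_angle_to_random_feedback(w_peaked)
print(f"even w:   {mean_even:.1f} degrees")
print(f"peaked w: {mean_peaked:.1f} degrees")
```

      Under these assumptions both mean angles stay below 90°, and the peaked vector sits at a clearly larger mean angle than the evenly spread one, which is consistent with the conjecture that prominent growth of a few elements of w works against alignment.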

      Correction of an error in the original manuscript

      In addition to revising the manuscript according to your comments, we have made a correction on the way of estimating the true state values. Specifically, in the original manuscript, we defined states by relative time-steps from a reward and estimated their values by calculating the sums of discounted future rewards starting from them through simulations. However, we assumed variable inter-trial intervals (ITIs) (4, 5, 6, or 7 time-steps with equal probabilities), and so until receiving cue information, agent should not know when the next reward will come. Therefore, states for the timings up to the cue timing cannot be defined by the upcoming reward, but previously we did so (e.g., state of "one timestep before cue") without taking into account the ITI variability.

      We have now corrected this issue, having defined the states of timings with respect to the previous (rather than upcoming) reward. For example, when ITI was 4 time-steps and agent existed in its last time-step, agent will in fact receive a cue at the next time-step, but agent should not know it until actually receiving the cue information and instead should assume that s/he was at the last time-step of ITI (if ITI was 4), last − 1 (if ITI was 5), last − 2 (if ITI was 6), or last − 3 (if ITI was 7) with equal probabilities (in a similar fashion to what we considered when thinking about state definition for the probabilistic tasks). We estimated the true values of states defined in this way through simulations. As a result, the corrected true value of the cue-timing has become slightly smaller than the value described in the original manuscript (reflecting the uncertainty about ITI length), and consequently small positive TD-RPE has now appeared at the cue timing.
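
      The uncertainty described above can be made concrete with a short Python sketch. It computes, for an agent that has spent k time-steps in the ITI without seeing a cue, the probability that the cue arrives at the next time-step. The four equally likely ITI lengths come from the text; the function name `cue_hazard` and the framing as a hazard calculation are illustrative assumptions, and this is not the simulation procedure we used to estimate the true state values.

```python
from fractions import Fraction

ITI_LENGTHS = (4, 5, 6, 7)  # equally likely ITI lengths, as stated in the text


def cue_hazard(k):
    """P(cue at the next time-step | k ITI time-steps elapsed without a cue).

    Hypothetical helper: assumes the cue follows the last ITI time-step.
    """
    still_possible = [n for n in ITI_LENGTHS if n >= k]
    if not still_possible:
        raise ValueError("k exceeds the longest ITI")
    ending_now = sum(1 for n in still_possible if n == k)
    return Fraction(ending_now, len(still_possible))


hazards = {k: cue_hazard(k) for k in range(1, 8)}
for k, h in hazards.items():
    print(f"ITI time-step {k}: P(cue next) = {h}")
```

      Only at the seventh ITI time-step is the cue certain; at the fourth, fifth, and sixth it is merely possible, which is why the state just before the cue cannot be treated as a known position relative to the upcoming reward.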

      Because we measured the performance of the models by squared errors in state values, this correction affected the results reporting the performance. Fortunately, the effects were relatively minor and did not largely alter the results of performance comparisons. However, we sincerely apologize for this error. In the revised manuscript, we have used the corrected true values throughout the manuscript, and we have described the ways of estimating these values in Line 919-976.

    1. eLife Assessment

      This important manuscript presents a thorough analysis of trans-specific polymorphism (TSP) in Major Histocompatibility Complex gene families across primates. The analysis makes the most of currently available genomic data and methods to substantially increase the amount and evolutionary time that TSPs can be observed. Both false negative TSPs due to missing genes at the assembly and/or annotation level, as well as false positives due to read mismapping with missing paralogs, are well assessed and discussed. Overall the evidence provided is compelling, and the manuscript clearly delineates the path for future progress on the topic.

    2. Reviewer #2 (Public review):

      Summary:

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. trans-species polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and detailed description of the results and methods that make it easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons without evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether there are still potential effects of mismapping or copy number variation affecting the sequence comparison.

      Comments on revisions:

      The authors have sufficiently addressed the reviewers' comments or provided additional details justifying their work. In particular, expansion of the discussion section on limitations of the analysis and clearer reference to how this relates to their companion paper represent improvements. Remaining suggestions are to still make clearer how much sparsity of sequences in the database may impact the conclusions (e.g., is this more of a problem for some genes or taxa than others? Is it a small problem or a large problem?). The data summary tables are a bit hard to read and seem to contain some information not used in the article; maybe the presentation of these could be improved, or a shorter summary table could be given in the main paper with full details only in the supplement.

    3. Reviewer #3 (Public review):

      Summary:

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provide evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths:

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses:

      Following the revision by the Authors I see mostly one weakness: older literature on the subject is duly cited, but the discussion of the findings in the context of this literature is limited.

      Comments on revisions:

      Lines 441-452 - In this section, you discuss an apparent paradox between long-lived balancing selection and strong directional selection, referencing elevated substitution rates. However, this issue is more nuanced and may not be best framed in terms of substitution rates. That terminology is common in phylogenetic analyses, where differences between sequences, or changes along phylogenetic branches, are often interpreted as true substitutions in the population genetic sense. In the case of MHC trees and the rates you're discussing here, the focus is more accurately on the rate at which new mutations become established within particular allelic lineages. So while this still concerns evolutionary rates at specific codons, equating them directly with substitution rates may be misleading. A more precise term or framing might be warranted in this context.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      MHC (Major Histocompatibility Complex) genes have long been mentioned as cases of trans-species polymorphism (TSP), where alleles might have their most recent common ancestor with alleles in a different species, rather than other alleles in the same species (e.g., a human MHC allele might coalesce with a chimp MHC allele, more recently than the two coalesce with other alleles in either species). This paper provides a more complete estimate of the extent and ages of TSP in primate MHC loci. The data clearly support deep TSP linking alleles in humans to (in some cases) old world monkeys, but the amount of TSP varies between loci.

      Strengths:

      The authors use publicly available datasets to build phylogenetic trees of MHC alleles and loci. From these trees they are able to estimate whether there is compelling support for Trans-species polymorphisms (TSPs) using Bayes Factor tests comparing different alternative hypotheses for tree shape. The phylogenetic methods are state-of-the-art and appropriate to the task.

      The authors supplement their analyses of TSP with estimates of selection (e.g., dN/dS ratios) on motifs within the MHC protein. They confirm what one would suspect: classical MHC genes exhibit stronger selection at amino acid residues that are part of the peptide binding region, and non-classical MHC exhibit less evidence of selection. The selected sites are associated with various diseases in GWAS studies.

      Weaknesses:

      An implication drawn from this paper (and previous literature) is that MHC has atypically high rates of TSP. However, rates of TSP are not estimated for other genes or gene families, so readers have no basis of comparison. There is no framework for judging whether the depth and frequency of TSP is unusual for MHC family genes, relative to other random genes in the genome, or immune genes in particular. I expect (from previous work on the topic), that MHC is indeed exceptional in this regard, but some direct comparison would provide greater confidence in this conclusion.

      We agree that context is important! Although we expected to get the most interesting results from studying the classical genes, we did include the non-classical genes specifically for comparison. They are located in the same genomic region, have multiple sequences catalogued in different species (although they are less diverse), and perform critical immune functions. We think this is a more appropriate set to compare with the classical MHC genes than, say, a random set of genes. Interestingly, we did not detect TSP in these non-classical genes. This likely means that the classical MHC genes are truly exceptional, but it could also mean that not enough sequences are available for the non-classical genes to detect TSP. 

      It would be very interesting to repeat this analysis for another gene family to see whether such deep TSP also occurs in other immune or non-immune gene families. We are lucky that decades of past work and a dedicated database exist for cataloging MHC sequences. When this level of sequence collection is achieved for other highly polymorphic gene families, it will be possible to do a comparable analysis.

      Given the companion paper's evidence of genic gain/loss, it seems like there is a real risk that the present study under-estimates TSP, if cases of TSP have been obscured by the loss of the TSP-carrying gene paralog from some lineages needed to detect the TSP. Are the present analyses simply calculating rates of TSP of observed alleles, or are you able to infer TSP rates conditional on rates of gene gain/loss?

      We were not able to infer TSP rates conditional on rates of gene gain/loss. We agree that some cases of TSP were likely lost due to the loss of a gene paralog from certain species. Furthermore, the dearth of MHC whole-region and allele sequences available for most primates makes it difficult to detect TSP, even if the gene paralog is still present. Long-read sequencing of more primate genomes should help with this. We agree that it would also be very interesting to study TSPs that were maintained for millions of years but were lost recently.

      Figure 5 (and 6) provide regression model fits (red lines in panel C) relating evolutionary rates (y axis not labeled) to site distance from the peptide binding groove, on the protein product. This is a nice result. I wonder, however, whether a linear model (as opposed to non-linear) is the most biologically reasonable choice, and whether non-linear functions have been evaluated. The authors might consider generalized additive models (GAMs) as an alternative that relaxes linearity assumptions.

      We agree that a linear model is likely not the most biologically reasonable choice, as protein interactions are complex. However, we made the choice to implement the simplest model because the evolutionary rates we inferred were relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.
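
      Our focus on the sign of the slope can be illustrated with a minimal least-squares sketch in Python. Because the inferred evolutionary rates are relative, multiplying them by any positive constant rescales the fitted slope but cannot change its sign. The function name `ols_slope` and the numbers below are placeholders invented purely for illustration; they are not data or code from our analysis.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Placeholder values for illustration only (not data from the study).
distance = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical distance from the groove
relative_rate = [5.0, 4.0, 3.5, 2.0, 1.0]  # hypothetical relative rates

slope = ols_slope(distance, relative_rate)
rescaled_slope = ols_slope(distance, [10.0 * r for r in relative_rate])
print(slope < 0, rescaled_slope < 0)
```

      A non-linear alternative such as a GAM could describe the shape of the relationship more flexibly, but the direction of the trend is the only quantity interpreted here.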

      The connection between rapidly evolving sites, and disease associations (lines 382-3) is very interesting. However, this is not being presented as a statistical test of association. The authors note that fast-evolving amino acids all have at least one association: but is this really more disease-association than a random amino acid in the MHC? Or, a randomly chosen polymorphic amino acid in MHC? A statistical test confirming an excess of disease associations would strengthen this claim.

      To strengthen this claim, we added Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the eLife template does not allow). Here, we plot the number of associations for each amino acid against evolutionary rate, revealing a significant positive slope in Class I. We also added explanatory text for this figure in lines 400-404.

      Reviewer #2 (Public review):

      Summary

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. trans-species polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and detailed description of the results and methods that make it easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons without evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether there are still potential effects of mismapping or copy number variation affecting the sequence comparison.

      To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534).  We also added text (lines 216-219 and 250-252) to more explicitly point out that our method is conservative when few sequences are available.

      We also added a paragraph to the discussion which addresses data quality and mismapping issues (lines 473-499).

      We clarified the role of our companion paper (line 49-50) by changing “In our companion paper, we explored the relationships between the different classical and non-classical genes” to “In our companion paper, we built large multi-gene trees to explore the relationships between the different classical and non-classical genes.” We also changed the text in lines 97-99 from “In our companion paper, we compared genes across dozens of species and learned more about the orthologous relationships among them” to “In our companion paper, we built trees to compare genes across dozens of species. When paired with previous literature, these trees helped us infer orthology and assign sequences to genes in some cases.”

      Reviewer #3 (Public review):

      Summary

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provided evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses

      I've identified weaknesses in several areas (details follow in the next section):

      -  Inadequate description and presentation of the data used

      -  Large parts of the results read like extended figure captions, which breaks the flow.

      -  Older literature on the subject is duly cited, but the authors don't really discuss their findings in the context of this literature.

      -  The potential impact of mechanisms other than long-term maintenance of allelic lineages by balancing selection, such as interspecific introgression and incorrect orthology assessment, needs to be discussed.

      We address these comments in the more detailed section below.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The abstract could benefit from being sharpened. A personal pet peeve is a common habit of saying we don't know everything about a topic (line 16 - "lack a full picture of primate MHC evolution"); we never know everything on a topic, so this is hardly a strong rationale to do more work on it. This is followed by "to start addressing this gap" - which is vague because you haven't explicitly stated any gap, you simply said we are not yet omniscient on the topic. Please clearly identify a gap in our knowledge, a question that you will be able to answer with this paper.

      That makes sense! We added another sentence to the abstract to make the specific gap clearer. Inserted “In particular, we do not know to what extent genes and alleles are retained across speciation events” in lines 16-17.

      Reviewer #2 (Recommendations for the authors):

      - Some discussion of alternative explanations when certain comparisons were not found to have TSP - is this consistent with genetic drift sometimes leading to lineage loss, or does it suggest that the proposed tradeoff between autoimmunity and pathogen recognition might differ depending on primates' life history and/or exposure to similar pathogens? Could the trade-off of pathogen to self-recognition not be as costly in some species?

      This is consistent with genetic drift, as no lineages are expected to be maintained across these distantly-diverged primates under neutral selection. These ideas are certainly possible, but our Bayes Factor test only reveals evidence (or lack thereof) for deviations from the species tree and cannot provide reasons why or why not.

      - It would be interesting to put these results on very long-term balancing selection in the context of what has been reported at the region for shorter term balancing selection. The discussion compares findings of previous genes in the literature but not regarding the time scale.

      Indeed, there is some evidence for the idea of “divergent allele advantage”, in which MHC-heterozygous individuals have a greater repertoire of peptides that they can present, leading to greater resistance against pathogens and greater fitness. This heterozygote advantage thus leads to balancing selection (Pierini and Lenz, 2018; Chowell et al., 2019). Our discussion mentions other time scales of balancing selection across the primates at the MHC and other loci, but we choose to focus more on long-term than short-term balancing selection.

      - Lines 223-226 - how is the difference in BF across exons in MHC-A to be interpreted? The paragraph is about MHC-A, but then the explanation in the last sentence is for when similar BF are observed which is not the case for MHC-A. Is this interpreted as lack of evidence for TSP? Or something about recombination or gene conversion? Or that one exon may be under balancing selection but not the other?

      Thank you for pointing out the confusing logic in this paragraph. 

      Previous: “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Many sequences had to be excluded from MHC-A comparisons because they were identified as gene-converted in the \textit{GENECONV} analysis or were previously identified as recombinants \citep{Hans2017,Gleimer2011,Adams2001}. Importantly, for MHC-A we do not see concordance in Bayes factors across the different exons, whereas we do for the other gene groups. Similar Bayes factors across all exons for a given comparison is thus evidence in favor of TSP being the primary driver of the observed deep coalescence structure (rather than recombination or gene conversion).” Current (lines 228-238): 

      “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Past work suggests that this gene has had a long history of gene conversion affecting different exons, resulting in different evolutionary histories for different parts of the gene \citep{Hans2017,Gleimer2011,Adams2001}. Indeed, we excluded many MHC-A sequences from our Bayes factor calculations because they were identified as gene-converted in our \textit{GENECONV} analysis or were previously suggested to be recombinants. As shown in \FIG{bayes_factors_classI}, the lack of concordance in Bayes factors across the different exons for MHC-A is evidence for gene conversion, rather than balancing selection, being the most important factor in this gene's evolution. In contrast, the other gene groups generally show concordance in Bayes factors across exons. We interpret this as evidence in favor of TSP being the primary driver of the observed deep coalescence structure for MHC-B and -C (rather than recombination or gene conversion).”

      - In Figures 5C and 6C, the points sometimes show a kind of smile pattern of possibly higher rates further from the peptide. Did authors explore other fits like a polynomial? Or, whether distance only matters in close proximity to the peptide? Out of curiosity, is it possible to map substitution time/branch into the distance to the peptide binding region for each substitution? Is there any pattern with distance to interacting proteins in non-peptide binding MHC proteins like MHC-DOA? Although they don't have a PBR they do interact with other proteins.

      Thank you for these ideas! We did not explore other fits, such as a polynomial, because we wanted to implement the simplest model. Our evolutionary rates are relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.

      There is most likely a relationship between evolutionary rate and the distance to interacting proteins in the non-peptide-binding molecules MHC-DM and -DO. However, there are few currently available models and it is difficult to determine which residues in these models are actually interacting. Researchers with more experience in protein interactions would be able to undertake such an analysis.
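
      To make the slope-only comparison in this response concrete, the following is a minimal sketch of an ordinary least-squares fit of relative evolutionary rate against distance to the peptide. It is not the authors' code: the function name, the toy values, and the choice of angstroms as the distance unit are all assumptions made for illustration. Only the sign of the slope is meant to be read off, since the rates are relative.

```python
from statistics import mean


def ols_slope(distances, rates):
    """Least-squares slope of rate regressed on distance (simple linear fit)."""
    if len(distances) != len(rates) or len(distances) < 2:
        raise ValueError("need two or more paired observations")
    mean_x, mean_y = mean(distances), mean(rates)
    # Sum of squared deviations of the predictor.
    sxx = sum((x - mean_x) ** 2 for x in distances)
    if sxx == 0:
        raise ValueError("distances must not all be identical")
    # Sum of cross-deviations between predictor and response.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, rates))
    return sxy / sxx


# Hypothetical toy data: distance to the peptide (assumed angstroms)
# and relative evolutionary rate per site. Not taken from the paper.
toy_distances = [2.0, 4.0, 6.0, 8.0, 10.0]
toy_rates = [3.0, 2.5, 2.0, 1.5, 1.0]

slope = ols_slope(toy_distances, toy_rates)
# A negative slope would indicate faster evolution closer to the peptide.
```

      A polynomial fit, as the reviewer suggests, would add curvature terms to this model; the sketch deliberately stops at the single slope parameter that the response says was of interest.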

      - How biased is the database towards human alleles? Could this affect some of the analyses, including the coincidence of rapidly evolving sites with associations? Are there more associations than expected under some null model?

      While the database is indeed biased toward human alleles, we included only a small subset of these in order to create a more balanced data set spanning the primates. This is unlikely to affect the coincidence of rapidly-evolving sites with associations; however, we note that there are no such association studies meeting our criteria in other species, meaning the associations are only coming from studies on humans.

      - To this reader, it is unnecessary and distracting to describe the figures within the text; there are frequent sentences in the text that belongs in the figure legend instead (e.g., lines 139-143, 208-211, 214-215, 328-330, etc). It would be better to focus on the results from the figures and then cite the figure, where the colors and exactly what is plotted can be in the figure legend.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      - I'm still concerned that the poor mappability of short-read data is contributing in some ways. Were the sequences in the database mostly from long-reads? Was nucleotide diversity calculated directly from the sequences in the database or from another human dataset? Is missing data at some sites accounted for in the denominator?

      The sequences in the database are mostly from short reads and come from a wide array of labs. We have added a paragraph to the discussion to explain the limitations of this (lines 473-499). However, the nucleotide diversity calculations shown in Figure 1 do not rely on the MHC database; rather, they are calculated from the human genomes in the 1000 Genomes project. Nucleotide diversity would be calculable for other species, but we did not do so for exactly the reason you mention–too much missing data.
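
      The reviewer's question about the denominator can be illustrated with a small sketch of per-site nucleotide diversity in which sites with missing calls are dropped pair by pair, so that they count in neither the numerator nor the denominator. This is one common convention and is an assumption here; the response does not describe how missing data were handled in the 1000 Genomes calculation, and the function name, the set of missing-data symbols, and the toy sequences are hypothetical.

```python
from itertools import combinations

# Characters treated as missing calls (an assumption for this sketch).
MISSING = {"N", "-", "?"}


def nucleotide_diversity(sequences):
    """Mean per-site pairwise difference, skipping missing calls pairwise."""
    if len(sequences) < 2:
        raise ValueError("need two or more sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    per_pair = []
    for a, b in combinations(sequences, 2):
        compared = 0  # sites where both sequences have a called base
        diffs = 0     # sites where the called bases differ
        for x, y in zip(a.upper(), b.upper()):
            if x in MISSING or y in MISSING:
                continue  # excluded from numerator and denominator
            compared += 1
            diffs += x != y
        if compared:
            per_pair.append(diffs / compared)

    if not per_pair:
        raise ValueError("no pair of sequences shares a called site")
    return sum(per_pair) / len(per_pair)


# Hypothetical toy alignment with one missing call.
toy_alignment = ["ACGT", "ACGA", "ANGT"]
pi = nucleotide_diversity(toy_alignment)
```

      With heavy missingness, few sites survive in each pair and the estimate becomes noisy, which is in line with the reason given above for not computing diversity in the other species.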

      - The Figure 2 and Figure 3 supplements took me a little bit to understand - is it really worth pointing out the top 5 Bayes-factor comparisons when there is no evidence for TSP? A lot of the colored squares are not actually supporting TSP but in the grids you can't see which are and which aren't without looking at the Bayes Factor. I wonder if it would help if only those with BF > 100 were shown? Or if these were marked some other way so that it was easy to see where TSPs are supported.

      Thank you for your perspective on these figures! We initially limited them to only show >100 Bayes factors for each gene group and region, but some gene groups have no high Bayes factors. Additionally, the “summary” tree pictured in these figures is necessarily a simplification of the full space of posterior trees. We felt that showing low Bayes factor comparisons could help readers understand this relationship. For example, allele sets that look non-monophyletic on the summary tree may still have a low Bayes factor, showing that they are generally monophyletic throughout the larger (un-visualizable) space of trees.

      Reviewer #3 (Recommendations for the authors):

      Specific comments

      Abstract

      I think the abstract would benefit from some editing. For example, one might get the impression that you equate allele sharing, which would normally be understood as sharing identical sequences, with sharing ancestral allelic lineages. This distinction is important because you can have many TSPs without sharing identical allele sequences. In l. 20 you write about "deep TSP", which requires either definition or reformulation. In l. 21-23 you seem to suggest that long-term retention of allelic lineages is surprising in the light of rapid sequence evolution - it may be, depending on the evolutionary scenarios one is willing to accept, but perhaps it's not necessary to float such a suggestion in the abstract where it cannot be properly explained due to space constraints? The last sentence needs a qualifier like "in some cases".

      Thank you for catching these! For clarity, we changed several words:

      ● “alleles” to “allelic lineages” in line 13

      ● “deep” to “ancient” in line 21

      ● “Despite” to “in addition to” in line 22

      ● Added “in some cases” to line 28

      Results - Overall, parts of the results read like extended figure captions. I understand that the authors want to make the complex figures accessible to the reader. However, including so much information in the text disrupts the flow and makes it difficult to follow what the main findings and conclusions are.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      l. 37-39 such a short sentence on non-classical MHC is necessarily an oversimplification, I suggest it be expanded or deleted.

      There is certainly a lot to say about each of these genes! While we do not have space in this paper’s introduction to get into these genes’ myriad functions, we added a reference to our companion paper in lines 40-41:

      “See the appendices of our companion paper \citep{Fortier2024a} for more detail.”

      These appendices are extensive, and readers can find details and references for literature on each specific gene there. In addition, several genes are mentioned in analyses further on in the results, and their specific functions are discussed in more detail when they arise.

      l. 47 -49 It would be helpful to briefly outline your criteria for selecting these 17 genes, even if this is repeated later.

      Thank you! For greater clarity, we changed the text (lines 50-52) from “Here, we look within 17 specific genes to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.” to “Here, we look within 17 specific genes---representing classical, non-classical, Class I, and Class II---to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.”

      l.85-87 I may be completely wrong, but couldn't problems with establishing orthology in some cases lead to false inferences of TSP, even in primates? Or do you think the data are of sufficient quality to ignore such a possibility? (you touch on this in pp. 261-264)

      Yes, problems with establishing orthology can lead to false inferences of TSP, and it has happened before. For example, older studies that used only exon 2 (binding-site-encoding) of the MHC-DRB genes inferred trees that grouped NWM sequences with ape and OWM sequences. Thus, they named these NWM genes MHC-DRB3 and -DRB5 to suggest orthology with ape/OWM MHC-DRB3 and -DRB5, and they also suggested possible TSP between the groups. However, later studies that used non-binding-site-encoding exons or introns noticed that these NWM sequences did not group with ape/OWM sequences (which now shared the same name), providing evidence against orthology. This illustrates that establishing orthology is critical before assessing TSP (as is comparing across regions). This is part of the reason we published a companion paper (https://doi.org/10.7554/eLife.103545.1), which clears up questions of orthology and supports the analyses we did in this paper. In cases where orthology was ambiguous, this also helped us to be conservative in our conclusions here. The problems with ambiguous gene assignment are also discussed in lines 488-499.

      l. 88-93 is the first place (others are pp. 109-118 and 460-484) where a fuller description of the data used would be welcome. It's clear that the amount of data from different species varies enormously, not only in the number of alleles per locus, but also in the loci for which polymorphism data are available. In such a synthesis study, one would expect at least a tabulation of the data used in the appendices and perhaps a summary table in the main article.

      l. 109-118 Again, a more quantitative summary of the data used, with reference to a table, would be useful.

      Thank you! To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534). Supplementary Files listing the exact alleles and sequences used in each group are also included in the resubmission.

      l. 123-124 here you say that the definition of the "16 gene groups" is in the methods (probably pp. 471-484), but it would be useful to present an informative summary of your rationale in the introduction or here

      Thank you! We agree that it is helpful to outline these groups earlier. We have changed the paragraph in lines 123-135 from: 

      “We considered 16 gene groups and two or three different genic regions for each group: exon 2 alone, exon 3 alone, and/or exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. See the Methods for more detail on how gene groups were defined. Because few intron sequences were available for non-human species, we did not include them in our analyses.” To: 

      “We considered 16 gene groups spanning MHC classes and functions. These include the classical Class I genes (MHC-A-related, MHC-B-related, MHC-C-related), non-classical Class I genes (MHC-E-related, MHC-F-related, MHC-G-related), classical Class IIA genes (MHC-DRA-related, MHC-DQA-related, MHC-DPA-related), classical Class IIB genes (MHC-DRB-related, MHC-DQB-related, MHC-DPB-related), non-classical Class IIA genes (MHC-DMA-related, MHC-DOA-related), and non-classical Class IIB genes (MHC-DMB-related, MHC-DOB-related). We studied two or three different genic regions for each group: exon 2 alone, exon 3 alone, and (for Class I) exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. Because few intron sequences were available for non-human species, we did not include them in our analyses.”

      l. 100 "alleles" -> "allelic lineages"

      Thank you for catching this. We have changed this language in line 104.

      l. 227-238 it's important to discuss the possible effect of the number of sequences available on the detectability of TSP - this is particularly important as the properties of MHC genealogies may differ considerably from those expected for neutral genealogies.

      This is a good point that may not be obvious to readers. We have added several sentences to clarify this:

      Line 193-194: “In a neutral genealogy, monophyly of each species' sequences is expected.”

      Line 213-219: “Note that the number of sequences available for comparison also affects the detectability of TSP. For example, if the only sequences available are from the same allelic lineage, they will coalesce more recently in the past than they would with alleles from a different lineage and would not show evidence for TSP. This means our method is well-suited to detect TSP when a diverse set of allele sequences are available, but it is conservative when there are few alleles to test. There were few available alleles for some non-classical genes, such as MHC-F, and some species, such as gibbon.”

      Line 244-246: “However, since there are fewer alleles available for the non-classical genes, we note that our method is likely to be conservative here.”

      l. 301 and 624-41 it's been difficult for me to understand the rationale behind using rates at mostly gap positions as the baseline and I'd be grateful for a more extensive explanation

      Normalizing the rates posed a difficult problem. We couldn’t include every single sequence in the same alignment because BEAST’s computational needs scale with the number of sequences. Therefore, we had to run BEAST separately on smaller alignments focused on a single group of genes at a time. We still wanted to be able to compare evolutionary rates across genes, but because of the way SubstBMA is implemented, evolutionary rates are relative, not absolute. Recall that to help us compare the trees, we included a common set of “backbone” sequences in all of the 16 alignments. This set included some highly-diverged genes. Initially, we planned to use 4-fold degenerate sites as the baseline sites for normalization, but there simply weren’t enough of them once we included the “backbone” set on top of the already highly diverse set of sequences in each alignment. This diversity presented an opportunity.  In BEAST, gaps are treated as missing and do not contribute any probability to the relevant branch or site (https://groups.google.com/g/beast-users/c/ixrGUA1p4OM/m/P4R2fCDWMUoJ?pli=1). So, we figured that sites that were “mostly gap” (a gap in all the human backbone sequences but with an insertion in some sequence) were mostly not contributing to the inference of the phylogeny or evolutionary rates. Because the “backbone” sequences are common to all alignments, making the “mostly gap” sites somewhat comparable across sets while not affecting inferred rates, we figured they would be a reasonable choice for the normalization (for lack of a better option).

      We added text to lines 680 and 691-693 to clarify this rationale.
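
      The normalization rationale above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the response says only that "mostly gap" sites served as the baseline, so dividing by the mean rate over those sites is an assumed choice of summary statistic, and the function name, site indices, and toy rates are hypothetical.

```python
from statistics import mean


def normalize_rates(site_rates, baseline_sites):
    """Rescale relative per-site rates by the mean rate at the baseline sites."""
    if not baseline_sites:
        raise ValueError("at least one baseline site is required")
    # Assumed summary: the arithmetic mean over the "mostly gap" baseline sites.
    baseline = mean(site_rates[i] for i in baseline_sites)
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return [rate / baseline for rate in site_rates]


# Hypothetical relative rates for five alignment columns from one gene group;
# columns 0 and 3 stand in for "mostly gap" sites shared via the backbone set.
toy_rates = [0.5, 1.0, 2.0, 0.5, 4.0]
toy_baseline_sites = [0, 3]

normalized = normalize_rates(toy_rates, toy_baseline_sites)
```

      Because the same backbone sequences appear in every alignment, rescaling each alignment's rates by its own baseline in this way is what would make fold-changes such as "2--4-fold the baseline rate" roughly comparable across separately run gene groups.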

      l. 380-84 this overview seems rather superficial. Would it be possible to provide a more quantitative summary?

      To make this more quantitative, we plotted the number of associations for each amino acid against evolutionary rate, shown in Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the template does not allow). This reveals a significant positive slope for the Class I genes, but not for Class II. We also added explanatory text for this figure in lines 400-404.

      Discussion - your approach to detecting TSP is elegant but deserves discussion of its limitations and, in particular, a clear explanation of why detecting TSP rather than quantifying its extent is more important in the context of this work. Another important point for discussion is alternative explanations for the patterns of TSP or, more broadly, gene tree - species tree discordance. Although long-term maintenance of allelic lineages due to long-term balancing selection is probably the most convincing explanation for the observed TSP, interspecific introgression and incorrect orthology assessment may also have contributed, and it would be good to see what the authors think about the potential contribution of these two factors.

      Overall, our goal was to use modern statistical methods and data to more confidently assess how ancient the TSP is at each gene. We have added several lines of text (as noted elsewhere in this document) to more clearly illustrate the limitations of our approach. We also agree that interspecific introgression and incorrect orthology assessment can cause similar patterns to arise. We attempted to minimize the effect of incorrect orthology assessment by creating multi-gene trees and exploring reference primate genomes, as described in our companion paper (https://doi.org/10.7554/eLife.103545.1), but cannot eliminate it completely. We have added a paragraph to the discussion to address this (lines 488-499). Interspecific introgression could also cause gene tree-species tree discordance, but we are not sure about how systematic this would have to be to cause the overall patterns we observe, nor about how likely it would have been for various clades of primates across the world.

      l. 421 -424 A more nuanced discussion distinguishing between positive selection, which facilitates the establishment of a mutation, and directional selection, which leads to its fixation, would be useful here.

      We added clarification to this sentence (line 443-445), from “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate.” to “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2--4-fold the baseline rate, generating ample mutations upon which selection may act.”

      l. 432-434 You write here about the shaping of TCR repertoires, but I couldn't find any such information in the paper, including Table 1.

      We did not include a separate column for these, so they can be hard to spot. They take the form of “TCR 𝛽 Interaction Probability >50%”, “TCR Expression (TRAV38-1)”, or “TCR 𝛼 Interaction Probability >50%” and can be found in Table 1.

      l. 436-442 Here a more detailed discussion in the context of divergent allelic advantage and even the evolution of new S-type specificities in plants would be valuable.

      We added an additional citation to a review article to this sentence (lines 438-439).  

      l. 443 The use of the word "training" here is confusing, suggesting some kind of "education" during the lifetime of the animal.

      We agree that “train” is not an entirely appropriate term, and have changed it to “evolve” (line 465).

      489-491 What data were used for these calculations?

      Apologies for missing this citation! We used the 1000 Genomes Project data, and the citation has been updated (lines 541-542).

    1. eLife Assessment

      This study reports valuable findings on the role of Layilin in the motility and suppressive capacity of clonal expanded regulatory T cells (Tregs) in the skin. Although the strength of the study is utilizing conditional knock-out mice and human skin samples, the analysis of the molecular mechanism by which Layilin affects Treg function is incomplete. The study will be of interest to medical scientists working on skin immunology.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      This work shows that the gene encoding Layilin is expressed preferentially in human skin Tregs, and that the fraction of Tregs expressing Layilin may overexpress genes related to T cell activation and adhesion. Expression of Layilin on Tregs would have no impact on activation markers or in vitro suppressive function. However, activation of Layilin either with a cross-linking antibody or collagen IV, its natural ligand, would promote cell adhesion via LFA1 activation. The in vivo functional role of Layilin in Tregs is studied in a conditional KO mouse model in a model of skin inflammation. Deletion of Layilin in Tregs led to an attenuation of the disease score and a reduction in the cutaneous lymphocyte infiltrate. This work is clearly innovative, but a number of major points limit its interest.

      Weakness and major points:

      (1) The number of panels and figures suggests that this story is quite complete but several data presented in the main figures do not provide essential information for a proper understanding of Layilin's role in Tregs.

      Figures 1I, 1J, and the whole of Figure 2 could be placed as supplementary figures. Also, for Figure 3E, it would be preferable to show the percentage of cells expressing cytokines rather than their absolute numbers. In fact, the drop in the numbers of cytokine-producing cells is probably due solely to the drop in total cell numbers and not to a decrease in the proportion of cells expressing cytokines. If this is the case, these data should be shown in supplementary figures. Finally, Figures 4 and 5 could be merged.

      (2) Some important data are not shown or not mentioned.

      (a) It would be important to show the proportion of Treg, Tconv, and CD8 expressing Layilin in healthy skin and in patients developing psoriasis, as well as in the blood of healthy subjects.

      (b) We lack information to be convinced that there is enrichment for migration and adhesion genes in Layilin+ Tregs in the GSEA data. The authors should indicate what geneset libraries they used. Indeed, it is tempting to show only the genesets that give results in line with the message you want to get across. If these genesets come from public banks, the bank used should be indicated, and the results of all gene sets shown in an unbiased way. In addition, it should be indicated whether the analyses were performed on untransformed or pseudobulk scRNAseq data analyses. Finally, it would be preferable to confirm the GSEA data with z-score analyses, as Ingenuity does, for example. Indeed, in GSEA-type analyses, there are genes that have activating but also inhibiting effects on a pathway in a given gene set.

      (c) For all FACS data, the raw data should be shown as histograms or dot plots for representative samples.

      (d) For Figure 5B, the number of samples analyzed is insufficient to draw clear conclusions.

      (3) For Figs. 4 and 5, the design of the experiment poses a problem. Indeed, the comparison between Layn+ and Layn- cells may, in part, not be directly linked to the expression or absence of expression of this protein. Indeed, Layn+ and Layn- Tregs may constitute populations with different biological properties, beyond the expression of Layn. However, in the experiment design used here, a significant fraction of the sorted Layn- Tregs will be cells belonging to the population that has never expressed this protein. It would have been preferable to sort first the Layn+ Tregs, then knock down this protein and re-sort the Layn- Tregs and Layn+ Tregs. If this experiment is too cumbersome to perform, I agree that the authors should not do it. However, it would be important to mention the point I have just made in the text.

    3. Reviewer #2 (Public review):

      Summary:

      In their manuscript, Gouirand et al. report on the role of Layilin expression for the motility and suppressive capacity of regulatory T cells (Tregs). In previous studies, the authors had already demonstrated that Layilin is expressed on Tregs, that it acts as a negative regulator of their suppressive capacity, that it functions to anchor Tregs in non-lymphoid tissues, and that it enhances the adhesive properties of Layilin-expressing cells by co-localization with the integrin αLβ2 (LFA-1). Building on these published data, the authors now show that Layilin is highly expressed on a subset of clonally expanded effector Tregs in both healthy and psoriatic skin and that deletion of Layilin in Tregs in vivo resulted in significantly attenuated skin inflammation. Furthermore, the authors addressed the molecular mechanism by which Layilin affects the suppressive capacity of Tregs and showed that Layilin increased Treg adhesion via modulation of LFA-1, resulting in distinct cytoskeletal changes.

      Strengths:

      Certainly, the strength of this study lies in the combination of data from mouse and human models.

      Weaknesses:

      Some of the conclusions drawn by the authors must be treated with caution, as the experimental conditions were not always appropriate, leading to a risk of misinterpretation.

    4. Reviewer #3 (Public review):

      Summary:

      Gouirand et al explore the function of Layilin on Treg in the context of psoriasis using both patient samples and a conditional mutant mouse model. They perform functional analysis in the patient samples using Cas9-mediated deletion. The authors suggest that Layilin works in concert with integrins to bind collagen IV to attenuate cell movement.

      The work is well done and built on solid human data. The report is a modest advance from the authors' previous report in 2021 that focused on tumor responses, with this report focusing on psoriasis. There are some experimental concerns that should be considered.

      Strengths:

      (1) Good complementation of patient and animal model data.

      (2) Solid experimentation using state-of-the-art approaches.

      (3) There is clearly a biological effect of LAYN deficiency in the mouse model.

      (4) The report adds some new information to what was already known from the previous reports.

      Weaknesses:

      (1) It is not clear that the assays used for functional analysis of the patient samples were optimal.

      (2) Several conclusions are not fully substantiated.

      (3) The report is lacking some experimental details.

    5. Author response:

      Reviewer 1:

      Concern 1: Figures 1I, 1J, and the whole of Figure 2 could be placed as supplementary figures. Also, for Figure 3E, it would be preferable to show the percentage of cells expressing cytokines rather than their absolute numbers. In fact, the drop in the numbers of cytokine-producing cells is probably due solely to the drop in total cell numbers and not to a decrease in the proportion of cells expressing cytokines. If this is the case, these data should be shown in supplementary figures. Finally, Figures 4 and 5 could be merged.

      We thank you for your recommendations. As rearranging figures is not critical to convey the data, we have decided to keep the figures and supplemental figures as they are currently presented.

      Concern 2a: It would be important to show the proportion of Treg, Tconv, and CD8 expressing Layilin in healthy skin and in patients developing psoriasis, as well as in the blood of healthy subjects.

      This data is published in a previous manuscript from our group. Please see Figure 1 in “Layilin Anchors Regulatory T Cells in Skin” (PMID: 34470859)

      Concern 2b: We lack information to be convinced that there is enrichment for migration and adhesion genes in Layilin+ Tregs in the GSEA data. The authors should indicate what geneset libraries they used. Indeed, it is tempting to show only the genesets that give results in line with the message you want to get across. If these genesets come from public banks, the bank used should be indicated, and the results of all gene sets shown in an unbiased way. In addition, it should be indicated whether the analyses were performed on untransformed or pseudobulk scRNAseq data analyses. Finally, it would be preferable to confirm the GSEA data with z-score analyses, as Ingenuity does, for example. Indeed, in GSEA-type analyses, there are genes that have activating but also inhibiting effects on a pathway in a given gene set.

      Given that we have already shown that layilin plays a major role in Treg and CD8+ T cell adhesion in tissues, we used a candidate approach for our GSEA. We tested the hypothesis that adhesion and motility pathways are enriched in Layilin-expressing Tregs. There was a statistically significant enrichment for these genes in Layilin+ Tregs compared to Layilin- Tregs, which we feel adequately tests our hypothesis.

      Concern 2c: For all FACS data, the raw data should be shown as histograms or dot plots for representative samples.

      We respect this concern. We omit these secondary to space constraints.

      Concern 2d: For Figure 5B, the number of samples analyzed is insufficient to draw clear conclusions.

      We respectfully disagree. Three donors were used in a paired fashion (internally controlled), achieving statistical significance.

      Concern 3: For Figs. 4 and 5, the design of the experiment poses a problem. Indeed, the comparison between Layn+ and Layn- cells may, in part, not be directly linked to the expression or absence of expression of this protein. Indeed, Layn+ and Layn- Tregs may constitute populations with different biological properties, beyond the expression of Layn. However, in the experiment design used here, a significant fraction of the sorted Layn- Tregs will be cells belonging to the population that has never expressed this protein. It would have been preferable to sort first the Layn+ Tregs, then knock down this protein and re-sort the Layn- Tregs and Layn+ Tregs. If this experiment is too cumbersome to perform, I agree that the authors should not do it. However, it would be important to mention the point I have just made in the text.

      We agree. However, as the reviewer points out, these experiments are not logistically and practically feasible at this point. We do perform several experiments in this manuscript in which layilin is reduced via gene editing with results supporting our hypotheses.

      Reviewer 2:

      Some of the conclusions drawn by the authors must be treated with caution, as the experimental conditions were not always appropriate, leading to a risk of misinterpretation.

      We have been transparent with all our methods and data. We will leave it to the reader to determine the level of rigor and the robustness of the data.

      Reviewer 3:

      Weaknesses:

      It is not clear that the assays used for functional analysis of the patient samples were optimal. (2) Several conclusions are not fully substantiated. (3) The report is lacking some experimental details.

      We have tried to be as comprehensive and thorough as possible. We feel that the data supports our conclusions. We will leave this to the reader to interpret and conclude.

    1. eLife Assessment

      This revised study describes an important new model for in vivo manipulation of microglia, exploring how mutations in the Adar1 gene within microglia contribute to Aicardi-Goutières Syndrome. The methodology is validated with exceptional data, supporting the authors' conclusions. The paper underscores both the advantages and limitations of using transplanted cells as a surrogate for microglia, making it a resource that is of value for biologists studying macrophages and microglia.

    2. Reviewer #1 (Public review):

      Summary:

      Aicardi-Goutières Syndrome (AGS) is a genetic disorder that primarily affects the brain and immune system through excessive interferon production. The authors sought to investigate the role of microglia in AGS by first developing bone-marrow-derived progenitors in vitro that carry the estrogen-regulated (ER) Hoxb8 cassette, allowing them to expand indefinitely in the presence of estrogen and differentiate into macrophages when estrogen is removed. When injected into the brains of Csf1r-/- mice, which lack microglia, these cells engraft and resemble wild-type (WT) microglia in transcriptional and morphological characteristics, although they lack Sall1 expression. The authors then generated CRISPR-Cas9 Adar1 knockout (KO) ER-Hoxb8 macrophages, which exhibited increased production of inflammatory cytokines and upregulation of interferon-related genes. This phenotype could be rescued using a Jak-Stat inhibitor or by concurrently mutating Ifih1 (Mda5). However, these Adar1-KO macrophages fail to successfully engraft in the brain of both Csf1r-/- and Cx3cr1-creERT2:Csf1rfl/fl mice. To overcome this, the authors used a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H) to derive ER-Hoxb8 bone marrow progenitors and macrophages. They discovered that Adar1 D1113H ER-Hoxb8 macrophages successfully engraft the brain, although at lower levels than WT-derived ER-Hoxb8 macrophages, leading to increased production of Isg15 by neighboring cells. These findings shed new light on the role of microglia in AGS pathology.

      Strengths:

      The authors convincingly demonstrate that ER-Hoxb8 differentiated macrophages are transcriptionally and morphologically similar to bone marrow-derived macrophages. They also show evidence that when engrafted in vivo, ER-Hoxb8 microglia are transcriptomically similar to WT microglia. Furthermore, ER-Hoxb8 macrophages engraft the Csf1r-/- brain with high efficiency and rapidly (2 weeks), showing a homogenous distribution. The authors also effectively use CRISPR-Cas9 to knock out TLR4 in these cells with little to no effect on their engraftment in vivo, confirming their potential as a model for genetic manipulation and in vivo microglia replacement.

      Overall, this paper demonstrates an innovative approach to manipulating microglia using ER-Hoxb8 cells as surrogates. The authors present convincing evidence of the model's efficacy and potential for broader application in microglial research, given its ease of production and rapid brain engraftment potential in microglia-deficient mice. Using mouse-derived cells for transplantation reduces complications that can come with the use of human cell lines, highlighting the utility of this system for research in mouse models.

    3. Reviewer #2 (Public review):

      Summary:

      Microglia have been implicated in brain development, homeostasis, and diseases. "Microglia replacement" has gained traction in recent years, using primary microglia, bone marrow or blood-derived myeloid cells, or human iPSC-induced microglia. Here, the authors extended their previous work in the area and provided evidence to support: (1) Estrogen-regulated (ER) homeobox B8 (Hoxb8) conditionally immortalized macrophages from bone marrow can serve as stable, genetically manipulated cell lines. These cells are highly comparable to primary bone marrow-derived (BMD) macrophages in vitro, and, when transplanted into a microglia-free brain, engraft the parenchyma and differentiate into microglia-like cells (MLCs). Taking advantage of this model system, the authors created stable, Adar1-mutated ER-Hoxb8 lines using CRISPR-Cas9 to study the intrinsic contribution of macrophages to the Aicardi-Goutières Syndrome (AGS) disease mechanism.

      Strengths:

      The studies are carefully designed and well-conducted. The imaging data and gene expression analysis are carried out at a high level of technical competence and the studies provide strong evidence that ER-Hoxb8 immortalized macrophages from bone marrow are a reasonable source for "microglia replacement" exercise. The findings are clearly presented, and the main message will be of general interest to the neuroscience and microglia communities.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Aicardi-Goutières Syndrome (AGS) is a genetic disorder that primarily affects the brain and immune system through excessive interferon production. The authors sought to investigate the role of microglia in AGS by first developing bone-marrow-derived progenitors in vitro that carry the estrogen-regulated (ER) Hoxb8 cassette, allowing them to expand indefinitely in the presence of estrogen and differentiate into macrophages when estrogen is removed. When injected into the brains of Csf1r-/- mice, which lack microglia, these cells engraft and resemble wild-type (WT) microglia in transcriptional and morphological characteristics, although they lack Sall1 expression. The authors then generated CRISPR-Cas9 Adar1 knockout (KO) ER-Hoxb8 macrophages, which exhibited increased production of inflammatory cytokines and upregulation of interferon-related genes. This phenotype could be rescued using a Jak-Stat inhibitor or by concurrently mutating Ifih1 (Mda5). However, these Adar1-KO macrophages fail to successfully engraft in the brain of both Csf1r-/- and Cx3cr1-creERT2:Csf1rfl/fl mice. To overcome this, the authors used a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H) to derive ER-Hoxb8 bone marrow progenitors and macrophages. They discovered that Adar1 D1113H ER-Hoxb8 macrophages successfully engraft the brain, although at lower levels than WT-derived ER-Hoxb8 macrophages, leading to increased production of Isg15 by neighboring cells. These findings shed new light on the role of microglia in AGS pathology.

      Strengths:

      The authors convincingly demonstrate that ER-Hoxb8 differentiated macrophages are transcriptionally and morphologically similar to bone marrow-derived macrophages. They also show evidence that when engrafted in vivo, ER-Hoxb8 microglia are transcriptomically similar to WT microglia. Furthermore, ER-Hoxb8 macrophages engraft the Csf1r-/- brain with high efficiency and rapidly (2 weeks), showing a homogenous distribution. The authors also effectively use CRISPR-Cas9 to knock out TLR4 in these cells with little to no effect on their engraftment in vivo, confirming their potential as a model for genetic manipulation and in vivo microglia replacement.

      Weaknesses:

      The robust data showing the quality of this model at the transcriptomic level can be strengthened with confirmation at protein and functional levels. The authors were unable to investigate the effects of Adar1-KO using ER-Hoxb8 cells and instead had to rely on a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H). Additionally, ER-Hoxb8-derived microglia do not express Sall1, a key marker of microglia, which limits their fidelity as a full microglial replacement, as has been rightfully pointed out in the discussion.

      Overall, this paper demonstrates an innovative approach to manipulating microglia using ER-Hoxb8 cells as surrogates. The authors present convincing evidence of the model's efficacy and potential for broader application in microglial research, given its ease of production and rapid brain engraftment potential in microglia-deficient mice. While Adar1-KO macrophages do not engraft well, the success of TLR4-KO line highlights the model's potential for investigating other genes. Using mouse-derived cells for transplantation reduces complications that can come with the use of human cell lines, highlighting the utility of this system for research in mouse models.

      Thank you for this thoughtful and balanced assessment. The major suggestion from Reviewer 1 was that confirmation of RNAseq data with protein or functional studies would add strength.  We provided protein staining by IHC for IBA1 in vivo, as well as protein staining by FACS for CD11B, CD45, and TMEM119 in vitro and in vivo.  For TLR4, we showed successful protein KO and blunted response to LPS (a TLR4 ligand) challenge, which we believe provides some protein and functional data to support the approach.  To bolster these data, we added staining for P2RY12 on brain-engrafted ER-Hoxb8s.

      Regarding the non-engraftment phenotype of Adar1 KO cells: because ADAR1 KO mice are embryonically lethal due to hematopoietic failure, we see the health impacts of Adar1 KO on ER-Hoxb8s as a strength of the transplantation model, enabling the assessment of ADAR1 global function in macrophages and microglia-like cells without generation of a transgenic mouse line. In addition, it was a surprise that the health impact occurs at the macrophage and not the progenitor stage, perhaps providing insight for future studies of ADAR1’s role in hematopoiesis. Instead, we were able to show a significant impact of complete loss of Adar1 on survival and engraftment, suggesting an important biological function of ADAR1. Macrophage-specific D1113H mutation, which affects part of the deaminase domain, shows that when the RNA deamination (but not the RNA binding) function of ADAR1 is disrupted, we find brain-wide interferonopathy. This is very exciting to our group and hopefully the community as astrocytes are thought to be a major driver of brain interferonopathy in patients with ADAR1 mutations. Instead, this suggests that disruption of brain macrophages is also a major contributor.

      Reviewer #2 (Public review):

      Summary:

      Microglia have been implicated in brain development, homeostasis, and diseases. "Microglia replacement" has gained traction in recent years, using primary microglia, bone marrow or blood-derived myeloid cells, or human iPSC-induced microglia. Here, the authors extended their previous work in the area and provided evidence to support: (1) Estrogen-regulated (ER) homeobox B8 (Hoxb8) conditionally immortalized macrophages from bone marrow can serve as stable, genetically manipulated cell lines. These cells are highly comparable to primary bone marrow-derived (BMD) macrophages in vitro, and, when transplanted into a microglia-free brain, engraft the parenchyma and differentiate into microglia-like cells (MLCs). Taking advantage of this model system, the authors created stable, Adar1-mutated ER-Hoxb8 lines using CRISPR-Cas9 to study the intrinsic contribution of macrophages to the Aicardi-Goutières Syndrome (AGS) disease mechanism.

      Strengths:

      The studies are carefully designed and well-conducted. The imaging data and gene expression analysis are carried out at a high level of technical competence and the studies provide strong evidence that ER-Hoxb8 immortalized macrophages from bone marrow are a reasonable source for "microglia replacement" exercise. The findings are clearly presented, and the main message will be of general interest to the neuroscience and microglia communities.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an elegant study, demonstrating both the utility and limitations of ER-Hoxb8 technology as a surrogate model for microglia in vivo. The manuscript is well-designed and clearly written, but authors should consider the following suggestions:

      (1) Validation of RNA hits at the protein level: To strengthen the comparison between ER-Hoxb8 macrophages and WT bone marrow-derived macrophages, validating several RNA hits at the protein level would be beneficial. As many of these hits are surface markers, flow cytometry could be employed for confirmation (e.g., Figure 1D, Figure 3E).

      In vitro, we show protein levels by flow cytometry for CD11B (ITGAM) and CD45 (PTPRC; Figure 1C), as well as TMEM119 (Supplemental Figure 2A) and TLR4 (Supplemental Figure 3C/D). In vivo, we show TMEM119 protein levels by flow cytometry (Figure 3A), as well as their CD11B/CD45 pregates (Supplemental Figure 2C), plus immunostaining for IBA1 (AIF1; Figure 2D). We now provide additional data showing P2RY12 immunostaining in brain-engrafted cells (Supplemental Figure 2B). 

      (2) The authors should consider testing the phagocytic capacity of ER-Hoxb8-derived macrophages to further validate their functionality.

      Thank you for the suggestion. We measured ER-Hoxb8 macrophage ability to engulf phosphatidylserine-coated beads that mimic apoptotic cells, compared with phosphatidylcholine-coated beads, now as new Supplemental Figure 1C/D. This agrees with existing literature showing efficient engulfment/phagocytosis by ER-Hoxb8-derived cells (Elhag et al., 2021).

      (3) For Figure 3E, incorporating a wild-type (WT) microglia reference would be beneficial to establish a baseline for comparison (e.g. including WT microglia data in the graph or performing a ratio analysis against WT expression levels).

      We agree - we now include bars representing our sequenced primary microglia data in Figure 3E as a comparison.  

      (4) Some statistical analyses may require refinement. Specifically, for Figure 4J, where the effects of Adar1 KO and Adar1 KO with Bari are compared, it would be more appropriate to use a two-way ANOVA.

      Thank you for noting this. We have now performed the more appropriate two-way ANOVA and included the updated results in Figure 4J and the corresponding Supplemental Figure 4G. Errors in the figure legend text have also been corrected to reflect the statistical tests used.

      (5) Cx3cr1-creERT2 pups injected with tamoxifen: The authors could clarify the depletion ratio in these experiments before the engraftment and assess whether the depletion is global or regional. In comparison to Csf1r-/-, where TLR4-KO ER-Hoxb8 engraft globally, in Cx3cr1-creERT2, the engraftment seems more regional (Figure 5A vs Supplementary Figure 5B); is this due to the differences in depletion efficiency?

      This is an excellent question and observation, and one that we are very interested in, though that finding does not change the conclusions of this particular study.  We find some region-specific differences in depletion early after tamoxifen injection, but that all brain regions are >95% depleted by P7. For instance, in a recently published manuscript (Bastos et al., 2025) we find some differences in the depletion kinetics in the genetic model. By P3, we find 90% depletion in cortex with 50-60% in thalamus and hippocampus. In other studies, we typically deliver primary monocytes, and this is the first study where we report engraftment of ER-Hoxb8 cells in the inducible model.  In this sense, it is possible that depletion kinetics may regionally affect engraftment, but future studies are required to more finely assess this point with ER-Hoxb8s, as it may change how these models are used in the future.

      Bastos et al., Monocytes can efficiently replace all brain macrophages and fetal liver monocytes can generate bonafide SALL1+ microglia, Immunity (2025), https://doi.org/10.1016/j.immuni.2025.04.006

      (6) It would be helpful for the authors to clarify whether Adar1 is predominantly expressed by microglia, especially since the study aims to show its role in dampening the interferon response.

      That’s a wonderful point. Adar1 is expressed by all brain cells, with highest transcript level in some neurons, astrocytes, and oligodendrocytes. It is an interferon-stimulated gene, and mutation itself leads to interferonopathy, we believe, due to poor RNA editing and detection of endogenous RNA as non-self by MDA5. We hope it can dampen the interferon response, but in the case of mutation, Adar1 is probably causal of interferonopathy.  It is induced in microglia upon systemic inflammatory challenge (LPS). We have edited the text to highlight its expression pattern.  See BrainRNAseq.org (Zhang*, Chen*, Sloan*, et al., 2014 and Bennett et al., 2016)

      Reviewer #2 (Recommendations for the authors):

      (1) There appears to be a morphological difference between wt and Adar1/Ifih1 double KO (dKO) cells in the engrafted brains (Figure 5). It would be good if the authors could systematically compare the morphology (e.g., soma size, number, and length of branches) of the engrafted MLCs between the wt and mutant cells.

      We agree. While cells did not differ in branch number or length, engrafted dKO cells had significantly larger somas compared with controls, which we now present in Figure S5A.

      (2) To fully appreciate the extent to which the engrafted ER-Hoxb8 immortalized macrophages resemble primary, engrafted yolk sac-myeloid cells vs. engrafted iPSC-induced microglia, it would be informative to compare the RNAseq data derived from the engrafted ER-Hoxb8 immortalized macrophages with published transcriptomic data sets (e.g. Bennett et al. Neuron 2018; Chadarevian et al. Neuron 2024; Schafer et al. Cell 2023).

      Thank you for this suggestion. To address this, we provide our full dataset for additional experiments. To compare with a similar non-immortalized model, we compared top up- and down-regulated genes from our data to those of ICT yolk sac progenitor cells from our previous work (Bennett et al., 2018). We find overlap between brain-engrafted ER-Hoxb8-, bone marrow-, and yolk sac-derived cells (Supplemental Figure 2F, Supplemental Table 3).  

      Minor comments:

      Figure 6C: the red arrows indicating the zoomed-in regions do not match the insets. It might be beneficial to provide bigger images with each channel for C and D as a Supplemental Figure.

      We fixed this in Figure 6C to show areas of interest in the cortex for both conditions. Figure S7A shows intermediate power images to aid in interpretation.

    1. eLife Assessment

      This valuable work proposes a novel, rapid S. aureus entry mechanism via Ca²⁺-dependent lysosomal exocytosis and acid sphingomyelinase release, which influences bacterial sub-cellular fate. However, reliance on chemical inhibitors and the absence of a knockout phenotype weakens the overall impact, making the study incomplete.

    2. Reviewer #2 (Public review):

      In the manuscript, Ruhling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase, and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype for genetic knock out is a major weakness. While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      I acknowledge the authors' argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of the ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      The authors allude to a lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret. Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape the phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects by ASM (or its inhibitor treatment)?

      The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensor such as Gcamp3-ML1 might provide better ways to directly visualize this during inhibitor treatment, or S. aureus infection.

    1. eLife Assessment

      The authors modified a common method to induce epilepsy in mice to provide an improved approach to screen new drugs for epilepsy. This is important because of the need to develop new drugs for patients who are refractory to current medications. The authors' method evokes seizures to circumvent a low rate of spontaneous seizures and the approach was validated using two common anti-seizure medications. The strength of evidence was solid, making the study invaluable, but there were some limitations to the approach and methods.

    2. Reviewer #1 (Public review):

      Summary:

      This important study by Chen et al. describes a novel approach for optogenetically evoking seizures in an etiologically relevant mouse model of epilepsy. The authors developed a model that can trigger seizures "on demand" using optogenetic stimulation of CA1 principal cells in mice rendered epileptic by an intra-hippocampal kainate (IHK) injection into CA3. The authors discuss their model in the context of the limitations of current animal models used in epilepsy drug development. In particular, their model addresses concerns regarding existing models where testing typically involves inducing acute seizures in healthy animals or waiting on infrequent, spontaneous seizures in epileptic animals.

      Strengths:

      A strength of this manuscript is that this approach may facilitate the evaluation of novel therapeutics since these evoked seizures, despite having some features that were significantly different from spontaneous seizures, are suggested to be sufficiently similar to spontaneous seizures, which are more laborious to analyze. The data demonstrating the commonality of pharmacology and EEG features between evoked seizures and spontaneous seizures in epileptic mice, while also being different from evoked seizures in naïve mice, are convincing. The structural, functional, and behavioral differences between a seizure-naïve and epileptic mouse, which emerge due to the enduring changes occurring during epileptogenesis, are complex and important. Accordingly, this study highlights the importance of using mice that have undergone epileptogenesis as model organisms for testing novel therapeutics. Furthermore, this study positively impacts the wider epilepsy research community by investigating seizure semiology in these populations.

      Weaknesses:

      This study convincingly demonstrates that the feature space measurements for stimulus-evoked seizures in epileptic mice were significantly different from those in naïve mice; this result allows the authors to conclude that "seizures induced in chronically epileptic animals differed from those in naïve animals". However, the authors also conclude that "induced seizures resembled naturally occurring spontaneous seizures in epileptic animals" despite their own data demonstrating similar, albeit fewer, significant differences in feature space measurements. It is unclear if and what the threshold is whereby significant differences in these feature space measurements lead to the conclusion that the differences are meaningful, as in the comparison of epileptic and naïve mice, or not meaningful, as in the comparison of evoked and spontaneous seizures.

    3. Reviewer #2 (Public review):

      The authors aimed to develop an animal model of temporal lobe epilepsy (TLE) that will generate "on-demand" seizures and an improved platform to advance our ability to find new anti-seizure drugs (ASDs) for drug-resistant epilepsy (DRE). Unlike some of the work in this field, the authors are studying actual seizures, and hopefully events that are similar to actual epileptic seizures. To develop an optimized screening tool, however, one also needs high-throughput systems with actual seizures as a quantitative, rigorous, and reproducible outcome measure. The authors aim to provide such a model; however, this approach may be over-stated here and seems unlikely to address the critical issue of drug resistance, which is their most important claim.

      Strengths:

      - The authors have generated an animal model of "on demand" seizures, which could be used to screen new ASDs and potentially other therapies. The authors and their model make a good-faith effort to emulate the epileptic condition and to use seizure susceptibility or probability as a quantitative output measure.

      - The events considered to be seizures appear to be actual seizures, with some evidence that the seizures are different from seizures in the naïve brain. Their effort to determine how different ASDs raise seizure probability or threshold to an optogenetic stimulus to the CA1 area of the rodent hippocampus is focused on an important problem, as many if not most ASD screening uses surrogate measures that may not be as well linked to actual epileptic seizures.

      - Another concern is their stimulation of dorsal hippocampus, while ventral hippocampus would seem more appropriate.

      - Use of optogenetic techniques allows specific stimulation of the targeted CA1 pyramidal cells, and it appears that this approach is reproducible and reliable with quantitative rigor.

      - The authors have taken on a critically important problem, and have made a good-faith effort to address many of the technical concerns raised in the reviews, but the underlying problem of DRE remains.

      Weaknesses:

      - Although the model has potential advantages, it also has disadvantages. As stated by the authors, the pre-test work-load to prepare the model may not be worth the apparent advantages. And most important, the paper frequently mentions DRE but does not directly address it, and yet drug resistance is the critical issue in this field.

      - Although the paper shows examples of actual seizures, there remains some concern that some of the events might not be seizures - or a homogeneous population of seizures. More quantitative assessment of the electrical properties (e.g., duration) of the seizures and their probability is likely to be more useful than the proposed quantification in the future of the behavioral seizure stages, because the former could be both more objective and automated, while the behavioral analysis of the seizures will likely be more subjective and less reliable (and also fraught with subjectivity and analytical problems). Nonetheless, the authors point out that the presence of "Racine 3 or above" behavioral seizures (in addition to their electrical data) is a good argument that many (if not all) of the "seizures" are actual epileptic seizures.

      - Optogenetic stimulation of CA1 provides cell-specificity for the stimulation, but it is not clear that this method would actually be better than electrical stimulation of a kindled rodent with superimposed hippocampal injury. The reader is unfortunately left with the concern of whether this model would be easier and more efficacious than kindling.

      - Although the authors have taken on a critically important problem, and have combined a variety of technologies, this approach may facilitate more rapid screening of ASDs against actual seizures (beneficial), but it does not really address the fundamentally critical yet difficult problem of DRE. A critical issue for DRE that is not well-addressed relates to adverse effects, which is often why many ASDs are not well tolerated by many patients (e.g., LEV). Thus, we are left with: how does this address anti-seizure DRE?

      - The focus of this paper seems to be more on seizures than on epilepsy. In the absence of seizure spontaneity, the work seems to primarily address the issues of seizure spread and duration. Although this is useful, it does not seem to be addressing the question of what trips the system to generate a seizure.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      - The authors seem to have developed a new and useful model; however, it is not clear how this will address the core problem of DRE, which was their stated aim.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      - As stated before in the original review, the potential impact would primarily be aimed at the ETSP or a drug-testing CRO; however, much more work will be required to convince the epilepsy community that this approach will actually identify new ASDs for DRE. The approach is potentially time-consuming with a steep and potentially difficult optimization curve, and thus may not be readily adaptable to the typical epilepsy-models neuroscience laboratory.

      Any additional context you think would help readers interpret or understand the significance of the work:

      - The problem of DRE is much more complicated than described by the authors here; however, the paper could end up being more useful than is currently apparent. Although this work could be seen as technically - and maybe conceptually - elegant and a technical tour de force, will it "deliver on the promise"? Is it better than kindling for DRE? In attempting to improve the discovery process, how will the model move us to another level? Will this model really be any better than others, such as kindling?

    4. Reviewer #3 (Public review):

      This revised paper develops and characterizes a new approach for screening drugs for epilepsy. The idea is to increase the ability to study seizures in animals with epilepsy because most animal models have rare seizures. Thus, the authors use the existing intrahippocampal kainic acid (IHKA) mouse model, which can have very unpredictable seizures with long periods of time between seizures. This approach is of clear utility to researchers who may need to observe many seizure events per mouse during screening of antiseizure medications. A key strength is also that more utility can be derived from each individual mouse. The authors modified the IHKA model to inject KA into CA3 instead of CA1 in order to preserve the CA1 pyramidal cells that they will later stimulate. To express the excitatory opsin channelrhodopsin (ChR2) in area CA1, they use a virus that expresses ChR2 in cells that express the Thy-1 promoter. The authors demonstrate that CA3 delivery of KA can induce a very similar chronic epilepsy phenotype to the injection of KA in CA1 and show that optical excitation of CA1 can reliably induce seizures. The authors evaluate the impact of repeated stimulation on the reliability of seizure induction and show that seizures can be reliably induced by CA1 stimulation, at least for the short term (up to 16 days). These are strengths of the study.

      However, there are several limitations: the seizures are evoked, not spontaneous. It is not clear how induced seizures can be used to investigate if antiseizure medication can reduce spontaneous seizures. Although seizure inducibility and severity can be assessed, the lack of spontaneous seizures is a limitation. To their credit, the authors show that electrophysiological signatures of induced vs spontaneous seizures are similar in many ways, but the authors also show several differences. Notably, the induced seizures are robustly inhibited by the antiseizure medication levetiracetam and variably but significantly inhibited by diazepam, similar to many mouse models with chronic recurrent seizure activity. One also wonders if using a mouse model with numerous seizures (such as the pilocarpine model) might be more efficient than using a modified IHKA protocol.

      In this revised manuscript, the authors address some previous concerns related to definitions of seizures and events that are trains of spikes, sex as a biological variable, and present new images of ChR2 expression (but these images could be improved to see the cells more clearly). A few key concerns remain unaddressed, however. For example, it is still not clear that evoked seizures triggered by stimulating CA1 are similar to spontaneous seizures, regardless of the idea that CA1 plays a role in seizure disorders. It also remains unclear whether repeated activation of the hippocampal circuit will result in additional alterations to this circuit that affect the seizure phenotype over prolonged intervals (after 16 days). Furthermore, the use of SVM with the number of seizures being used as replicates (instead of number of mice) is inappropriate. Another theoretical concern is whether the authors are correct in suggesting that one will be able to re-use the mice for screening multiple drugs in a row.

      Strengths:<br /> - The authors show that the IHKA model of chronic epilepsy can be modified to preserve CA1 pyramidal cells, allowing optogenetic stimulation of CA1 to trigger seizures.<br /> - The authors show that repeated optogenetic stimulation of CA1 in untreated mice can promote kindling and induce seizures, indeed generating two mouse models in total.<br /> - Many electrophysiological signatures are similar between the induced and spontaneous seizures, and induced seizures reliably respond to treatment with antiseizure medications.<br /> - Given that more seizures can be observed per mouse using on-demand optogenetics, this model enhances the utility of each individual mouse.<br /> - Mice of each sex were used.

      Weaknesses:<br /> - Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently justified when using number of seizures as the statistical replicate (vs mice).<br /> - Related to the first concern, the utility of increasing number of seizures for enhancing statistical power is limited because standard practice is for sample size to be numbers of mice.<br /> - The term "seizure burden" usually refers to the number of spontaneous seizures per day, not the severity of the seizures themselves. Because the authors are evoking the seizures being studied, this study design precludes assessment of seizure burden.<br /> - It seems likely that repeatedly inducing seizures will have a long-term effect, especially in light of the downward slope at day 13-16 for induced seizures seen in Figure 4C. A duration of evaluation that is longer than 16 days is warranted.<br /> - Human epilepsy is extensively heterogeneous in both etiology and individual phenotype, and it may be hard to generalize the approach.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      Weaknesses:

      While the data generally supports the authors' conclusions, a weakness of this manuscript lies in their analytical approach where EEG feature-space comparisons used the number of spontaneous or evoked seizures as their replicates as opposed to the number of IHK mice; these large data sets tend to identify relatively small effects of uncertain biological significance as being highly statistically significant. Furthermore, the clinical relevance of similarly small differences in EEG feature space measurements between seizure-naïve and epileptic mice is also uncertain.

      In this work, we used a linear mixed-effects model to address two levels of variability: between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that occur across animals, not individual seizures. We made text edits to clarify the use of the linear mixed-effects model (page 6, second paragraph, and page 11, first paragraph).
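
      To illustrate what "most of the variability comes from within animals" means, the sketch below splits the variance of a per-seizure feature into a between-animal and a within-animal component. It is not the authors' mixed-model fit: it swaps in the simpler method-of-moments (one-way ANOVA) estimator for a balanced design, and the animal labels, feature values, and function name are all hypothetical.

```python
from statistics import mean, variance


def variance_components(groups):
    """Method-of-moments (one-way ANOVA) variance components, balanced design.

    groups maps an animal label to a list of per-seizure feature values.
    Returns (between_animal_variance, within_animal_variance).
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes the same number of seizures per animal")
    n = sizes.pop()

    # Within-animal mean square: average of the per-animal sample variances.
    ms_within = mean(variance(values) for values in groups.values())
    # Between-animal mean square: n times the sample variance of the animal means.
    ms_between = n * variance([mean(values) for values in groups.values()])
    # The between-animal component cannot be negative, so clip at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, ms_within


# Hypothetical feature values (arbitrary units), four seizures per animal.
toy = {
    "mouse_1": [10.0, 30.0, 50.0, 70.0],
    "mouse_2": [20.0, 40.0, 60.0, 80.0],
    "mouse_3": [40.0, 60.0, 80.0, 100.0],
}

var_between, var_within = variance_components(toy)
within_fraction = var_within / (var_between + var_within)
print(f"within-animal share of variance: {within_fraction:.2f}")
```

      In this toy data set the within-animal share is about 0.91, which mirrors the kind of split described above. A large within-animal share does not by itself settle whether seizures or animals are the appropriate unit of replication.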

      Finally, the multiple surgeries and long timetable to generate these mice may limit the value compared to existing models in drug-testing paradigms.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16. In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is also a key advantage of our induced seizure model.

      Reviewer 1 (Recommendations for the authors):

      (1) Address why the EEG data comparisons were performed between seizures and not between animals (as explicitly described in the public review). Further, a discussion of the biological significance (or lack thereof) of the effect size differences observed is warranted. This is especially concerning when the authors make the claim that spontaneous and induced seizures are essentially the same while their analysis shows all evaluated feature space parameters were significantly different in the initial 1/3 of the EEG waveforms.

      We made text edits to clarify the use of the linear mixed-effects model (page 6, second paragraph, and page 11, first paragraph).

      (2) The authors place great emphasis on the use of clinically/etiologically relevant epilepsy models in drug discovery research. There is discussion criticizing the time points required to enact kindling and the artificial nature of acute seizure induction methods. However, the combination IHK-opto seizure induction model also requires a lengthy timeline. A more tempered discussion of this novel model's strengths may benefit readers.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16.

      (3) The authors should further emphasize the benefit of having an inducible seizure model of focal epilepsy since other mouse models (e.g., genetic or TBI models) may have superior etiological relevance (construct and face validity) but may not be amenable to their optogenetic stimulation approach.

      Thank you for the suggestion. We revised the manuscript to better emphasize the potential significance of our approach. We added a discussion in the 'Application of Models...' section on page 15, second paragraph. The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation.

      (4) Suggestion: Provide immunolabeled imagery demonstrating ChR2 presence in Thy1 cells.

      Thank you for the suggestion. We added a fluorescence image showing ChR2 expression in Fig. 2A.

      (5) It might be prudent to mention any potential effects of laser heat on hippocampal cell damage, although the 10 Hz, ~10 mW, and 6 s stim is unlikely to cause any substantial burns. Without knowing the diameter and material of the optic fiber, this is left up to some interpretation.

      Thank you for the comments. In the Methods section, we listed the optical fiber diameter as 400 microns (page 17, EEG and Fiber Implantation section). Using 5–18 mW laser power with a relatively large fiber diameter of 400 microns, the power density falls within the range of commonly employed channelrhodopsin activation conditions in vivo. That said, we would like to investigate potential heat effects or cell damage in a follow-up study.
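
      As a rough check on the power-density statement above, the irradiance at the fiber tip can be estimated by dividing the laser power by the cross-sectional area of the fiber core. The sketch below assumes that the full 400-micron diameter emits uniformly and ignores coupling losses and scattering in tissue; the function name is hypothetical.

```python
import math


def tip_irradiance_mw_per_mm2(power_mw: float, fiber_diameter_um: float) -> float:
    """Irradiance at the fiber tip, assuming uniform emission over the core."""
    radius_mm = fiber_diameter_um / 2.0 / 1000.0  # microns -> millimetres
    area_mm2 = math.pi * radius_mm ** 2
    return power_mw / area_mm2


# The 5-18 mW range and the 400-micron diameter are taken from the response above.
for power_mw in (5.0, 18.0):
    irradiance = tip_irradiance_mw_per_mm2(power_mw, 400.0)
    print(f"{power_mw:.0f} mW -> {irradiance:.1f} mW/mm^2")
```

      Under these assumptions the stated range corresponds to roughly 40 to 143 mW/mm² at the fiber tip; irradiance falls off with distance into tissue, so values at the targeted cells would be lower.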

      (6) There are instances in the manuscript where the authors describe experimental and analytical parameters vaguely (e.g. "Seizures were induced several times a day", "stimulation was performed every 1 - 3 hours over many days"). These descriptions can and should be more precise.

      Thank you for the comments. To enhance clarity, we added the stimulation protocol in a flowchart format in Fig. S2A, describing how we determined the threshold and proceeded to the drug test. Following this protocol, there was variability in the number of stimulations per day.

      (7) In the second to last paragraph of the discussion, the authors state "However, HPDs are not generalizable across species - they are specific to the mouse model (55)." This statement is inaccurate. The paper cited comes from Dr. Corinne Roucard's lab at Synapcell. In fact, Dr. Roucard argues the opposite (See Neurochem Res (2017) 42:1919-1925).

      Thank you for pointing out the mistake. On page 16, in the first paragraph, reference 55 (now 58 in the revised version) was intended to refer to 'quickly produce dose-response curves with high confidence.' In the revision, we cited another paper reporting that hippocampal spikes were not reproduced in the rat IHK model. R. Klee, C. Brandt, K. Töllner, W. Löscher, Various modifications of the intrahippocampal kainate model of mesial temporal lobe epilepsy in rats fail to resolve the marked rat-to-mouse differences in type and frequency of spontaneous seizures in this model. Epilepsy Behav. 68, 129–140 (2017).

      (8) In the discussion, Levetiracetam is highlighted as an ASM that would not be detected in acute induced seizure models; the authors point out its lack of effect in MES and PTZ. However, LEV is effective in the 6Hz test (also an acute-induced seizure model). This should be stated.

      Thank you for the comments. We highlighted the discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (9) The results text indicates that 9 epileptic mice were used to test LEV and DZP. However, the individual data points illustrated in Figure 5B show N=8 mice. Please correct.

      Thank you for the comments. A total of nine epileptic mice were used to assess two drugs, with the animals being re-used as indicated in the schematic. A total of eight assessments were conducted for DZP with six mice and eight assessments for LEV with five mice. Each assessment included hourly ChR2 activations without an ASM and hourly ChR2 activations after ASM injection.

      (10) Figure 4D: Naïve mice are labeled as solid blue circles in the legend while the data points are solid blue triangles. Please correct.

      Thank you. We corrected the marker in Fig. 4D.

      Reviewer 2 (Public Review):

      Weaknesses:

      (1) Although the figures provide excellent examples of individual electrographic seizures and compare induced seizures in epileptic and naïve animals, it is unclear which criteria were used to identify an actual seizure induced by the optogenetic stimulus, versus a hippocampal paroxysmal discharge (HPD), an "afterdischarge", an "electrophysiological epileptiform event" (EEE, Ref #36, D'Ambrosio et al., 2010 Epilepsy Currents), or a so-called "spike-wave-discharge" (SWD). Were HPDs or these other non-seizure events ever induced using stimulation in animals with IH-KA? A critical issue is that these other electrical events are not actual seizures, and it is unclear whether they were included in the column showing data on "electrographic afterdischarges" in Figure 5 for the studies on ASDs. This seems to be a problem in other areas of the paper, also.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9, which shows behavioral seizure severity scores observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (2) The differences between the optogenetically evoked seizures in IH-KA vs naïve mice are interpreted to be due to the "epileptogenesis" that had occurred, but the lesion from the KA-induced injury would be expected to cause differences in the electrically and behaviorally recorded seizures - even if epileptogenesis had not occurred. This is not adequately addressed.

      Thank you for the comments. IHK-injected mice had spontaneous tonic-clonic seizures before the start of optical stimulation, as shown in Figure S1.

      (3) The authors offer little mention of other research using animal models of TLE to screen ASDs, of which there are many published studies - many of them with other strengths and/or weaknesses. For example, although Grabenstatter and Dudek (2019, Epilepsia) used a version of the systemic KA model to obtain dose-response data on the effects of carbamazepine on spontaneous seizures, that work required use of KA-treated rats selected to have very high rates of spontaneous seizures, which requires careful and tedious selection of animals. The ETSP has published studies with an intra-amygdala kainic acid (IA-KA) model (West et al., 2022, Exp Neurol), where the authors claim that they can use spontaneous seizures to identify ASDs for DRE; however, their lack of a drug effect of carbamazepine may have been a false negative secondary to low seizure rates. The approach described in this paper may help with confounds caused by low or variable seizure rates. These types of issues should be discussed, along with others.

      We appreciate the reviewer’s insights. We added a discussion comparing our model with other existing models in the Discussion section (pages 15 and 16, 'Comparison to Other Seizure Models Used in Pharmacologic Screening' section). In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is a key advantage of our induced seizure model.

      (4) The outcome measure for testing LEV and DZP on seizures was essentially the fraction of unsuccessful or successful activations of seizures, where high ASD efficacy is based on showing that the optogenetic stimulation causes fewer seizures when the drug is present. The final outcome measure is thus a percentage, which would still lead to a large number of tests to be assured of adequate statistical power. Thus, there is a concern about whether this proposed approach will have high enough resolution to be more useful than conventional screening methods so that one can obtain actual dose-response data on ASDs.

      Thank you for the comments. In this revision, we added Supplemental Figure S9, showing the severity of behavioral seizures observed before and during ASM testing for each animal. We observed a reduction in behavioral seizure severity for each subject. We would like to explore using behavioral severity as an outcome measure in a follow-up study.
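
      The reviewer's concern about resolution can be made concrete with a confidence interval for a proportion. The sketch below computes the Wilson score interval for the fraction of stimulations that induce a seizure and shows how the interval narrows as the number of stimulations grows. The counts are hypothetical and are not taken from the manuscript; the calculation also treats stimulations as independent trials, which repeated stimulation of the same animal may not satisfy.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z ** 2 / (4.0 * trials ** 2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical example: half of the stimulations induce a seizure.
for trials in (10, 40, 160):
    low, high = wilson_interval(trials // 2, trials)
    print(f"{trials:>3} stimulations: [{low:.2f}, {high:.2f}]")
```

      With an observed fraction near 0.5, the interval width shrinks roughly with the square root of the number of stimulations, which is one way to gauge how many trials a dose-response comparison based on induction success would need.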

      (5) The authors state that this approach should be used to test for and discover new ASDs for DRE, and also used for various open/closed loop protocols with deep-brain stimulation; however, the paper does not actually discuss rigorously or critically the background literature on other published studies in these areas or how this approach will improve future research for a broader audience than the ETSP and CROs. Thus, it is not clear whether the utility will apply more widely and how extensive a readership will be attracted to this work.

      We appreciate the reviewer’s insights. We revised the manuscript to better emphasize the potential significance of our approach (page 15, second paragraph). The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation. Regarding drug-resistant epilepsy (DRE) and anti-seizure drug (ASD) screening, we agree with the reviewer that probing new classes of ASDs for DRE represents a critical goal. However, we believe that a full exploration of additional ASD classes and/or modeling DRE lies outside the scope of this manuscript, and we would like to explore it in a follow-up study.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors should explain why 10 Hz was chosen as the stimulation frequency.

      Thank you for the comment. A frequency of 10 Hz was determined based on previous work using anesthetized animals prepared in an acute in vivo setting. To simplify the paper and avoid confusion, we did not include a discussion on how we determined the frequency. Instead, we added a detailed description of how we optimized the power in a flowchart format in Supplemental Figure S2. We hope this improves reproducibility.

      (2) After micro-injection of KA, morphological changes were observed in the hippocampus, but no comparison of ChR2 expression was made in naïve animals vs KA-injected animals. Presumably, the Thy1-ChR2 mouse expresses GFP in cells that express ChR2. Thus, it may be useful to show the expression of ChR2 in animals with hippocampal sclerosis. This may explain the lack of dramatic difference between stimulation parameters in naïve vs epileptic animals, as shown in supplemental Figure S2.

      Thank you for the suggestion. We added a fluorescence image of ChR2 expression in CA1, ipsilateral to the KA-injected site, in Fig. 2A.

      (3) The authors state that "During epileptogenesis, neural networks in the brain undergo various changes ranging from modification of membrane receptors to the formation of new synapses" and that these changes are critical for successful "on-demand" seizure induction. However, it is not clear or well-discussed whether changes in neuronal cell densities that occur during sclerosis are important for "on-demand" seizure induction as well. Also, the authors showed that naïve animals exhibit a kindling-like effect, but it was unclear whether a similar effect was present in epileptic animals (i.e. do stimulation thresholds to seizure induction change as the animal gets more induction stimulations)? If present, would the secondary kindling affect drug-testing studies (e.g., would the drug effect be different on induced seizure #2 vs induced seizure #20)?

      Thank you for the suggestion. Since this is an important aspect of the model, we would like to address the kindling effect, the secondary kindling effect, and histopathology in a longer-term setting (several weeks) in a follow-up study.

      (4) The authors show that in their model, LEV and DZP were both efficacious. The authors do not seem to mention that, over 25 years ago, LEV was originally missed in the standard ETSP screens; and, it was only discovered outside of the ETSP with the kindling model. The kindling model is now used to screen ASDs. The authors should consider adding this point to the Discussion. It remains unclear, however, if the authors' screening strategy shows advantages over kindling and other such approaches in the field.

      Thank you for the suggestion. We added a discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (5) P8 paragraph 2. The authors state values for naïve animals, but they should also provide values for epileptic animals since they state that the groups were not significantly different (p>0.05). It would be useful to show values for both and state the actual p-value from the test. This issue of stating mean/median values with SD and sample size should be addressed for all data throughout the paper. Additionally, Figure S2 should be added to the manuscript and discussed, as it has data that may be valuable for the reproducibility of the paper.

      Thank you for the suggestion. Figure S2 shows the threshold power required to induce electrographic activity for n = 10 epileptic animals (9.14 ± 4.75 mW) and n = 6 naïve animals (6.17 ± 1.58 mW) (Wilcoxon rank-sum test, p = 0.137). The threshold duration was comparable between the same epileptic animals (6.30 ± 1.64 s) and naïve animals (5.67 ± 1.03 s) (Wilcoxon rank-sum test, p = 0.7133). 

      (6) In addition to the other stated references on synaptic reorganization in the CA1 area, the authors should mention similar studies from Esclapez et al. (1999, J Comp Neurol).

      Thank you. We have included the reference in the revision.

      (7) All of the raw EEG data on the seizures should be accessible to the readers.

      Thank you for the suggestion. We will consider depositing EEG data in a publicly accessible site.

      Reviewer 3 (Public review):

      Weaknesses:

      (1) Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently explained to show if there are meaningful differences between induced and spontaneous seizures. SVM modeling did not include analysis to assess the overfitting of each classifier since mice were modeled individually for classification.

      Thank you for the comment. We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (2) The difference between seizures and epileptiform discharges or trains of spikes (which are not seizures) is not made clear.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9 to show the types of seizures observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (3) The utility of increasing the number of seizures for enhancing statistical power is limited unless the sample size under evaluation is the number of seizures. However, the standard practice is for the sample size to be the number of mice.

      In this work, we used a linear mixed-effects model to address two levels of variability—between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that occur across animals, not individual seizures. We made text edits to clarify the use of the linear mixed-effects model.

      (4) Seizure burden is not easily tested.

      Thank you for the comment. We added Supplemental Figure S9 to summarize the severity of behavioral seizures before and during ASM testing. This addresses the reviewer’s comment on seizure burden. In a follow-up study, we would like to explore this type of outcome measure for drug screening.

      Reviewer 3 (Recommendations for the authors):

      (1) Provide a stronger rationale to use area CA1. For example, the authors mention that CA1 is active during seizure activity, but can seizures originate from CA1? That would make the approach logical and also explain why induced and spontaneous seizures are similar.

      Thank you for the comment. We discussed it in the Discussion section (page 14, first and second paragraphs).

      (2) Explain the use of SVM classifiers so it is more convincing that induced and spontaneous seizures are similar. Or, if they are not similar, explain that this is a limitation.

      We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (3) If feasible, extend the duration over which seizure induction reliability is assessed so that the long-term utility of the model can be demonstrated.

      Thank you for the suggestion. We would like to assess long-term utility in a follow-up study.

      (4) The GitHub link is not yet active. The authors will be required to supply their relevant code for peer evaluation as well as publication.

      Thank you. The GitHub repository is now active.

      (5) State and assess the impacts of sex as a biological variable.

      Thank you for pointing this out. Both female and male animals were included in this study: Epileptic cohort: 7 males, 3 females; Naïve cohort: 3 males, 4 females.

    1. eLife Assessment

      This useful manuscript reports on a new mouse model for LAMA2-MD, a rare but very severe congenital muscular dystrophy. The knockout mice were generated by removing exon 3 of the Lama2 gene, which results in a frameshift in exon 4 and a premature stop codon. These animals lack any laminin-alpha2 protein and confirm results from previous Lama2 knockout models. Additionally, this study includes weak transcriptomics data that might be a good resource for the field. However, experimental evidence, methods, and data analyses supporting the main claims of the manuscript are incomplete.

    2. Reviewer #1 (Public review):

      Strengths:

      This work adds another mouse model for LAMA2-MD that recapitulates the phenotype of previously published models, such as dy3K/dy3K, dy/dy, and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Comments on revisions:

      This is the second revision of a paper focusing on the generation of a CRISPR/Cas9-engineered mouse model for LAMA2-MD. I have reviewed the initial submission, the first revision, and now this second revision. While there have been improvements, several issues still need to be addressed by the authors. I will outline these points without dividing them into major and minor categories:

      Introduction:

      The statement regarding existing mouse models requires correction: The claim, "They were established in the pre-gene therapy era, leaving trace of engineering, such as bacterial elements in the Lama2 gene locus, thus unsuitable for testing various gene therapy strategies," is inaccurate. Current mouse models can indeed be used for testing gene therapy strategies, regardless of whether they contain elements in the Lama2 locus. The primary consideration is whether or not they express laminin-alpha2. Please revise this statement.

      Results Section:

      scRNA-seq:

      The authors note that they analyzed "a total of 8,111 cells from the dyH/dyH mouse brain and 8,127 cells from the WT mouse brain were captured using the 10X Genomics platform (Figure supplement 4A, B)." This is too few cells to support firm conclusions. Furthermore, there is a discrepancy in the referred figure S4, which indicates that 10,094 cells were analyzed for dyH/dyH mice and 10,496 for wild-type mice. Please correct this inconsistency.

      Figure 5C displays differences in cell populations between wild-type and dyH/dyH mice. Given the low number of cells analyzed and the lack of replicates, these differences cannot be considered reliable. More samples should be analyzed to support these findings.

      The data suggest a defect in the BBB for dyH/dyH mice, but this conclusion is based on minimal cell counts and remains purely correlative. If BBB issues exist, experimental validation is necessary, such as injecting dyes into the bloodstream to detect any leakage. I have previously highlighted this in my comments on earlier manuscript versions.

      Bulk RNA-seq:

      The number of samples analyzed here is substantial, making the data potentially more robust. These data could serve as a valuable resource for other researchers. However, it is important to note that all data are correlative and do not provide functional insights.

      Overall:

      The manuscript still lacks significant insights, partly because existing mouse models for LAMA2-MD have been extensively analyzed. While the bulk RNA-seq data offer some value as a resource, I recommend that the authors re-assess their writing and further temper their interpretations of the findings.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      This work adds another mouse model for LAMA2-MD that reiterates the phenotype of previously published models, such as dy3K/dy3K, dy/dy and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      Thank you for the valuable comments and good suggestions you have proposed, and we have added information and analysis of another mouse model for LAMA2-MD in the updated version 2 of this manuscript.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Thank you for the good comments you have proposed, and we have carefully corrected the overinterpretation and overstatements in the previous updated version.

      Unfortunately, the data on RNA-seq and scRNA-seq are still rather weak. scRNA-seq was conducted with only one mouse resulting in only 8000 nuclei. I am not convinced that the data allow us to interpret them to the extent of the authors. Similar to the first version, the authors infer function by examining expression. Although they are a bit more cautious, they still argue that the BBB is not functional in dy<sup>H</sup>/dy<sup>H</sup> mice without showing leakiness. Such experiments can be done using dyes, such as Evans-blue or Cadaverin. Hence, I would suggest that they formulate the text still more carefully.

      Thank you for the valuable suggestions. We agree that we should perform related functional experiments, such as Evans-blue or Cadaverin dye injection, to confirm the impaired BBB. However, these functional experiments have not been done because the first author has been working in the clinic. Instead, we have added a "Limitations" part, in which we state: "Even though RNA-seq and scRNA-seq have been performed, the data of scRNA-seq are still insufficient due to the limited number of mouse brains. This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed".

      A similar lack of evidence is true for the suggested cobblestone-like lissencephaly of the mice. There is no strong evidence that this is indeed occurring in the mice (might also be a problem because mice die early). Hence, the conclusions need to be formulated in such a way that readers understand that these are interpretations and not facts.

      Thank you for the valuable suggestions. We agree with this comment, and have made a statement in the Limitations: "This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed". Also, for the cobblestone-like lissencephaly, which was shown in LAMA2-CMD patients but not found in the mouse model, we have added the discussion as "Though the cortical malformations were not found in the dy<sup>H</sup>/dy<sup>H</sup> brains by MRI analysis probably due to the small volume in within 1 month old, Thus, the changes in transcriptomes and protein levels provided potentially useful data for the hypothesis of the impaired gliovascular basal lamina of the BBB, which might be associated with occipital pachygyria in LAMA2-CMD patients."

      Finally, I am surprised that the only improvement in the main figures is the Western blot for laminin-alpha2. The histology of skeletal muscle still looks rather poor. I do not know what the problems are but suggest that the authors try to make sections from fresh-frozen tissue. I anticipate that the mice were eventually perfused with PFA before muscles were isolated. This often results in the big gaps in the sections.

      Thank you for the valuable suggestions. We do agree with this comment and we should make sections from fresh-frozen tissue. Therefore, we have made statement in the Limitations with "Moreover, due to making sections with PFA before muscles isolated, and not from fresh-frozen tissue, there have been big gaps in the sections which do affect the histology of skeletal muscle to some extent."

      Overall, the work is improved but still would need additional experiments to make it really an important addition to the literature in the LAMA-MD field.

      Thank you for all your good comments and the valuable suggestions.

      Reviewer #2 (Public Review):

      This revised manuscript describes the production of a mouse model for LAMA2-Related Muscular Dystrophy. The authors investigate changes in transcripts within the brain and blood barrier. The authors also investigate changes in the transcriptome associated with the muscle cytoskeleton. Strengths: (1) The authors produced a mouse model of LAMA2-CMD using CRISPR-Cas9. (2) The authors identify cellular changes that disrupted the blood-brain barrier.

      Thank you for your good comments.

      Weaknesses:

      The authors throughout the manuscript overstate "discoveries" which have been previously described, published and not appropriately cited.

      Thank you for your great suggestion. We have toned-down the interpretations and overstatements throughout the manuscript, and added words such as "potentially", "possible", "some potential clues", "was speculated to probably", and so on.

      Alterations in the blood brain barrier and in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published in the literature and are not cited appropriately.

      Thank you for your great suggestion. We agree that alterations in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published, and the related literature has been cited in the updated version 2.0. However, alterations in the blood brain barrier in LAMA2-CMD have not been extensively studied; only a few papers (such as PMID: 25392494, PMID: 32792907) have investigated or discussed this issue.

      The authors have increased the animal number to N=6, but based on power analysis this is still insufficient, which may result in statistical errors and incorrect conclusions.

      Thank you for your great suggestion. We do agree that the animal number should be increased for Power analysis, and we have added statements in the Limitations with "Finally, due to the limited number of animal samples for the Power analysis, the statistical errors and conclusions might be affected."
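
      As a rough illustration of the kind of power analysis referred to above, the sketch below estimates the number of animals per group for a two-sided comparison of two groups using the standard normal approximation. The function name `n_per_group` and the effect sizes passed to it are assumptions chosen for illustration; they are not values from the manuscript, and an exact t-test-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample comparison.

    effect_size is a standardized mean difference (Cohen's d); it is an
    assumed input, not a quantity reported in the study.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = standard_normal.inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical effect sizes: larger assumed effects need fewer animals.
for d in (0.5, 1.0, 2.0):
    print(d, n_per_group(d))
```

      Under these assumptions, a group size of 6 would only be adequate if the expected effect were very large, which is the concern the Reviewer raises.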

      The use of "novel mouse model" in the manuscript overstates the impact of the study.

      Thank you for your great suggestion. We have changed the statement "novel mouse model" throughout the manuscript except the title.

      All studies presented are descriptive and do not add more to the field except for producing yet another mouse model of LAMA2-CMD, which is the same as all the others produced.

      Thank you for your comment. We do agree that further functional experiments have not been performed to reveal and confirm the pathogenesis. However, the analysis of phenotype was systematic and comprehensive, including survival time, motor function, serum CK, muscle MRI, muscle histopathology in different stages, and brain histopathology. Moreover, RNA-seq and scRNA-seq in LAMA2-CMD have been seldom performed before, and the data in this study could provide potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD.

      Grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength, which is better achieved using ex vivo or in vivo muscle contractility studies.

      Thank you for your great suggestion. We do agree that grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength. And we have added related statement in the Limitations with "Grip strength measurements used in this study are considered error prone and do not give an accurate measurement of muscle strength, which would be better achieved using ex vivo or in vivo muscle contractility studies."

      A lack of blinded studies as pointed out by the authors is a concern for the scientific rigor of the study.

      Thank you for your great suggestion. We performed the studies with those scoring outcome measures not blinded to the groups. Actually, it was very easy to discriminate the dy<sup>H</sup>/dy<sup>H</sup> groups from the WT/Het mice because the dy<sup>H</sup>/dy<sup>H</sup> mice showed a much smaller body shape than the other groups from as early as P7.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      There are multiple grammatical errors throughout the manuscript which should be corrected.

      Thank you for your recommendation. We have carefully corrected the grammatical errors within the manuscript.

      The authors mention no changes in intestinal muscles, but it is unclear if they are referring to skeletal or smooth muscle.

      Thank you for your good comment. The intestinal muscles with no changes in this study refer to smooth muscle, and we have changed the description to intestinal smooth muscles.

    1. eLife Assessment

      The authors present useful findings on the use of a single-fly behavioral paradigm for assessing different Drosophila genetic models of neurodegeneration. The experimental design and analyses are solid and can be used for quick behavioral assessment in fly models of various neurodegenerative diseases, especially those having an impact on locomotion. The work will be of interest to Drosophila biologists using behavior as a readout for their studies.

    2. Reviewer #1 (Public review):

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioural assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioural data is detailed, and the analysis parameters are well-explained.

      Weaknesses:

      The authors have yet to link cellular physiology to behaviour. It will be interesting to see how future use of this assay helps uncover connections between cellular pathology and behavioural changes.

    3. Reviewer #2 (Public review):

      The manifestation and progression of neurodegenerative disorders are poorly understood. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits, and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The present study very nicely uses the flies' behavioral response to predator-mimicking passing shadows to measure subtle changes in their behavior. The data from various fly genetic models of Parkinson's disease support their claim. This single trial method is useful to capture the individual animal's response to the threatening stimuli but stops short of capturing the fine ambulatory responses which could provide further information on an individual's behavioral response. By capturing the fine features, the authors could get detailed observations, such as posture, gait or wing positioning, for a better understanding of the behavioral response to the passing shadow.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their constructive comments and the Editor for the possibility to address the Reviewers’ points in this rebuttal. We 

      (1) Conducted new experiments with NP6510-Gal4 and TH-Gal4 lines to address potential behavioral differences due to targeting dopaminergic vs. both dopaminergic and serotonergic neurons

      (2) Conducted novel data analyses to emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies

      (3) Provided Supplementary Movies

      (4) Calculated additional statistics

      (5) Edited and added text to address all points of the Reviewers.

      Please see our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioral assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable, and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioral data is detailed, and the analysis parameters are well-explained.

      We thank the Reviewer for the positive assessment of our study.

      Weaknesses:

      While the abstract promises to give us an assay to accelerate fly-to-human translation, the authors need to provide evidence to show that this is indeed the case. They have used PD lines extensively characterized by other groups, often with cheaper and easier-to-setup assays like negative geotaxis, and do not offer any new insights into them. The conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression is enormous, and the paper does not make any attempt to bridge it. It needs to be clarified how this assay provides a new understanding of the fly PD models, as the authors do not explore the cellular/circuit basis of the phenotypes. Similarly, they have assumed that the behavior they are looking at is an escape-from-predator response modulated by the central complex- is there any evidence to support these assumptions? Because of their rather superficial approach, the paper does not go beyond providing us with a collection of interesting but preliminary observations.

      We thank the Reviewer for pointing out some limitations of our study. We would like to emphasize that what we perceive as the main advantage of performing single-fly and single-trial analyses is the access to rich data distributions that provide more fine-scale information compared to bulk assays. We think that this is exactly going one step closer to ‘bridging the enormous conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression’, and we showcase this in our study by comparing the distributions over the entire repertoire of behavioral responses across fly mutants. Nevertheless, we agree with the Reviewer that many more steps in this direction are needed to improve translatability. Therefore, we toned down the corresponding statements in the Abstract and in the Introduction. Moreover, to further emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies, we complemented our comparisons of central tendencies with testing for potential differences in data dispersion, demonstrated in the novel Supplementary Figure S4.

      Looming stimuli have been used to characterize flies’ escape behaviors. These studies uncovered a surprisingly rich behavioral repertoire (Zacarias et al., 2018), which was modulated by both sensory and motor context, e.g. walking speed at time of stimulus presentation (Card and Dickinson, 2008; Oram and Card, 2022; Zacarias et al., 2018). The neural basis of these behaviors was also investigated, revealing loom-sensitive neurons in the optic lobe and the giant fiber escape pathway (Ache et al., 2019; de Vries and Clandinin, 2012). Although less frequently, passing shadows were also employed as threat-inducing stimuli in flies (Gibson et al., 2015). We opted for this variant of the stimulus so that we could ensure that the shadow reached the same coordinates in all linear tracks concurrently, aiding data analysis and scalability. Similar to the cited study, we found the same behavioral repertoire as in studies with looming stimuli, with an equivalent dependence on walking speed, confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli. We added a discussion on this topic to the main text.

      Reviewer #2 (Public Review):

      In this study, Kajtor et al. investigated the use of a single-animal trial-based behavioral assay for the assessment of subtle changes in the locomotor behavior of different genetic models of Parkinson's disease of Drosophila. Different genotypes used in this study were Ddc-GAL4>UAS-Parkin-275W and UAS-α-Syn-A53T. The authors measured Drosophila's response to predator-mimicking passing shadow as a threatening stimulus. Along with these, various dopamine (DA) receptor mutants, Dop1R1, Dop1R2 and DopEcR were also tested.

      The behavior was measured in a custom-designed apparatus that allows simultaneous testing of 13 individual flies in a plexiglass arena. The inter-trial intervals were randomized for 40 trials within 40 minutes duration and fly responses were defined into freezing, slowing down, and running by hierarchical clustering. Most of the mutant flies showed decreased reactivity to threatening stimuli, but the speed-response behavior was genotype invariant.

      These data nicely show that measuring responses to the predator-mimicking passing shadows could be used to assess the subtle differences in the locomotion parameters in various genetic models of Drosophila.

      The understanding of the manifestation of various neuronal disorders is a topic of active research. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The data from the present study nicely uses the behavioral response to predator-mimicking passing shadows to measure subtle changes in behavior. However, there are a few important points that would help establish the robustness of this study.

      We thank the Reviewer for the constructive comments and the positive assessment of our study.

      (1) The visual threat stimulus for measuring response behavior in Drosophila is previously established for both single and multiple flies in an arena. A comparative analysis of data and the pros and cons of the previously established techniques (for example, Gibson et al., 2015) with the technique presented in this study would be important to establish the current assay as an important advancement.

      We thank the Reviewer for this suggestion. We included the following discussion on measuring response behavior to visual threat stimuli in the revised manuscript.

      Many earlier studies used looming stimulus, that is, a concentrically expanding shadow, mimicking the approach of a predator from above, to study escape responses in flies (Ache et al., 2019; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Oram and Card, 2022; Zacarias et al., 2018) as well as rodents (Braine and Georges, 2023; Heinemans and Moita, 2024; Lecca et al., 2017). These assays have the advantage of closely resembling naturalistic, ecologically relevant threat-inducing stimuli, and allow a relatively complete characterization of the fly escape behavior repertoire. As a flip side of their large degree of freedom, they do not lend themselves easily to provide a fully standardized, scalable behavioral assay. Therefore, Gibson et al. suggested a novel threat-inducing assay operating with moving overhead translational stimuli, that is, passing shadows, and demonstrated that they induce escape behaviors in flies akin to looming discs (Gibson et al., 2015). This assay, coined ReVSA (repetitive visual stimulus-induced arousal) by the authors, had the advantage of scalability, while constraining flies to a walking arena that somewhat restricted the remarkably rich escape types flies otherwise exhibit. Here we carried this idea one step further by using a screen to present the shadows instead of a physically moving paddle and putting individual flies to linear corridors instead of the common circular fly arena. This ensured that the shadow reached the same coordinates in all linear tracks concurrently and made it easy to accurately determine when individual flies encountered the stimulus, aiding data analysis and scalability.
We found the same escape behavioral repertoire as in studies with looming stimuli and ReVSA (Gibson et al., 2015; Zacarias et al., 2018), with a similar dependence on walking speed (Oram and Card, 2022; Zacarias et al., 2018), confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli.  

      (2) Parkinson's disease mutants should be validated with other GAL-4 drivers along with Ddc-GAL4, such as NP6510-Gal4 (Riemensperger et al., 2013). This would be important to delineate the behavioral differences due to dopaminergic neurons and serotonergic neurons and establish the Parkinson's disease phenotype robustly.

      We thank the Reviewer for pointing out this limitation. To address this, we repeated our key experiments in Fig. 3 with both TH-Gal4 and NP6510-Gal4 lines, and their respective controls. These yielded largely similar results to the Ddc-Gal4 lines reported in Fig. 3, reproducing the decreased speed and decreased overall reactivity of PD-model flies. Nevertheless, TH-Gal4 and NP6510-Gal4 mutants showed an increased propensity to stop. Stop duration showed a significant increase not only in α-Syn but also in Parkin fruit flies. These novel results have been added to the text and are demonstrated in Supplementary Figure S3.

      (3) The DopEcR mutant genotype used for behavior analysis is w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup>. Balancer chromosomes, such as TM6B, Tb<sup>1</sup>, can have undesirable and uncharacterised behavioral effects. This could be addressed by removing the balancer and testing the DopEcR mutant in homozygous (if viable) or heterozygous conditions.

      We appreciate the Reviewer's comment and acknowledge the potential for the DopEcR balancer chromosome to produce unintended behavioral effects. However, given that this mutant was not essential to our main conclusions, we opted not to repeat the experiment. Nevertheless, we now discuss the possible confounds associated with using the PBac{PB}DopEcR<sup>c02142</sup> mutant allele over the balancer chromosome. “We recognize a limitation in using PBac{PB}DopEcR<sup>c02142</sup> over the TM6B, Tb<sup>1</sup> balancer chromosome, as the balancer itself may induce behavioral deficits in flies. We consider this unlikely, as the PBac{PB}DopEcR<sup>c02142</sup> mutation demonstrates behavioral effects even in heterozygotes (Ishimoto et al., 2013). Additionally, to our knowledge, no studies have reported behavioral deficits in flies carrying the TM6B, Tb<sup>1</sup> balancer chromosome over a wild-type chromosome.”

      (4) The height of the arena is restricted to 1mm. However, for the wild-type flies (Canton-S) and many other mutants, the height is usually more than 1mm. Also, a 1 mm height could restrict the fly movement. For example, it might not allow the flies to flip upside down in the arena easily. This could introduce some unwanted behavioral changes. A simple experiment with an arena of height at least 2.5mm could be used to verify the effect of 1mm height.

      We thank the Reviewer for this comment, which prompted us to reassess the dimensions of the apparatus. The height of the arena was 1.5 mm, which we corrected now in the text. We observed that the arena did not restrict the flies walking and that flies could flip in the arena. We now include two Supplementary Movies to demonstrate this.

      (5) The detailed model for Monte Carlo simulation for speed-response simulation is not described. The simulation model and its hyperparameters need to be described in more depth and with proper justification.

      We thank the Reviewer for pointing out a lack of details with respect to Monte Carlo simulations. We used a nested model built from actual data distributions, without any assumptions. Accordingly, the simulation did not have hyperparameters typical in machine learning applications, the only external parameter being the number of resamplings (3000 for each draw). We made these modeling choices clearer and expanded this part as follows.

      “The effect of movement speed on the distribution of behavioral response types was tested using a nested Monte Carlo simulation framework (Fig. S5). This simulation aimed to model how different movement speeds impact the probability distribution of response types, comparing these simulated outcomes to empirical data. This approach allowed us to determine whether observed differences in response distributions are solely due to speed variations across genotypes or if additional behavioral factors contribute to the differences. First, we calculated the probability of each response type at different specific speed values (outer model). These probabilities were derived from the grand average of all trials across each genotype, capturing the overall tendency at various speeds. Second, we simulated behavior of virtual flies (n = 3000 per genotype, which falls within the same order of magnitude as the number of experimentally recorded trials from different genotypes) by drawing random velocity values from the empirical velocity distribution specific to the given genotype and then randomly selecting a reaction based on the reaction probabilities associated with the drawn velocity (inner model). Finally, we calculated reaction probabilities for the virtual flies and compared them with real data from animals of the same genotype.

      Differences were statistically tested by Chi-squared test.”
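
      The nested procedure described above can be sketched as follows. This is a minimal illustration and not the authors' code: the response labels, the 2-unit speed bin width, the function names and the pooling of all trials for the outer model are assumptions made here for the example. Only the resampling count of 3000 is taken from the response.

```python
import random
from collections import Counter, defaultdict

RESPONSES = ("stop", "slow", "speed_up")  # hypothetical response labels


def speed_bin(speed, width=2.0):
    """Discretize a walking speed into a bin index (bin width is an assumption)."""
    return int(speed // width)


def outer_model(pooled_trials, width=2.0):
    """Outer model: P(response | speed bin) from pooled (speed, response) trials."""
    counts = defaultdict(Counter)
    for speed, response in pooled_trials:
        counts[speed_bin(speed, width)][response] += 1
    return {
        b: {r: c[r] / sum(c.values()) for r in RESPONSES}
        for b, c in counts.items()
    }


def simulate(genotype_speeds, model, n=3000, width=2.0, rng=None):
    """Inner model: draw a speed from one genotype's empirical speeds, then
    draw a response from the outer-model probabilities for that speed.

    Every speed bin of the genotype must also occur in the pooled trials.
    """
    rng = rng or random.Random(0)
    tally = Counter()
    for _ in range(n):
        speed = rng.choice(genotype_speeds)
        probs = model[speed_bin(speed, width)]
        weights = [probs[r] for r in RESPONSES]
        tally[rng.choices(RESPONSES, weights=weights)[0]] += 1
    return {r: tally[r] / n for r in RESPONSES}


def chi_squared(observed_counts, expected_probs):
    """Pearson chi-squared statistic of observed counts against simulated probabilities."""
    total = sum(observed_counts.get(r, 0) for r in RESPONSES)
    return sum(
        (observed_counts.get(r, 0) - total * expected_probs[r]) ** 2
        / (total * expected_probs[r])
        for r in RESPONSES
        if expected_probs[r] > 0
    )
```

      If the simulated response probabilities for a genotype match its observed response counts, the differences between genotypes can be attributed to their speed distributions alone; a large chi-squared statistic would instead point to additional behavioral factors.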

      (6) The statistical analysis in different experiments needs revisiting. It wasn't clear to me if the authors checked if the data is normally distributed. A simple remedy to this would be to check the normality of data using the Shapiro-Wilk test or Kolmogorov-Smirnov test. Based on the normality check, data should be further analyzed using either parametric or non-parametric statistical tests. Further, the statistical test for the age-dependent behavior response needs revisiting as well. Using two-way ANOVA is not justified given the complexity of the experimental design. Again, after checking for the normality of data, a more rigorous statistical test, such as split-plot ANOVA or a generalized linear model could be used.

      We thank the Reviewer for this comment. We performed Kolmogorov-Smirnov tests for normality on the data distributions underlying Figure 3, and normality was rejected for all data distributions at p = 0.05, which justifies the use of the non-parametric Mann-Whitney U-test. Regarding ANOVA, we would like to point out that the ANOVA hypothesis test design is robust to deviations from normality (Knief and Forstmeier, 2021; Mooi et al., 2018). While the Kruskal-Wallis test is considered a reasonable non-parametric alternative to one-way ANOVA, there is no clear consensus on a non-parametric alternative to two-way ANOVA. Therefore, we left the two-way ANOVA for Figure 5 in place; however, to increase the statistical confidence in our conclusions, we performed Kruskal-Wallis tests for the main effect of age and found significant effects in all genotypes in accordance with the ANOVA, confirming the results (Stop frequency, DopEcR, p = 0.0007; Dop1R1, p = 0.004; Dop1R2, p = 9.94 × 10<sup>-5</sup>; w<sup>1118</sup>, p = 9.89 × 10<sup>-13</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 2.54 × 10<sup>-5</sup>; Slowing down frequency, DopEcR, p = 0.0421; Dop1R1, p = 5.77 × 10<sup>-6</sup>; Dop1R2, p = 0.011; w<sup>1118</sup>, p = 2.62 × 10<sup>-5</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 0.0382; Speeding up frequency, DopEcR, p = 0.0003; Dop1R1, p = 2.06 × 10<sup>-7</sup>; Dop1R2, p = 2.19 × 10<sup>-6</sup>; w<sup>1118</sup>, p = 0.0044; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 1.36 × 10<sup>-5</sup>). We also changed the post hoc Tukey-tests to post hoc Mann-Whitney tests in the text to be consistent with the statistical analyses for Figure 3. These yielded results very similar to those of the Tukey-tests.
Of note, there isn’t a straightforward way of correcting for multiple comparisons in this case as opposed to the Tukey’s ‘honest significance’ approach, we thus report uncorrected p values and suggest considering them at p = 0.01, which minimizes type I errors. These notes have been added to the ‘Data analysis and statistics’ Methods section.
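
      As an illustration of the post hoc comparison described in this response, the sketch below implements a two-sided Mann-Whitney U test in plain Python and evaluates the uncorrected p value against the suggested threshold of 0.01. It is a minimal sketch under stated assumptions, not the analysis code used for the manuscript: the function names, the `young` and `old` samples, and the use of the tie-corrected normal approximation (without continuity correction) are all illustrative choices.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.

    Returns (U statistic of x, p value). No continuity correction is applied,
    and the approximation is only reasonable for moderately large samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: each group of t tied values contributes t^3 - t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation is identical
        return u_x, 1.0
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical stop frequencies for two age groups of one genotype (toy data).
young = [0.42, 0.51, 0.38, 0.47, 0.55, 0.49, 0.44, 0.53]
old = [0.31, 0.29, 0.36, 0.40, 0.27, 0.33, 0.35, 0.30]

ALPHA = 0.01  # threshold suggested in the response for uncorrected p values
u_stat, p_value = mann_whitney_u(young, old)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}, significant at {ALPHA}: {p_value < ALPHA}")
```

      Because the p values are left uncorrected, the stricter threshold is the only guard against false positives across the many pairwise comparisons; an exact test would be preferable for small groups.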

      (7) The dopamine receptor mutants used in this study are well characterized for learning and memory deficits. In the Parkinson's disease model of Drosophila, there is a loss of DA neurons in specific pockets in the central brain. Hence, it would be apt to use whole animal DA receptor mutants as general DA mutants rather than the Parkinson's disease model. The authors may want to rework the title to reflect the same.

      We thank the Reviewer for this comment, which suggests that we were not sufficiently clear on the Drosophila lines with DA receptor mutations. We used Mi{MIC} random insertion lines for dopamine receptor mutants, namely y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R1<sup>MI04437</sup> (BDSC 43773), y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R2<sup>MI08664</sup> (BDSC 51098) (Harbison et al., 2019; Pimentel et al., 2016), and w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup> (BDSC 10847) (Ishimoto et al., 2013; Petruccelli et al., 2020, 2016). These lines carried reported mutations in dopamine receptors, most likely generating partial knock down of the respective receptors. We made this clearer by including the full names at the first occurrence of the lines in Results (beyond those in Methods) and adding references to each of the lines.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please think about focusing the manuscript either on the escape response or the PD pathology and provide additional evidence to demonstrate that you indeed have a novel system to address open questions in the field.

      As detailed above, we now emphasize more that the main advantage of our single-trial-based approach lies in the appropriate statistical comparison of rich distributions of behavioral data. Please see our response to the ‘Weaknesses’ section for more details.

      (2) Please explain the rationale for choosing the genetic lines and provide appropriate genetic controls in the experiments, e.g. trans-heterozygotes. Why use Ddc-Gal4 instead of TH or other specific Split-Gal4 lines?

      We thank the Reviewer for this suggestion. We repeated our key experiments with TH-Gal4 and NP6510-Gal4 lines. Please see our response to Point #2 of Reviewer #2 for details.

      (3) Please proofread the manuscript for omissions, e.g., there's no legend for Fig 4b.

      We respectfully point out that the legend is there, and it reads “b, Proportion of a given response type as a function of average fly speed before the shadow presentation. Top, Parkin and α-Syn flies. Bottom, Dop1R1, Dop1R2 and DopEcR mutant flies.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In figure 2(c), representing the average walking speed data for different mutants would be useful to visually correlate the walking differences.

      We thank the Reviewer for this suggestion. The average walking speed was added in a scatter plot format, as suggested in the next point of the Reviewer. 

      (2) The data could be represented more clearly using scatter plots. Also, the color scheme could be more color-blindness friendly.

      We thank the Reviewer for this suggestion. We added scatter plots to Fig.2c that indeed represent the distribution of behavioral responses better. We also changed the color scheme and removed red/green labeling.

      (3) The manuscript should be checked for typos, such as in lines 252, 449, and 484.

      Thank you. We fixed the typos.

      References

      Ache JM, Polsky J, Alghailani S, Parekh R, Breads P, Peek MY, Bock DD, von Reyn CR, Card GM. 2019. Neural Basis for Looming Size and Velocity Encoding in the Drosophila Giant Fiber Escape Pathway. Curr Biol 29:1073-1081.e4. doi:10.1016/j.cub.2019.01.079

      Braine A, Georges F. 2023. Emotion in action: When emotions meet motor circuits. Neurosci Biobehav Rev 155:105475. doi:10.1016/j.neubiorev.2023.105475

      Card G, Dickinson MH. 2008. Visually Mediated Motor Planning in the Escape Response of Drosophila. Curr Biol 18:1300–1307. doi:10.1016/j.cub.2008.07.094

      de Vries SEJ, Clandinin TR. 2012. Loom-Sensitive Neurons Link Computation to Action in the Drosophila Visual System. Curr Biol 22:353–362. doi:10.1016/j.cub.2012.01.007

      Gibson WT, Gonzalez CR, Fernandez C, Ramasamy L, Tabachnik T, Du RR, Felsen PD, Maire MR, Perona P, Anderson DJ. 2015. Behavioral Responses to a Repetitive Visual Threat Stimulus Express a Persistent State of Defensive Arousal in Drosophila. Curr Biol 25:1401–1415. doi:10.1016/j.cub.2015.03.058

      Harbison ST, Kumar S, Huang W, McCoy LJ, Smith KR, Mackay TFC. 2019. Genome-Wide Association Study of Circadian Behavior in Drosophila melanogaster. Behav Genet 49:60–82. doi:10.1007/s10519-018-9932-0

      Heinemans M, Moita MA. 2024. Looming stimuli reliably drive innate defensive responses in male rats, but not learned defensive responses. Sci Rep 14:21578. doi:10.1038/s41598-024-70256-2

      Ishimoto H, Wang Z, Rao Y, Wu C, Kitamoto T. 2013. A Novel Role for Ecdysone in Drosophila Conditioned Behavior: Linking GPCR-Mediated Non-canonical Steroid Action to cAMP Signaling in the Adult Brain. PLoS Genet 9:e1003843. doi:10.1371/journal.pgen.1003843

      Knief U, Forstmeier W. 2021. Violating the normality assumption may be the lesser of two evils. Behav Res Methods 53:2576–2590. doi:10.3758/s13428-021-01587-5

      Lecca S, Meye FJ, Trusel M, Tchenio A, Harris J, Schwarz MK, Burdakov D, Georges F, Mameli M. 2017. Aversive stimuli drive hypothalamus-to-habenula excitation to promote escape behavior. Elife 6:1–16. doi:10.7554/eLife.30697

      Mooi E, Sarstedt M, Mooi-Reci I. 2018. Market Research, Springer Texts in Business and Economics. Singapore: Springer Singapore. doi:10.1007/978-981-10-5218-7

      Oram TB, Card GM. 2022. Context-dependent control of behavior in Drosophila. Curr Opin Neurobiol 73:102523. doi:10.1016/j.conb.2022.02.003

      Petruccelli E, Lark A, Mrkvicka JA, Kitamoto T. 2020. Significance of DopEcR, a G-protein coupled dopamine/ecdysteroid receptor, in physiological and behavioral response to stressors. J Neurogenet 34:55–68. doi:10.1080/01677063.2019.1710144

      Petruccelli E, Li Q, Rao Y, Kitamoto T. 2016. The Unique Dopamine/Ecdysteroid Receptor Modulates Ethanol-Induced Sedation in Drosophila. J Neurosci 36:4647–4657. doi:10.1523/JNEUROSCI.3774-15.2016

      Pimentel D, Donlea JM, Talbot CB, Song SM, Thurston AJF, Miesenböck G. 2016. Operation of a homeostatic sleep switch. Nature 536:333–337. doi:10.1038/nature19055

      Zacarias R, Namiki S, Card GM, Vasconcelos ML, Moita MA. 2018. Speed dependent descending control of freezing behavior in Drosophila melanogaster. Nat Commun 9:1–11. doi:10.1038/s41467-018-05875-1

    1. eLife Assessment

      This is an important study that combines replications of findings and novel detailed MRI investigations to assess the impact of environmental enrichment and maternal behavior on mouse brain structure at different stages of development. The results and evidence supporting the conclusions are convincing, but the detailed interpretation is challenging, in particular due to inter-individual and inter-litter variability. The extent to which maternal care mediates the impact of enrichment on brain development during the perinatal period also remains unclear because behavior was observed only during short periods, and the performed analyses are still incomplete. This study will nevertheless be of significant interest to neuroscientists and researchers interested in neurodevelopment in relation to environmental factors because of its in-depth use of MRI to study brain plasticity in mice.

    2. Reviewer #1 (Public review):

      Kaller et al. (2025) explore the impact of environmental enrichment (EE) on the developing mouse brain, specifically during the perinatal period. The authors use high-resolution MRI to examine structural brain changes in neonates (postnatal day 7, P7) and compare these changes to those observed in adulthood. A key aspect of the study is the investigation of maternal care as a potential mediating factor in the effects of perinatal EE on neonatal brain development.

      The work exhibits the following notable strengths:

      (1) The study addresses a significant gap in the literature by investigating the effects of perinatal EE on whole-brain structure in neonates. Previous research has primarily focused on the effects of EE on the adult brain or specific aspects of early development, such as the visual system.

      (2) The authors employ a combination of high-resolution MRI and behavioral analysis of maternal care, providing a comprehensive view of the effects of EE.

      (3) The study reveals that EE affects brain structure as early as P7, with distinct regional changes compared to adulthood. The finding that maternal care influences neonatal brain structure and correlates with the effects of EE is particularly noteworthy.

      (4) The paper is clearly written, well-organized, and easy to follow. The figures and tables are informative and effectively illustrate the key findings.

      However, some weaknesses should be addressed to improve the quality of this study:

      (1) While the study includes an assessment of maternal care, the observational period is relatively short. A more extended or continuous assessment of maternal behavior could provide a more comprehensive understanding of its role in mediating the effects of EE.

      (2) The study primarily focuses on structural brain changes. Investigating the functional consequences of these changes could provide further insights into the long-term impact of perinatal EE.

      (3) The study demonstrates a correlation between maternal care and neonatal brain structure but does not elucidate the underlying mechanisms. Future studies could explore potential molecular or cellular mechanisms involved in these effects.

    3. Reviewer #2 (Public review):

      This paper by Kaller and colleagues combines an interesting replication of findings on the importance of maternal behavior for brain development in the offspring with a state-of-the-art MRI analysis and a novel comparison between such perinatal and early postnatal enrichment via the activity of the mother and a classical enriched environment in the adult. In general, the observations are as one would have expected. Early postnatal enrichment and adult enrichment have differential effects, which is plausible because the source of these changes is environmental, and "environmental" means very different things at these different stages. The three data sets presented are really interesting, and while the comparison between them might not always be as straightforward as it seems, the cross-sectional phenotyping with MRI already provides very important material and allows for interesting insight. Most interesting is possibly the massive effect of housing conditions at P7.

      In particular, the role of individual behavior differs. The authors highlight this role of the interaction with the environment, rather than the environment alone. Maternal care is a process that involves the pup.

      Importantly, the study shows that being born into an enriched environment predates certain changes that are still available after exposure at a later stage, but that there are also important differences. Detailed interpretation of these effects is not easy, however.

      Notably, the study does not include a condition of enrichment from birth into adulthood, and no analysis of the perinatal enrichment effects at an adult age. The timeline can be guessed from Figure 1b, but the authors might in places be more explicit about the fact that, indirectly and sometimes directly, animals of different ages (young adult versus adult) are compared. There is obviously no experience of maternal care in adulthood and no active exploration, etc., in childhood. In part, this is what this paper is about, but it requires some thought for the reader to separate the more trivial from the more profound conclusions. Some more guidance would probably be welcome here. In general, Figure 4 is a great idea (and visually very appealing), but the content is not quite clear. "Adults born in EE vs. switched to EE in adulthood": this has, as far as I can tell, not been studied. What is compared are EE effects at two different time-points with two supposedly different mechanisms.

      From such a more mechanistic side, the authors might, for example, want to relate the observed patterns to what is known about the developmental (and plastic) dynamics in the respective brain regions at the given time. But age is a confounder here.

      There is another interesting point that the authors might discuss more prominently. The inter-individual differences in Z-score are dramatic within essentially all groups. So while the mean effects might still be statistically different, a large proportion of animals are within a range of values that could be found in either experimental group. The same is also true for the effects of maternal care, as depicted in Figure 3. While there is, for this ROI, a clear trend that overall relative volume decreases with maternal contact time at each time point, there is a large range of values for each maternal contact time bin. Consequently, neither genetics nor maternal care per se can be the driver of this variation. Part of it will be technical, but the trend in the data indicates that certainly not all of this is noise and technical error.

      This study has some open ends but also provides a very important and interesting direction for future study, corroborating the idea that behavior, maternal and own, does matter.

    4. Reviewer #3 (Public review):

      Summary:

      This study aimed to investigate the effect of environmental enrichment (EE) during the critical perinatal period on the developing brain structure and compare it with other periods. Different datasets of mice with EE or standard housing (SH) were compared with post-mortem MRI: dataset A (MRI at P96; 13 animals in EE during adulthood P53-P96, 14 animals in SH), dataset P (MRI at P43; 24 animals in EE during perinatal period and adulthood E17-P43, 25 animals in SH) and dataset N (MRI at P7; 52 animals in EE during perinatal period E13-P7, 67 animals in SH / resulting from 5 dams with 2 litters: 4 dams in EE and 6 dams in SH). The study replicated the effects observed during adulthood (main neuroanatomical EE/SH difference in datasets A and P: increase in the hippocampus volume) but also showed that volumetric changes for some regions differ between datasets A and P, suggesting different mechanisms of brain responses to enrichment depending on the period when EE was applied. Results on dataset N further showed that EE leads to lower brain size and differences for various regions: volume reduction in striatum, frontal, parietal, and occipital regions, hippocampus; volume increase for a few thalamic nuclei and hindbrain, suggesting different patterns of perinatal EE effects in datasets P and N. Since mice at P7 show little engagement with their environment, the authors further explored the hypothesis that the dams' behavior and interaction with neonates could be a mediator of brain differences observed at P7 between EE and SH animals. Maternal contact time was related to the P7 volumes for some regions (striatum, brainstem), but the variability and low sample size prevented a clear separation between EE and SH in terms of maternal behaviors.

      Strengths:

      (1) The question raised by this article is important at a fundamental level for our understanding of the complex interactions between the brain, behavior, and the environment.

      (2) This study replicates previous observations on the effects of EE in adult mice.

      (3) While some studies have been performed on neonates of dams exposed to EE during gestation, it is the first time that the effects of perinatal EE are investigated, in both the developing and mature brains with MRI. From a translational perspective, this is crucial for our understanding of human neurodevelopment in interaction with the environment.

      (4) The analyses carried out are numerous and detailed.

      Weaknesses:

      (1) The analyses carried out do not allow us to fully assess whether differences in maternal care mediate the effects of EE on brain structure during development. The observations support this causal hypothesis, but a complete mediation analysis would be useful if permitted by the sample size and the variability observed between litters.

      (2) The article is quite dense to read, given the number of analyses carried out. It is difficult at first reading to get a global view of the results. Figure 4 could be highlighted earlier to present the hypotheses and tests carried out.

      (3) The figures could be more explicit in terms of legends (particularly the supplementary figures).

    1. eLife Assessment

      This manuscript aims to identify the pacemaker cells in the lymphatic collecting vessels - the cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the exemplary use of existing approaches (genetic deletions and cytosolic calcium detection in multiple cell types), the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction. The inclusion of scRNAseq and membrane potential data enhances a tremendous study. This fundamental discovery establishes a new standard for the field of lymphatic physiology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the multiple cell types present in the wall of murine collecting lymphatic vessels with the goal of identifying cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the use of genetic models to delete individual genes or detect cytosolic calcium in specific cell types, the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction.

      Strengths:

      The experiments are rigorously performed, the data justify the conclusions and the limitations of the study are appropriately discussed.

      There is a need to identify therapeutic targets to improve lymphatic contraction and this work helps identify lymphatic muscle cells as potential cellular targets for intervention.

      Comments on revisions: The authors have addressed all of the reviewer comments. They should be congratulated on their precise and comprehensive study.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well written manuscript describing studies directed at identifying the cell type responsible for pacemaking in murine collecting lymphatics. Using state of the art approaches, the authors identified a number of different cell types in the wall of these lymphatics and then using targeted expression of Channel Rhodopsin and GCaMP, the authors convincingly demonstrate that only activation of lymphatic muscle cells produces coordinated lymphatic contraction and that only lymphatic muscle cells display pressure-dependent Ca2+ transients as would be expected of a pacemaker in these lymphatics.

      Strengths:

      The use of targeted expression of channel rhodopsin and GCaMP to test the hypothesis that lymphatic muscle cells serve as the pacemakers in murine lymphatic collecting vessels.

      Weaknesses:

      The only significant weakness was the lack of quantitative analysis of most of the imaging data shown in Figures 1-11. In particular, the colocalization analysis should be extended to show cells not expected to demonstrate colocalization as a negative control for the colocalization analysis that the authors present. These weaknesses have been resolved by revision and the addition of novel RNAseq data, additional colocalization data, and membrane potential measurements.

      Comments on revisions: No additional concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Zawieja et al. aimed to identify the pacemaker cells in the lymphatic collecting vessels. The authors have used various Cre-based expression systems and optogenetic tools to identify these cells. Their findings suggest that these cells are lymphatic muscle cells that drive the pacemaker activity in the lymphatic collecting vessels.

      Strengths:

      The authors have used multiple approaches to test their hypothesis. Some findings are presented as qualitative images, while some quantitative measurements are provided.

      Weaknesses:

      - More quantitative measurements.
      - Possible mechanisms associated with the pacemaker activity.
      - Membrane potential measurements.

      Comments on revisions: I do not have any additional comments.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      The authors have done an impressive job in responding to the previous critique and even gone beyond what was asked. I have only very minor comments on this excellent manuscript. The manuscript also needs some light editing for grammar and readability.

      We have worked to improve the grammar and readability of the manuscript.

      Comments:

      Lines 227-234: At what age was tamoxifen administered to the various CreERTM mice?

      We have updated the ages of the mice used in this study in the methods sections.

      UMAP in Figure 5A is missing label for cluster 19.

      The UMAP in Figure 5A has the label for cluster 19 at the center-bottom of the image.

      Supplement Figure 6: Cluster 10 seems to be separate from the other AdvC clusters, and it includes some expression of Myh11 and Notch3. Further, there is low expression of Pdgfra in this cluster, which can be seen in panel B and panels D-I. Are the Pdgfra negative cells in the pie charts from cluster 10? Could the cells in this cluster be more LMC-like than AdvC-like?

      We agree with the reviewer that subcluster 10 of the fibroblasts is intriguing, if only a minor population. This population comprises 77 cells out of 2261 total: 40 of the 77 were Pdgfra+, and of the 37 remaining Pdgfra- cells, 11 were still CD34+. Thus at least half of these cells could be expected to have the PdgfraCreERTM. Only 8 of the 37 were Pdgfra-Notch3+ while 12 cells were Pdgfra+Notch3+, and only 3 were Pdgfra-Myh11+ while 3 were Pdgfra+Myh11+. 26 of 77 cells were Pdgfra+Pdgfrb+ double positive, while 12 of 37 Pdgfra- cells were still Pdgfrb+. Additionally, within the 77 cells of subcluster 10, 17 were positive for Scn3a (Nav1.3), 21 were positive for Kcnj8 (Kir6.1), and 33 were positive for Cacna1c (Cav1.2); these are typically LMC markers, which would support the reviewer's thinking that this group contains a fibroblast-LMC transitional cell type. Only 2 of 77 cells were positive for the BK subunit (Kcnma1), which is a classic smooth muscle marker. Another possibility is that this population represents the Pdgfra+Pdgfrb+ valve interstitial cells we identified in our IF staining and in our reporter mice. Of note, almost all cells in this cluster were Col3a1+ and Vim+. Even though we performed QC analysis to remove doublets, it is also possible that some of these cells represent doublets or contaminants; however, the low percentage of cells expressing Myh11, a very highly expressed gene in LMCs, especially compared to ion channels, suggests this is less likely. Assessing the presence of this particular cell cluster in future RNAseq or with spatial transcriptomics will be enlightening.
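
      To make the proportions behind these counts easier to follow, the tally below converts the numbers quoted in this response into percentages of subcluster 10. It is only an arithmetic sketch: the variable names are hypothetical, the counts are copied from the text rather than recomputed from the scRNAseq data, and it assumes that the 11 CD34+ cells are a subset of the 37 Pdgfra- cells.

```python
# Counts quoted in the response for fibroblast subcluster 10.
TOTAL_CELLS_QUOTED = 2261  # the "2261 total" cells mentioned in the response
SUBCLUSTER_10 = 77

pdgfra_pos = 40
pdgfra_neg = SUBCLUSTER_10 - pdgfra_pos  # 37, matching the text
pdgfra_neg_cd34_pos = 11                 # assumed to be a subset of the 37

# Marker-positive cells, split as (Pdgfra- cells, Pdgfra+ cells).
notch3 = (8, 12)
myh11 = (3, 3)
pdgfrb = (12, 26)

# Ion channel genes counted over all 77 cells.
ion_channels = {"Scn3a": 17, "Kcnj8": 21, "Cacna1c": 33, "Kcnma1": 2}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


summary = {
    "subcluster 10 / total quoted": percent(SUBCLUSTER_10, TOTAL_CELLS_QUOTED),
    "Pdgfra+": percent(pdgfra_pos, SUBCLUSTER_10),
    "Pdgfra+ or CD34+": percent(pdgfra_pos + pdgfra_neg_cd34_pos, SUBCLUSTER_10),
    "Notch3+": percent(sum(notch3), SUBCLUSTER_10),
    "Myh11+": percent(sum(myh11), SUBCLUSTER_10),
    "Pdgfrb+": percent(sum(pdgfrb), SUBCLUSTER_10),
}
summary.update({gene: percent(n, SUBCLUSTER_10) for gene, n in ion_channels.items()})

for label, value in summary.items():
    print(f"{label}: {value}%")
```

      Read this way, Pdgfra+ cells make up just over half of the subcluster, while Myh11+ cells remain under a tenth of it, which is the contrast the response relies on when arguing against doublet contamination.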

      Line 360. Proofread section title.

      We have simplified this title to read “Optogenetic Stimulation of iCre-driven Channel Rhodopsin 2”

      Lines 370-371. Are the length units supposed to be microns or millimeters?

      We have corrected this to microns as was intended. Thank you for catching this error.

      The resolution for each UMAP analysis should be stated, particularly for the identification of subclusters. How was the resolution chosen?

      To select the optimal cluster resolution, we used Clustree with various resolutions. We examined the resulting tree to identify a resolution where the clusters were well-separated and biologically meaningful, ensuring minimal merging or splitting at higher resolutions. Our goal was to find a resolution that captures relevant cell subpopulations while maintaining distinct clusters without excessive fragmentation. We have now stated the resolution for the subclustering of the LECs, LMCs, and fibroblasts. We have also added greater detail regarding the total number of cells, QC analysis, and the marker identification criteria used to the methods sections. We used a resolution of 0.5 for sub-clustering LMCs, 0.87 for LECs, and 1.0 for fibroblasts. These details are now added to the manuscript.
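
      The idea behind inspecting a clustering tree can be sketched in a few lines. The example below is a hedged illustration in Python rather than the R workflow used by the authors: it takes cluster labels for the same cells at a lower and a higher resolution, counts the cells on each edge between clusters, and reports the share of each higher-resolution cluster that comes from each parent (the "in-proportion" in clustering-tree terminology). The labels, the function names, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter


def tree_edges(low_res_labels, high_res_labels):
    """Edges between two clusterings of the same cells.

    Returns {(low, high): (n_cells, in_proportion)}, where in_proportion is the
    fraction of the higher-resolution cluster's cells that come from `low`.
    """
    if len(low_res_labels) != len(high_res_labels):
        raise ValueError("both label lists must describe the same cells")
    edge_counts = Counter(zip(low_res_labels, high_res_labels))
    high_sizes = Counter(high_res_labels)
    return {
        (low, high): (n, n / high_sizes[high])
        for (low, high), n in edge_counts.items()
    }


def mixed_clusters(edges, threshold=0.8):
    """Higher-resolution clusters whose dominant parent contributes < threshold.

    Such clusters are assembled from several parents, a sign that the higher
    resolution may be over-splitting or reshuffling cells.
    """
    dominant = {}
    for (_, high), (_, proportion) in edges.items():
        dominant[high] = max(dominant.get(high, 0.0), proportion)
    return sorted(high for high, prop in dominant.items() if prop < threshold)


# Toy labels for ten cells at two resolutions (hypothetical data).
low_res = ["A"] * 6 + ["B"] * 4
high_res = ["0"] * 3 + ["1"] * 3 + ["2"] * 2 + ["1"] * 2

edges = tree_edges(low_res, high_res)
for (low, high), (n, proportion) in sorted(edges.items()):
    print(f"{low} -> {high}: {n} cells, in-proportion {proportion:.2f}")
print("mixed clusters:", mixed_clusters(edges))
```

      In this toy case, cluster "1" draws cells from both parents, so a reader of the tree would hesitate to adopt the higher resolution; clusters "0" and "2" each descend cleanly from a single parent.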

    1. eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in terms of identifying mechanism as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

    2. Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with an appropriate number of mice, robust phenotypes, and interesting conclusions, and the text is very well written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), should be studied in any future explorations using this model.

      The authors have recognized these limitations to the study in their discussion.

    3. Reviewer #3 (Public review):

      This communication from Sukhina et al. argues that a period of malnutrition (modeled by caloric restriction) causes lasting immune deficiencies (myelopoiesis) not rescued by re-feeding. This is a potentially important paper exploring the effects of malnutrition on immunity, which is a clinically important topic. The revised study adds some details with respect to the kinetics of immune compartment and body weight changes, but most aspects raised by the referees were deferred experimentally. Several textual changes have been made to avoid over-interpreting the data. My overall assessment of this revised study is similar to my impression before, which is that while the observations are interesting, there is both a lack of mechanistic understanding of the phenomena and a lack of resolution/detail about the phenomena themselves.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in others as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

      We would like to thank the editors for agreeing to review our work at eLife. We greatly appreciate them assessing this study as important and of general interest to multiple fields, as well as the opportunity to respond to reviewer comments. Please find our responses to each reviewer below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments.

      Strengths:

      The manuscript is well-written and conceived around a valid scientific question. The data supports the idea that malnutrition contributes to infection susceptibility and causes some immunological changes. The malnourished mouse model also displayed growth and development delays. The work's significance is well justified. Immunological studies in the malnourished cohort (human and mice) are scarce, so this could add valuable information.

      Weaknesses:

      The assays on myeloid cells are limited, and the study is descriptive and overstated. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I found no cellular mechanism defining the link between nutritional state and immunocompetency.

      We thank the reviewer for deeming our work significant and noting the importance of the study. We appreciate the referee’s point regarding the lack of specific cellular functional data for innate immune cells and have modified the conclusions stated in text to more accurately reflect the results presented.

      Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with appropriate numbers of mice, robust phenotypes, and interesting conclusions, and the text is very well-written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, which is well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), is completely ignored here.

      We thank the reviewer for agreeing that the data presented support the stated conclusions and noting the experimental rigor.  The referee highlights two important areas for future mechanistic investigation that we agree are of great importance and relevant to the submitted study. We have included further discussion of the potential role cytokines and the microbiota might play in our model.

      Reviewer #3 (Public review):

      Summary:

      Sukhina et al are trying to understand the impacts of malnutrition on immunity. They model malnutrition with a diet switch from ad libitum to 40% caloric restriction (CR) in post-weaned mice. They test impacts on immune function with listeriosis. They then test whether re-feeding corrects these defects and find aspects of emergency myelopoiesis that remain defective after a precedent period of 40% CR. Overall, this is a very interesting observational study on the impacts of sudden prolonged exposure to less caloric intake.

      Strengths:

      The study is rigorously done. The observation of lasting defects after a bout of 40% CR is quite interesting. Overall, I think the topic and findings are of interest.

      Weaknesses:

      While the observations are interesting, in this reviewer's opinion, there is both a lack of mechanistic understanding of the phenomena and also some lack of resolution/detail about the phenomena itself. Addressing the following major issues would be helpful towards aspects of both:

      (1) Is it calories, per se, or macro/micronutrients that drive these phenotypes observed with 40% CR. At the least, I would want to see isocaloric diets (primarily protein, fat, or carbs) and then some of the same readouts after 40% CR. Ie does low energy with relatively more eg protein prevent immunosuppression (as is commonly suggested)? Micronutrients would be harder to test experimentally and may be out of the scope of this study. However, it is worth noting that many of the malnutrition-associated diseases are micronutrient deficiencies.

      (2) Is immunosuppression a function of a certain weight loss threshold? Or something else? Some idea of either the tempo of immunosuppression (happens at 1, in which weight loss is detected; vs 2-3, when body length and condition appear to diverge; or 5 weeks), or grade of CR (40% vs 60% vs 80%) would be helpful since the mechanism of immunosuppression overall is unclear (but nailing it may be beyond the scope of this communication).

      (3) Does an obese mouse that gets 40% CR also become immunodeficient? As it stands, this ad libitum --> 40% CR model perhaps best models problems in the industrial world (as opposed to always being 40% CR from weaning, as might be more common in the developing world), and so modeling an obese person losing a lot of weight from CR (like would be achieved with GLP-1 drugs now) would be valuable to understanding generalizability.

      (4) Generalizing this phenomenon as "bacterial" with listeriosis, which is more like a virus in many ways (intracellular phase, requires type I IFN, etc.) and cannot be given by the natural route of infection in mice, may not be most accurate. I would want to see an experiment with E. coli, or some other bacteria, to test the statement of generalizability (ie is it bacteria, or type I IFN-pathway dominant infections, like viruses). If this is unique to listeriosis, it doesn't undermine the story as it is at all, but it would just require some word-smithing.

      (5) Previous reports (which the authors cite) implicate Leptin, the levels of which scale with fat mass, as "permissive" of a larger immune compartment (immune compartment as "luxury function" idea). Is their phenotype also leptin-mediated (ie leptin AAV)?

      (6) The inability of re-feeding to "rescue" the myeloid compartment is really interesting. Can the authors do a bone marrow transplantation (CR-->ad libitum) to test if this effect is intrinsic to the CR-experienced bone marrow?

      (7) Is the defect in emergency myelopoiesis a defect in G-CSF? Ie if the authors injected G-CSF in CR animals, do they equivalently mobilize neutrophils? Does G-CSF supplementation (as one does in humans) rescue host defense against Listeria in the CR or re-feeding paradigms?

      We thank the reviewer for considering our work of interest and noting the rigor with which it was conducted. The referee raises several excellent mechanistic hypotheses and follow-up studies to perform. We agree that defining the specific dietary deficiency driving the phenotypes is of great interest. The relative contribution of calories versus macro- and micronutrients is an area we are interested in exploring in future studies, especially given the literature on the role of micronutrients in malnutrition-driven wasting, as the referee notes. We also agree that it will be key to determine whether non-hematopoietic cells contribute, and that the role of soluble factors such as G-CSF and Leptin in mediating the immunodeficiency warrants further study. Likewise, it will be important to evaluate how malnutrition impacts other models of infection to determine how generalizable these phenomena are. We have added these points to the discussion section as limitations of this study.

      Regarding the timing of the immunosuppression relative to weight loss, we have performed new kinetics studies to provide some insight into this area. We now find that neutropenia in peripheral blood can be detected after as little as one week of dietary restriction, with neutropenia continuing to worsen after prolonged restriction. These findings indicate that the impact on myeloid cell production is indeed rapid and precedes maximum weight loss, though the severity of these phenotypes does increase as malnutrition persists. We wholeheartedly agree with the reviewer that it will be interesting to explore whether starting weight impacts these phenotypes and whether similar findings can be made in obese animals as they are treated for weight loss.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I could not find any cellular mechanism defining the link between nutritional state and immunocompetency. The assays on myeloid cells are limited, and the study is descriptive and overstated.

      Major concerns:

      (1) Malnutrition has entirely different effects on adults and children. In this study, 6-8 weeks old C57/Bl6 mice were used that mimic adult malnutrition. I do not understand then why the refeeding strategy for inpatient treatment of severely malnourished children was utilized here.

      (2) Figure 1g shows BM cellularity is reduced, but the authors claim otherwise in the text.

      (3) What is the basis of the body condition score in Figure 1d? It will be good to have it in the supplement.

      (4) Listeria monocytogenes cause systemic infection, so bioload was not determined in tissues beyond the liver.

      (5) Figure 3; T cell functional assays were limited to CD8 T cells and lymphocytes isolated from the spleen.

      (6) Why was peripheral cell count not considered? Discrepancies exist with the absolute cell number and relative abundance data, except for the neutrophil and monocyte data, which makes the data difficult to interpret. For example, for B cells, CD4 and CD8 cells.

      (7) Also, if mice exhibit thymic atrophy, why does % abundance data show otherwise? Overall, the data is confusing to interpret.

      (8) No functional tests for neutrophil or monocyte function exist to explain the higher bacterial burden in the liver or to connect the numbers with the overall pathogen load

      The rationale for examining both innate and adaptive immunity is not clear-it is even more unclear since the exact timelines for examining both innate and adaptive immunity (D0 and D5) were used.

      (9) Figure 2e doesn't make sense - why is spleen cellularity measured when bacterial load is measured in the liver?

      (10) Although it is claimed that emergency myelopoiesis is affected, no specific marker for emergency myelopoiesis other than cell numbers was studied.

      (11) I suggest including neutrophil effector functions and looking for real markers of granulopoiesis, such as Cebp-b. Since the authors attempted to examine the entirety of immune responses, it is better to measure cell abundance, types, and functions beyond the spleen. Consider the systemic spread of m while measuring bioload.

      (12) Minor grammatical errors - please re-read the entire text and correct grammatical errors to improve the flow of the text.

      (13) Sample size details missing

      (14) Be clear on which markers were used to identify monocytes. Using just CD11b and Ly6G is insufficient for neutrophil quantification.

      (15) Also, instead of saying "undernourished patients," say "patients with undernutrition" - change throughout the text. I would recommend numbering citations (as is done for Nature citations) to ease in following the text, as there are areas when there are more than ten citations with author names.

      (16) No line numbers are provided

      (17) Abstract

      -  What does accelerated contraction mean?

      -  "In" is repeated in a sentence

      -  Be clear that the study is done in a mouse model - saying just "animals" is not sufficient

      -  Indicate how malnutrition is induced in these mice

      (18) Introduction

      -  "restriction," "immune organs," - what is this referring to?

      -  You mention lymphoid tissue and innate and adaptive immunity, which doesn't make sense.

      Please correct this.

      -  You mention a lot of lymphoid tissues, i.e. lymphoid mass gain, but how about the bone marrow and spleen, which are responsible for most innate immune compartments?

      (19) Results

      a) Figure 1

      -  Why 40% reduced diet?

      -  It would be interesting to report if the organs are smaller relative to body weight. It makes sense that the organ weight is lower in the 40RD mice, especially since they are smaller, so the novelty of this data is not apparent (Figure 1f).

      -  You say, "We observed a corresponding reduction in the cellularity of the spleen and thymus, while the cellularity of the bone marrow was unaffected (Fig. 1g)." however, your BM data is significant, so this statement doesn't reflect the data you present, please correct.

      b) Figure 2

      - Figure 2d - what tissue is this from, mentioned in the figure? And measure cellularity there. The rationale for why you look only at the spleen here is weak. Also, we would benefit from including the groups without infection here for comparison purposes.

      c) Figure 3

      - The rationale for why you further looked at T cells is weak, mainly because of the following sentence. "Despite this overall loss in lymphocyte number, the relative frequency of each population was either unchanged or elevated, indicating that while malnutrition leads to a global reduction in immune cell numbers, lymphocytes are less impacted than other immune cell populations (Supplemental 1)." Please explain in the main text.

      d) Figure 4

      -  You say the peak of the adaptive immune response, but you never looked at the peak of adaptive immune - when is this? If you have the data, please show it. You also only show d0 and d5 post-infection data for adaptive immunity, so I am unsure where this statement comes from.

      -  How did you identify neutrophils and monocytes through flow cytometry? Indicate the markers used. Also, your text does not match your data; please correct it. i.e. monocyte numbers reduced, and relative abundance increased, but your text doesn't say this.

      -  Show the flow graph first then, followed by the quantification.

      -  The study would benefit from examining markers of emergency myelopoiesis such as Cebpb through qPCR.

      -  Although the number of neutrophils is lower in the BM and spleen, how does this relate to increased bacterial load in the liver? This is especially true since you did not quantify neutrophil numbers in the liver.

      e) Figure 6

      -  Some figures are incorrectly labelled.

      -  For the refeeding data, also include the data from the 40RD group to compare the level of recovery in the outcome measures.

      (20) Discussion

      -  You claim that monocytes are reduced to the same extent as neutrophils, but this is not true.

      Please correct.

      -  Indicate some limitations of your work.

      We thank the reviewer for offering these recommendations and the constructive comments. 

      Several comments raised concerns over the rationale or reasoning behind aspects of the experimental design or the data presented, which we would like to clarify:

      • Regarding the refeeding protocol, we apologize for the confusion regarding the rationale. We based our methodology on the general guidelines for refeeding protocols for malnourished people. We elected to increase food intake 10% daily to avoid risk of refeeding syndrome or other complications. Our method by no means replicates the administration of specific vitamins, minerals, or electrolytes, nor the precise caloric content, that would be given to a human patient. Although the citation provided, which offers information from the WHO regarding the complications that can arise during refeeding, comes from a document on pediatric care, we did not mean to imply that our method modeled refeeding intervention for children. We have modified the text to avoid this confusion.

      • The reviewer requested more clarity on why we studied both the innate and adaptive immune system as well as why we chose the time points studied. As referenced in the manuscript, prior work has observed that caloric restriction, fasting, and malnutrition all can impact the adaptive immune system. Given these previous findings, we felt it important to evaluate how malnutrition affected adaptive immune cell populations in our model. To this end, we provide data tracking the course of T-cell responses from the start of infection through day 14, at which time the response undergoes contraction. However, since we find that bacterial burden is not properly controlled at earlier time points (day 5), when it is understood the innate immune system is more critical for mediating pathogen clearance, we elected to better characterize the effect malnutrition had on innate immune populations, something less well described in the literature. As phenotypes both in bacterial burden and within innate immune populations were observable as early as day 5, we chose to focus on that time point rather than later time points, when readouts could be further confounded by secondary or compounding effects of the lack of early control of infection. We have tried to make this rationale clear in the text and have made changes to further emphasize this reasoning.

      • The reviewer also requested an explanation of why bacterial burden was measured in the liver and the immune response was measured in the spleen. While the reviewer is correct that our model is a systemic infection, it is well appreciated that bacteria rapidly disseminate to the liver and spleen and these organs serve as major sites of infection. Given the central role the spleen plays in organizing both the innate and adaptive immune response in this model, it is common practice in the field to phenotype immune cell populations in the spleen, while using the liver to quantify bacterial burden (see PMID: 37773751 as one example of many). We acknowledge this does not provide the full scope of bacterial infection or the immune response in every potentially affected tissue, but nonetheless believe the interpretation that malnourished and previously malnourished animals do not properly control infection and their immune responses are blunted compared to controls still stands.

      The reviewer raised several points about differences in the results for cell frequency and absolute number and why these may deviate in some circumstances. For example, the reviewer notes that we observe thymic atrophy yet the frequency of peripheral T-cells does not decline. It should be noted that absolute number can change when frequency does not and vice versa, due to changes in other cell types within the studied population of cells. As in the case of peripheral lymphocytes in our study, the frequency can stay the same or even increase when the absolute number declines (Supplemental 1). This can occur if other populations of cells decrease further, which is indeed the case as the loss of myeloid cells is greater than that of lymphocytes. Hence, we find that the frequency of T and B cells is unchanged or elevated, despite the loss in absolute number of peripheral cells, which is our stated interpretation. We believe this is consistent with our overall observations and is why it is important to report both frequency and absolute number, as we have done.
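
      As a purely illustrative sketch of the arithmetic described above, the short Python snippet below shows how the frequency of one population can rise while its absolute number falls, provided another population shrinks more. The cell counts and the names `control`, `restricted`, and `frequency` are hypothetical and chosen only for illustration; they are not data or methods from the study.

```python
# Hypothetical cell counts (in millions); illustrative only, not study data.
control = {"lymphocytes": 60.0, "myeloid": 40.0}
restricted = {"lymphocytes": 30.0, "myeloid": 10.0}  # both fall; myeloid falls more


def frequency(counts, population):
    """Return the fraction of all counted cells that belong to `population`."""
    return counts[population] / sum(counts.values())


control_freq = frequency(control, "lymphocytes")        # 60 / 100
restricted_freq = frequency(restricted, "lymphocytes")  # 30 / 40

print(f"absolute number: {control['lymphocytes']} -> {restricted['lymphocytes']}")
print(f"frequency: {control_freq:.2f} -> {restricted_freq:.2f}")
```

      In this toy case the absolute lymphocyte number halves while the lymphocyte frequency increases, because the myeloid population declines proportionally more.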

      We have made the requested changes to the text to address the reviewer's concerns as noted to improve clarity and accuracy for the description of experiments, results, and overall conclusions drawn in the manuscript. We have also included a discussion of the limitations of our work as well as additional areas for future investigation that remain open.

      Reviewer #2 (Recommendations for the authors):

      Regarding the known drivers of myelopoiesis, can the authors quantify circulating levels of relevant immune cytokines (e.g. type I and II IFNs, GM-CSF, etc.)?

      Regarding the microbiota (point #2), how dramatically does this undernutrition modulate the microbiota both in terms of absolute load and community composition, and how effectively/quickly is this rescued by refeeding?

      We thank the reviewer for raising these recommendations. We agree that the role of circulating factors like cytokines and growth factors in contributing to the defects in myelopoiesis is of interest and is the focus of future work. Similarly, the impact of malnutrition on the microbiota is of great interest and has been evaluated by other groups in separate studies. How the known impact of malnutrition on the microbiota affects the phenotypes we observe in myelopoiesis is unclear and warrants future investigation. We have added these points to the discussion section as limitations of this study.

    1. Author Response:

      In the Weaknesses, Reviewer 3 suggests that in the Discussion, we comment upon whether WRN ATPase/3’-5’ helicase and WRNIP1 ATPase work on Y-family Pols additively or synergistically to raise fidelity. However, in the Discussion on page 20, we do comment on the role of WRN and WRNIP1 ATPase activities in conferring an additive increase in the fidelity of TLS by Y-family Pols.

    1. Author Response:

      We thank the reviewers for their thoughtful feedback and appreciate their recognition of the value of our findings. In response, we are refining the manuscript to clarify key terminology, more clearly describe our image analysis workflows, and temper the interpretation of our results where appropriate. We are planning to perform additional experiments to further investigate the specificity of mRNA co-localization between BK and CaV1.3 channels. We acknowledge the importance of understanding ensemble trafficking dynamics and the functional role of pre-assembly at the plasma membrane, and we plan to explore these questions in future work. We look forward to submitting a revised manuscript that addresses the reviewers’ comments in detail.

    2. eLife Assessment

      This valuable manuscript provides convincing evidence that BK and CaV1.3 channels can co-localize as ensembles early in the biosynthetic pathway, including in the ER and Golgi. The findings, supported by a range of imaging and proximity assays, offer insights into channel organization in both heterologous and endogenous systems. However, mechanistic questions remain unresolved, particularly regarding the specificity of mRNA co-localization, the dynamics of ensemble trafficking, and the functional significance of pre-assembly at the plasma membrane. While the data broadly support the central claims, certain conclusions would benefit from more restrained interpretation and additional clarification to enhance the manuscript's impact and rigor.

    3. Joint Public Review:

      This study presents a valuable contribution to our understanding of ion channel complex assembly by investigating whether BK and CaV1.3 channels begin to form functional associations early in the biosynthetic pathway, prior to reaching the plasma membrane. Using a combination of proximity ligation assays, single-molecule RNA imaging, and super-resolution microscopy, the authors provide convincing evidence that these channels co-localize intracellularly within the ER and Golgi, in both overexpression systems and a relevant endogenous cell model. The study addresses an important and underexplored aspect of membrane protein trafficking and organization, with broader implications for how ion channel signaling complexes are assembled and regulated. The experimental approaches are generally appropriate and the imaging data are clearly presented, with a commendable number of control experiments included. However, several limitations temper the interpretation of the results. The mechanisms underlying mRNA co-localization, and the role of co-translation in complex formation, remain insufficiently defined. Similarly, while intracellular colocalization is convincingly demonstrated, the study does not establish whether such early assembly is the predominant pathway for generating functional complexes at the plasma membrane. More rigorous quantification of channel co-association across compartments, and clarification of key terminology and image analysis methods, would strengthen the overall conclusions. Some of the language in the manuscript would also benefit from a more measured tone to avoid overstating the novelty of the findings. Despite these limitations, the study offers meaningful insights into intracellular ion channel organization and will be of interest to researchers in cell biology, membrane trafficking, and neurophysiology. With focused revisions addressing the outlined points, the manuscript has the potential to make a solid contribution to the field.

    1. eLife Assessment

      This important study explores the role of SIRT2 in regulating Japanese encephalitis virus replication and disease progression in rodent models. The findings presented are novel as sirtuins are known for their roles in aging, metabolism, and cell survival, but have not been studied in the context of viral infections until recently. The evidence supporting the claims is solid, although additional experiments to further characterize the clinical outcomes and directly test the link between acetylated NF-kB and SIRT2 expression would have strengthened the study. The work will be of interest to biologists studying viruses, sirtuins, and inflammation.

    2. Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, clinical outcomes, and improved survival. Transcriptional profiling shows dysregulation of NF-KB and expression of inflammatory cytokines. Pharmacological NF-KB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      Strengths:

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-KB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-kB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-KB is inhibited, suggesting that NF-KB is not the only mediator of SIRT2 to suppress viral infection.

    3. Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, clinical outcomes, and improved survival. Transcriptional profiling shows dysregulation of NF-KB and expression of inflammatory cytokines. Pharmacological NF-KB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-KB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-kB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-KB is inhibited, suggesting that NF-KB is not the only mediator of SIRT2 to suppress viral infection.

      We thank the reviewer for the valuable recommendation. We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      Furthermore, we acknowledge the reviewer's comments that SIRT2 regulates systemic inflammatory responses and provides potent protection against viral infection. Additionally, NF-κB is not the only mediator of SIRT2's suppression of viral infection; other possible molecular mechanisms are also involved in this process.

      Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

      We thank the reviewer for the valuable recommendation. We are willing to measure NF-κB acetylation in AdSIRT2 JEV-infected cells compared to WT-infected cells, to verify that the acetylation of NF-κB is truly linked to SIRT2 expression levels, as the reviewer suggests.

      As recommended, we are willing to conduct an experiment using a viral dose that is sub-lethal in wild-type (WT) mice to assess whether knockout (KO) mice show worse clinical outcomes and reduced survival.

      We accept the reviewer's point that AGK2 can also inhibit other Sirtuins. To test the contribution of other Sirtuins, we are willing to repeat the AGK2 experiment in JEV-infected wild-type and Sirt2 knockout mice.

    1. eLife Assessment

      This valuable study tested whether several months of dolutegravir intensification alters the size of the HIV reservoir as well as immune activation in individuals already on suppressive ART. While the general study approach is appropriate and the paper is well written, the evidence supporting the claims of the authors is incomplete. The title of the paper is only partially supported by the data, based on specific issues with the study design and analysis plan highlighted by Reviewer 1. Specifically, the primary study outcomes were not clearly described a priori, the plausibility of a biologic effect is uncertain based on lack of a consistent effect across participants, and sample size is small. Given a possible observed partial effect and relevant hypothesis, this approach warrants study in a larger trial.

    2. Reviewer #1 (Public review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed a transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A); however, this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84. If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result. The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

    3. Reviewer #2 (Public review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, on a background of nucleoside analog inhibitors of the HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with the development of co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well-documented study, both in blood and tissues, which is a great achievement given the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

    4. Reviewer #3 (Public review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug-drug interactions, and possibly penetration of antivirals into sanctuary areas of replication. As the authors point out, proving it does not occur is likely not possible, and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication in the absence of evolution toward resistance have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events that result in differences in virus release for specific cell types (those responsible for "second phase" decay) by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group. Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C). The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size. The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C. The statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups, where the results are less convincing.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, outcomes measuring related parameters will move in the same direction.
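      The concern about chance findings can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the tests are independent and each is run at the same significance level, and the number of tests used in the example (20) is a hypothetical value, not a count taken from the manuscript. The function names are likewise invented for this illustration.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance level alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or
    below alpha (Bonferroni correction)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical example: 20 comparisons, each tested at 0.05.
    n = 20
    print(f"FWER for {n} tests: {family_wise_error_rate(0.05, n):.3f}")
    print(f"Bonferroni per-test threshold: {bonferroni_threshold(0.05, n):.4f}")
```

      Under these assumptions, the chance of at least one nominally significant result grows quickly with the number of comparisons, which is why a correction or a prespecified primary outcome is usually requested. Correlated outcomes, as expected here, would change the exact figures.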

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

    5. Author response:

      Reviewer #1 (Public Review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which will help us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we intend to implement in the revised version. We acknowledge the reviewer’s concern regarding potential over-interpretation of certain findings, and we will take particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed a transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A); however, this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84. If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We will clarify this in the revised manuscript. As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at day 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial in which ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall, cumulative effect of intensification throughout the entire trial period than in differences between groups at individual time points. We will clarify in the manuscript that this was a proof-of-concept phase 2 study, designed to generate biological signals rather than confirm efficacy in a powered comparison. The absence of a pre-specified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      We will improve the Methods section to clarify how safety and tolerability were assessed during the study. Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk, and therefore do not consider it to represent a safety concern.

      We would like to clarify that CD4<sup>+</sup> T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4<sup>+</sup> T cells.

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study raise several possible interpretations, and that its complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir.

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern may reflect the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harboring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification; however, at subsequent time points this effect was masked by proliferation (clonal expansion) of infected cells with defective proviruses. This explains why the intact proviruses decreased, but the total proviruses did not change, between days 0 and 84.

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). We will highlight this result more clearly in the revised manuscript, as it directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA includes over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes.

      In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a modest but biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across these independent measures suggests a signal indicative of ongoing replication in at least some individuals, and at specific timepoints.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA Shearing Index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity and weak signals, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we will adopt a more measured tone in the discussion, clearly stating that these observations are exploratory and hypothesis-generating, and require confirmation in larger, more powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a potentially meaningful biological signal that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We will revise the manuscript to reflect this more accurately. We will also note that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We will explicitly acknowledge these limitations and interpret the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We will expand the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioral biases, including potential changes in adherence, we will now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research.

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we will highlight this limitation and future direction in the revised manuscript.

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs-part2/cancer/cancer-screening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the main outcome was the longitudinal change of a parameter within the same participant during the study. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.
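
      The within-subject argument above can be illustrated with a minimal sketch. It assumes the change is expressed as a fold change relative to each participant's own baseline; the function name and all values are hypothetical and are not study data.

```python
from statistics import median


def fold_change_from_baseline(baseline, follow_up):
    """Per-participant fold change: follow-up value divided by that participant's own baseline."""
    return [f / b for b, f in zip(baseline, follow_up)]


# Hypothetical values (not study data): two groups with very different
# baseline levels but the same proportional decline within each participant.
low_baseline = [100.0, 200.0, 400.0]
high_baseline = [1000.0, 2000.0, 4000.0]

low_fc = fold_change_from_baseline(low_baseline, [b * 0.5 for b in low_baseline])
high_fc = fold_change_from_baseline(high_baseline, [b * 0.5 for b in high_baseline])

# Both medians are identical even though absolute baselines differ tenfold.
print(median(low_fc), median(high_fc))
```

      This shows only the arithmetic of normalizing to baseline. It does not address whether the size of a change could itself depend on the baseline level.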

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors. We will clarify these points in the revised manuscript.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultra-sensitive plasma HIV RNA quantification. However, a technical issue (inadvertent PBMC contamination during plasma separation) affected the reliability of the results, so we did not feel comfortable including these data in the manuscript.

      The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA / Total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA / Total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA / Total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.

      Reviewer #2 (Public Review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, on a background of nucleoside analog inhibitors of the HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with the development of co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public Review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug interactions, and possibly penetration of antivirals into sanctuary areas of replication. As the authors point out, proving it does not occur is likely not possible, and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication, in the absence of evolution toward resistance, have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events, resulting in differences in virus release for specific cell types (those responsible for "second phase" decay), by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We will revise the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances. We will acknowledge these limitations in the revised manuscript. We will provide both the full range and the IQR in Table 1 in the revised manuscript.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values. These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/Total DNA ratio in the intervention group.
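
      The attenuation argument can be written out as a short derivation. The symbols are introduced here for illustration only: $R_t$ and $D_t$ denote a participant's US RNA and total DNA at day $t$, and $Q_t$ their ratio.

```latex
\[
Q_t = \frac{R_t}{D_t}
\quad\Longrightarrow\quad
\frac{Q_t}{Q_0} = \frac{R_t / R_0}{D_t / D_0}
\quad\Longrightarrow\quad
\log\frac{Q_t}{Q_0} = \log\frac{R_t}{R_0} - \log\frac{D_t}{D_0}.
\]
```

      When US RNA and total DNA both decline from baseline, the two log fold changes on the right have the same sign and partly cancel, so the fold change of the ratio is smaller in magnitude than that of US RNA alone.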

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size.

      Although we surely agree that in general, the limited sample size impacts statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the ranges do differ between groups. At days 56 and 84, the median fold changes from baseline are indeed close but the full interquartile range in the DTG group stays below 1, while in the control group, the interquartile range is wider and covers approximately equal distance above and below 1. This explains the difference in p values between the groups.
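
      The point about similar medians but different spreads can be sketched as follows. The fold-change values and helper names are hypothetical and chosen only to reproduce the pattern described, not taken from Figure 2C; quartiles use the inclusive method of Python's `statistics.quantiles`, which may differ from the method used in the manuscript.

```python
from statistics import median, quantiles


def iqr(values):
    """Return (Q1, Q3) computed with the inclusive quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def iqr_entirely_below(values, threshold=1.0):
    """True when the upper quartile lies below the threshold (1 = no change from baseline)."""
    _, q3 = iqr(values)
    return q3 < threshold


# Hypothetical fold changes from baseline: same median, different spread.
group_a = [0.55, 0.6, 0.7, 0.75, 0.8, 0.85, 1.3]
group_b = [0.3, 0.4, 0.6, 0.75, 1.1, 1.5, 1.9]

for name, values in (("A", group_a), ("B", group_b)):
    print(name, median(values), iqr(values), iqr_entirely_below(values))
```

      In this toy example both groups share a median of 0.75, but only group A has an interquartile range that stays below 1.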

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      This statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4+ T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We will temper the language accordingly and add commentary on the limited and modest nature of these changes. Similarly, we will expand our discussion of counterintuitive findings such as the CD4:CD8 ratio and sCD14 changes.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, outcomes measuring related parameters will move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.
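
      The trade-off between chance findings and type II error can be made concrete with a small sketch. It assumes independent tests at a nominal level of 0.05, which is a simplification since the study's markers are correlated, and the numbers of tests shown are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Hypothetical numbers of comparisons, not a count of the tests in the manuscript.
for m in (1, 5, 10, 20):
    print(m, round(familywise_error_rate(m), 3), round(bonferroni_threshold(m), 4))
```

      As the number of comparisons grows, the chance of at least one false positive rises quickly, while a correction such as Bonferroni lowers the per-test threshold and therefore reduces power. This is the tension between type I and type II error referred to above.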

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We will highlight this more clearly in the revised Discussion.

    1. eLife Assessment

      Tropical single-island endemic bird populations are particularly vulnerable to climate change. The authors investigate genetic evidence of how such species dealt with climate changes in the past as a possible predictor for how they will respond to change in the future, which could provide an important example for the fields of conservation genetics and island biogeography. The authors' integration of genomics and habitat modeling is commendable, but we find that the support for their conclusions is incomplete: at times, the results presented appear to contradict each other, the authors do not fully account for key variables, and the limited taxonomic scope may cause problematic biases for the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors combine PSMC and habitat modeling to try to connect habitat change during the Last Glacial Period to changes in Ne.

      Strengths:

      Observing how tropical single-island endemic bird species responded to habitat change in the past may help inform conservation interventions for these particularly vulnerable species. The combination of genomics and habitat modeling is a good idea - this sort of interdisciplinary thinking is what is needed to tackle these complex questions. Additionally, the use of PSMC makes it possible to perform this analysis on poorly-studied species with only a single genome available.

      Room for Improvement:

      Why coalescent Ne is a better predictor of extinction risk than current genomic diversity, or current Ne, isn't explicitly explained. PSMC in particular has many caveats, and some are not acknowledged or adequately addressed by the authors. For example, the authors note that population structure is a confounding factor with PSMC, but that it is not a problem in this instance. They do not provide compelling evidence for why this would be the case, they simply state that the species studied are all single-island endemics. However, single-island endemic species are not necessarily panmictic; this is even less likely to be true for species studied here that inhabit a large geographic area (ie, Australian species). Differing PSMC parameters may also impact results: the differences between passerines and non-passerines were one of their main results, but they do not provide any analysis to show that this difference was not driven by the different mutation rates used for the two groups.

      Parameters for many steps are not described, and choices that are described (such as the PSMC parameters) are not always fully explained. It is unclear why all data was mapped to the autosomes rather than removing reads that map to the sex chromosomes first. Using all the data, the reads belonging to the sex chromosomes could potentially map to other areas of the genome. It does not seem like a mapping quality filter was used, so these potential spurious alignments would not have been removed prior to analysis.

      There are points where the results are described in ways that appear to potentially differ from the supplementary figures. The authors state that even for species where PSMC results differed between models, "trends of Ne increase or decrease from the LIG to LGM were robust across all three PSMC models considered." The figures in the supplement for Pachycephala philippinensis, Rhynochetos jubatus, and Zosterops hypoxanthus appear to potentially contradict this statement, but it is difficult to tell, as the time period observed is not clearly marked on the graphs. How this robustness of trends was determined is not explained, leaving the precision of the analysis unclear.

      Table 1 also includes some information that contradicts what is in the Supplementary Tables, leading to a lack of clarity. Centropus unirufus, Chaetorhynchus papuensis, and Cnemophilus loriae are not included in Supplementary Table 4. Table 1 says Eulacestoma nigropectus, Paradisaea rubra, and Parotia lawesii did not undergo PSMC analysis, but Supplementary Table 4 says PSMC and modeling trends matched for these species. Table 1 says Rhagologus leucostigma underwent both PSMC and climate modeling, but Supplementary Table 4 says "NA" as if it was missing one of these analyses.

      Additionally, some of the results appear to contradict each other. For example, they show that there is no impact of habitat change in larger-bodied species, but also that larger-bodied species saw a decrease in Ne during the LGP. In another example, they state that when a species saw an increase in habitat during the LGP, they also had an increase in Ne. However, they also state that this was not the case for non-passerines.

      Ecosystems are highly complex; there may also be other variables influencing past demographic change other than those explored here. Results should be interpreted with caution.

    3. Reviewer #2 (Public review):

      Summary and strengths:

      In this manuscript, Karjee and colleagues used coalescent-based effective population size reconstruction (PSMC) from single genomes to understand past population trends in island birds and related this to life history traits and glacial patterns. This concept is fairly new, as there are still relatively few multiple PSMC synthesis studies. I also thought that the focus on island endemics was unique and adds value to this paper. I enjoyed seeing a paper focused on South East Asia and think that this could help contribute to our knowledge of the important biodiversity within this region.

      Major weaknesses:

      My biggest concern with this paper is that the analyses are limited to 20-30 species, and significant taxonomic bias is present (there are multiple species of passerine but only 1-2 representatives of other groups). While this is not an issue alone, many of the life history traits or geographical traits are conflated with phylogenetic diversity (e.g., there are no large-bodied passerines). Thus, it is my opinion that the impact of these drivers of past population size is conflated and cannot be disentangled with the current data. The authors themselves state that the core hypothesis surrounding Ne and habitat availability is not supported by their entire dataset (only seen in Passerines). This was not clear enough in the abstract, and conclusions cannot be drawn here as the impact of taxonomy cannot be separated from data richness, traits, etc. The PSMC analysis was done according to the most recent recommendations, and this part of the manuscript is fairly robust. However, in several places, it is incorrectly stated that the PSMC measures or can infer genetic diversity; PSMC only infers past effective population size. It cannot measure genetic diversity in the past. I cannot review the habitat reconstruction modelling as I am a conservation genomics specialist.

      Appraisal:

      I am not convinced about the findings within the paper. I do not think that the results are sufficiently supported at this time, largely due to the conflation of taxonomy with other variables. As this type of comparison is new, I do think that there is a chance for reasonable impact on the field of genomics and island biogeography if the manuscript's constraints are addressed. I do not see scope for impact on conservation at this time and find the conclusions in the abstract regarding conservation relevance to be unfounded.

    4. Author response:

      We thank the editors and the reviewers for their positive comments regarding our manuscript and the methodological approach we have taken to understand the historical demographic response of endemic island birds to climate change. We acknowledge the issues of uneven sample sizes and plan to include additional species of island endemic birds for which genomic data is now available. As requested by reviewer 1, we will also address the issues related to the PSMC analysis in the revised version of the manuscript.

    1. eLife Assessment

      This study presents important findings that enhance our understanding of immune cell interactions in the context of chronic HIV-1 infection. The evidence supporting the conclusions is convincing. The authors have employed appropriate and validated methodologies, including detailed data reprocessing and batch correction to account for inter-donor variability. The inclusion of supplementary figures and analyses, such as cell communication inference, further substantiates the robustness of the findings. Overall, this work contributes to our understanding of HIV-1 immune evasion and highlights potential therapeutic targets for reservoir eradication.

    2. Reviewer #2 (Public review):

      Summary:

      The authors observed gene ontologies associated with upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells using scRNA-seq and scATAC-seq datasets from the PBMCs of early HIV-1-infected patients, showing immune responses contributing to HIV pathogenesis and novel targets for viral elimination.

      Strengths:

      The authors carried out detailed transcriptomics profiling with scRNA-seq and scATAC-seq datasets to conclude upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells.

      Comments on revisions:

      The authors justified my comments.

    3. Reviewer #3 (Public review):

      The revised manuscript demonstrates a marked improvement over the previous version. The authors have successfully incorporated feedback, and have moreover expanded their analyses.

      The Methods section is now more detailed and meets the requirements for reproducible research. Authors have reprocessed the data, creating an integrated dataset using a previously published single-cell RNA-Seq atlas, which includes both healthy donors and individuals with chronic HIV-1 infection. An additional batch correction step was included into the processing pipeline after the explicit analysis of inter-donor variability within immune subsets, as was suggested.

      Several supplementary figures were added, which both improve the understanding of data and address questions raised by the reviewers. The manuscript also provides additional analysis of cell communication inference, as suggested. The study of interactions between NK cells and infected CD4+ T cells, as well as between monocytes and infected CD4+ T cells, is valuable for understanding the influence of cell signaling on antiviral response and the production of HIV-1 transcripts in infected cells.

      The authors have addressed all the reviewers' suggestions, and the current version of the manuscript is both more comprehensive and more informative. Additional analysis has strengthened the narrative and the reproducibility of the research.

      The resulting manuscript is both more robust and more informative.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the molecular mechanisms underlying HIV-1 persistence and host immune dysfunction in CD4+ T cells during early infection (<6 months). Using single-cell multi-omics technologies-including scRNA-seq, scATAC-seq, and single-cell multiome analyses-they characterized the transcriptional and epigenomic landscapes of HIV-1-infected CD4+ T cells. They identified key transcription factors (TFs), signaling pathways, and T cell subtypes involved in HIV-1 persistence, particularly highlighting KLF2 and Th17 cells as critical regulators of immune suppression. The study provides new insights into immune dysregulation during early HIV-1 infection and reveals potential epigenetic regulatory mechanisms in HIV-1-infected T cells.

      Strengths:

      The study excels through its innovative integration of single-cell multi-omics technologies, enabling detailed analysis of gene regulatory networks in HIV-1-infected cells. Focusing on early infection stages, it fills a crucial knowledge gap in understanding initial immune responses and viral reservoir establishment. The identification of KLF2 as a key transcription factor and Th17 cells as major viral reservoirs, supported by comprehensive bioinformatics analyses, provides robust evidence for the study's conclusions. These findings have immediate clinical relevance by identifying potential therapeutic targets for HIV-1 reservoir eradication.

      We sincerely appreciate the reviewer’s positive evaluation of our work.

      Weaknesses:

      Despite its strengths, the study has several limitations. By focusing exclusively on CD4+ T cells, the study overlooks other relevant immune cells such as CD14+ monocytes, NK cells, and B cells. Additionally, while the authors generated their own single-cell datasets, they need to validate their findings using other publicly available single-cell data from HIV-1-infected PBMCs.

      Thank you to Reviewer #1 for your feedback on our work. In response to this feedback, we have examined cell-cell interactions between HIV-1-infected CD4+ T cells and other innate immune cells, including monocytes and NK cells. We identified altered interaction signaling patterns (e.g., MIF, ICAM2, CCL5, CLEC2B) that contribute to immune dysfunction and viral persistence (page 9, Supplementary Fig. 5). In addition, we validated the expression of KLF2 and its target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which includes both healthy donors and individuals with chronic HIV-1 infection. The upregulation of key KLF2 targets in HIV-1-infected CD4+ T cells from this dataset supports the reproducibility of our findings. We have incorporated these results into the revised Results, Discussion, and Supplementary Materials (page 8, page 12 and Supplementary Fig. 4A).

      Reviewer #2 (Public review):

      Summary:

      The authors observed gene ontologies associated with upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells using scRNA-seq and scATAC-seq datasets from the PBMCs of early HIV-1-infected patients, showing immune responses contributing to HIV pathogenesis and novel targets for viral elimination.

      Strengths:

      The authors carried out detailed transcriptomics profiling with scRNA-seq and scATAC-seq datasets to conclude upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      This key observation of upregulation of the KLF2-associated gene family might be important in the HIV field for early diagnosis and viral clearance. However, with the limited sample size and in-vivo study model, it will be hard to draw conclusions. I highly recommend increasing the sample size of early HIV-1-infected patients.

      Thank you to Reviewer #2 for this important comment. We acknowledge the limitations of our modest sample size, which reflects the challenges of recruiting well-characterized individuals in early HIV-1 infection (<6 months) and obtaining high-quality PBMCs for single-cell multi-omic profiling. To strengthen our findings, we validated the upregulation of KLF2 target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which showed similar expression patterns in HIV-1 RNA+ CD4+ T cells (page 8 and Supplementary Fig. 4A).

      Reviewer #3 (Public review):

      Summary:

      This manuscript studies intracellular changes and immune processes during early HIV-1 infection with an additional focus on the small CD4+ T cell subsets. The authors used single-cell omics to achieve high resolution of transcriptomic and epigenomic data on the infected cells which were verified by viral RNA expression. The results add to understanding of transcriptional regulation which may allow progression or HIV latency later in infected cells. The biosamples were derived from early HIV infection cases, providing particularly valuable data for the HIV research field.

      Strengths:

      The authors examined the heterogeneity of infected cells within CD4 T cell populations, identified a significant and unexpected difference between naive and effector CD4 T cells, and highlighted the differences in Th2 and Th17 cells. Multiple methods were used to show the role of the increased KLF2 factor in infected cells. This is a valuable finding of a new role for the major transcription factor in further disease progression and/or persistence.

      The methods employed by the authors are robust. Single-cell RNA-Seq from PBMC samples was followed by a comprehensive annotation of immune cell subsets, 16 in total. This manuscript presents to the scientific community a valuable multi-omics dataset of good quality, which could be further analyzed in the context of larger studies.

      We sincerely thank the reviewer for the insightful and concise summary of our work.

      Weaknesses:

      Methods and Supplementary materials

      Some technical aspects could be described in more detail. For example, it is unclear how the authors filtered out cells that did not pass quality control, such as doublets and cells with low transcript/UMI content. Next, in cell annotation, what is the variability in cell types between donors? This information is important to include in the supplementary materials, especially with such a small sample size. Without this, it is difficult to determine, whether the differences between subsets on transcriptomic level, viral RNA expression level, and chromatin assessment are observed due to cell type variations or individual patient-specific variations. For the DEG analysis, did the authors exclude the most variable genes?

      Thank you to Reviewer #3 for these detailed comments and observations. In the revised Methods section (page 16), we have added information on our quality control filtering process. Specifically, we excluded cells with fewer than 200 detected genes, high mitochondrial content (>30%), or low UMI counts. Doublets were identified and removed using DoubletFinder.
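
      The filtering criteria described above can be sketched on a per-cell QC table as follows. This is a minimal illustration in plain Python, not the authors' Seurat/DoubletFinder pipeline. The field names (`n_genes`, `pct_mito`, `n_umi`, `is_doublet`) and the `min_umi` cutoff are hypothetical, since the response does not state a UMI threshold; the gene-count and mitochondrial cutoffs are taken from the response.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_genes: int       # number of detected genes
    pct_mito: float    # percent of reads from mitochondrial genes
    n_umi: int         # total UMI count
    is_doublet: bool   # doublet call from an upstream tool (assumed precomputed)


def passes_qc(cell: CellQC,
              min_genes: int = 200,        # response: cells with <200 detected genes excluded
              max_pct_mito: float = 30.0,  # response: cells with >30% mitochondrial content excluded
              min_umi: int = 500) -> bool: # hypothetical cutoff, not given in the response
    """Return True if the cell survives all four filters."""
    return (
        cell.n_genes >= min_genes
        and cell.pct_mito <= max_pct_mito
        and cell.n_umi >= min_umi
        and not cell.is_doublet
    )


cells = [
    CellQC("cell_a", n_genes=150, pct_mito=5.0, n_umi=400, is_doublet=False),    # too few genes
    CellQC("cell_b", n_genes=1200, pct_mito=45.0, n_umi=3000, is_doublet=False), # high mito
    CellQC("cell_c", n_genes=900, pct_mito=8.0, n_umi=2500, is_doublet=False),   # passes
    CellQC("cell_d", n_genes=2500, pct_mito=12.0, n_umi=9000, is_doublet=True),  # doublet
]
kept = [c for c in cells if passes_qc(c)]
print([c.barcode for c in kept])
```

      Reporting the number of cells removed by each filter, per donor, would help readers judge whether any one donor dominates the retained cells.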

      To address inter-donor variability, we included a new supplementary figure (Supplementary Fig. 1B) showing the distribution of major immune cell types across individual donors. While we observed some variation in cell-type composition between individuals, this likely reflects natural biological heterogeneity in early HIV-1 infection. Additionally, we applied fastMNN batch correction to mitigate donor-specific technical variation. After correction, the overall patterns of gene expression within each major CD4+ T cell subset were consistent across individuals (Supplementary Fig. 1C).

      Regarding the DEG analysis, we used the ‘FindMarkers’ function in Seurat (v.3.2.1), which does not exclude highly variable genes. These details have been clarified in the updated Methods section (page 18).

      The annotation of 16 cell types from PBMC samples is impressive and of good quality, however, not all cell types get attention for further analysis. It’s natural to focus primarily on the CD4 T cells according to the research objectives. The authors also study potential interactions between CD4 and CD8 T cells by cell communication inference. It would be interesting to ask additional questions for other underexplored immune cell subsets, such as: 1) Could viral RNA be detected in monocytes or macrophages during early infection? 2) What are the inferred interactions between NK cells and infected CD4 T cells, are interactions similar to CD4-CD8 results? 3) What are the inferred interactions between monocytes or macrophages and infected CD4 T cells?

      In line with our study objectives, we initially focused on CD4+ T cells as primary HIV-1 targets. However, in response to the reviewer’s comment, we examined the inferred communications between HIV-1-infected CD4+ T cells and other immune cells.

      (1) With regard to the presence of viral RNA in monocytes or macrophages, we observed negligible HIV-1 RNA signal in these cell types in our dataset, consistent with their low permissiveness in early-stage infection [2]. However, we acknowledge the limitations of detecting rare infected cells at the single-cell level.

      (2) We identified increased MIF and ICAM2 signaling between NK cells and HIV-1-infected CD4+ T cells, which are associated with KLF2-mediated immune modulation. These patterns are consistent with the CD4–CD8 interaction results observed in our dataset (Supplementary Fig. 5A).

      (3) Through the cell-cell interaction analysis with differential expression analysis, we inferred reduced CCL5 and CD55 signaling between monocytes and HIV-1-infected CD4+ T cells (Supplementary Fig. 5B). These reductions may potentially impair immune responses and antiviral defense.

      We appreciate the reviewer’s suggestions and believe that the analysis of underexplored immune subsets strengthens the relevance of our findings. These results have been incorporated into the revised Results (page 9).

      Discussion

      It would be interesting to see more discussion of the observation of how naïve T cells produce more viral RNA compared to effector T cells. It seems counterintuitive according to general levels of transcriptional and translational activity in subsets.

      Another discussion block could be added regarding the results and conclusion comparison with Ashokkumar et al. paper published earlier in 2024 (10.1093/gpbjnl/qzae003). This earlier publication used both a cell line-based HIV infection model and primary infected CD4 T cells and identified certain transcription factors correlated with viral RNA expression.

      Thank you to Reviewer #3 for the insightful suggestions. We observed that the proportion of HIV-1-infected naïve CD4 T cells is higher than that of effector T cells. Although effector CD4 T cells are generally more active, previous studies have suggested that naïve CD4 T cells are susceptible to HIV-1 during early infection, which may be associated with initial expansion and rapid progression [3, 4]. This may be due to less restriction by antiviral signaling or more accessible chromatin states in resting cells. We have added this context and cited relevant papers to address this observation (page 11).

      In addition, we have incorporated a comparative discussion with the recent study [5], which identified FOXP1 and GATA3 as transcriptional regulators associated with HIV-1 RNA expression. While these TFs were not significantly differentially expressed in our dataset, we discuss potential reasons for this discrepancy—including differences in infection model (in vitro vs. ex vivo), infection stage (latency vs. acute), and T cell subset composition—and emphasize that both studies highlight the importance of transcriptional regulation in HIV-1 persistence (page 12 and Supplementary Fig. 4B).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study has several notable limitations.

      First, it was restricted to early-stage HIV-1 infection (<6 months) without longitudinal data, preventing the authors from capturing temporal changes in immune cell populations, gene expression profiles, and epigenetic landscapes throughout disease progression.

      Thank you to Reviewer #1 for highlighting this important limitation. As noted, our study focused exclusively on early-stage HIV-1 infection (<6 months) to capture the initial immune dysregulation and epigenetic alterations. We agree that longitudinal analysis would provide valuable insights into disease progression. However, due to the limited availability of early-infection patient samples suitable for multi-omics profiling, we prioritized capturing a detailed snapshot at this early stage. To address this limitation, future studies incorporating longitudinal sampling, including chronic infection and long-term non-progressors, will be essential to fully elucidate the temporal dynamics of HIV-1 pathogenesis.

      Second, while the bioinformatic analysis compared "Uninfected" and "HIV-1-infected" cells from patients, the authors could have strengthened their findings by incorporating publicly available single-cell data from healthy donors and chronically infected HIV-1 patients to validate their arguments across all figures.

      To support the robustness of our findings, we incorporated a publicly available single-cell RNA-seq dataset [1], which includes both healthy donors and individuals with chronic HIV-1 infection. In this dataset, we validated the upregulation of KLF2 and its target genes in HIV-1-infected CD4+ T cells and observed expression patterns generally consistent with those in our early-infection cohort (page 8; page 12 and Supplementary Fig. S4). While not all gene-level trends were identical, reflecting differences in infection stage and immune activation status, this external comparison reinforces the reproducibility of key observations and highlights the unique transcriptional features associated with early HIV-1 infection.

      Third, although the study focused on CD4+ T cells as primary HIV-1 targets, it overlooked other important immune cells such as CD8+ T cells, monocytes, and NK cells, which may contribute to viral persistence and immune dysfunction through cell-cell interactions.

      In the revised manuscript, we expanded our analysis to include predicted ligand–receptor interactions of HIV-1-infected and uninfected CD4+ T cells with innate and cytotoxic immune cells using CellChat v.2.1.1. Specifically, we evaluated interactions with NK cells and monocytes and identified altered signaling pathways such as MIF, ICAM2, CCL5, and CLEC2B, which are associated with immune modulation (Supplementary Fig. 5A). We have added these results to the revised Results (page 9).

      Lastly, comparing these findings with other chronic viral infections (e.g., HBV, HCV) would have positioned this work more effectively within the broader field of viral immunology and enhanced its impact.

      We agree that broader comparisons with other chronic viral infections could enhance the impact of our findings. In the current discussion, we noted similarities in interferon signaling disruption with viruses such as HCV and HSV (page 11). Our observation that HIV-1-infected CD4+ T cells exhibit impaired interferon responses is consistent with immune evasion mechanisms reported in HCV and HSV infections. These results underscore both the shared and specific features of immune modulation and persistence during HIV-1 early infection.

      Reviewer #3 (Recommendations for the authors):

      Supplementary Table S1 should indicate which technique was used for sequencing. However, the current version of the table marks no protocol applied to the majority of the samples, which is confusing and needs to be corrected.

      Thank you to Reviewer #3 for pointing out this important oversight. We have revised Supplementary Table S1 to clearly indicate the sequencing method used for each sample. Separate columns for scRNA-seq, scATAC-seq, and sc-Multiome now specify whether each technique was applied (“Yes” or “No”) to improve clarity and transparency.

      (1) Wang, S., et al., An atlas of immune cell exhaustion in HIV-infected individuals revealed by single-cell transcriptomics. Emerg Microbes Infect, 2020. 9(1): p. 2333-2347.

      (2) Arfi, V., et al., Characterization of the early steps of infection of primary blood monocytes by human immunodeficiency virus type 1. J Virol, 2008. 82(13): p. 6557-65.

      (3) Douek, D.C., et al., HIV preferentially infects HIV-specific CD4+ T cells. Nature, 2002. 417(6884): p. 95-8.

      (4) Jiao, Y., et al., Higher HIV DNA in CD4+ naive T-cells during acute HIV-1 infection in rapid progressors. Viral Immunol, 2014. 27(6): p. 316-8.

      (5) Ashokkumar, M., et al., Integrated Single-cell Multiomic Analysis of HIV Latency Reversal Reveals Novel Regulators of Viral Reactivation. Genomics Proteomics Bioinformatics, 2024. 22(1).

    1. eLife Assessment

      This study presents valuable findings on the relationship between nutrient availability and NAD/NADH levels, which in turn regulate biomass production in cancer cells. The authors provide solid evidence to support their claims, offering insight into why it is difficult to predict which nutrients limit cancer cell growth: both cell type and nutrient availability together determine the oxidative capacity that constrains the synthesis of various metabolic intermediates. The manuscript will be of interest to researchers working in cancer and cell metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how cellular NAD/NADH ratios are controlled in cancer cell lines in vitro. The authors build on previous work, which shows that serine synthesis is sensitive to NAD/NADH ratios and PHGDH expression. Here, the authors demonstrate that serine synthesis is variable across a panel of cell lines, even when controlling for expression of serine synthesis enzymes such as PHGDH. The authors show that cellular NAD/NADH ratios correlate with the ability to synthesize serine and grow in serine-deprived environments when PHGDH levels remain constant. Investigating this variability in NAD/NADH ratios, the authors find that the cells that can positively respond to serine deprivation are able to increase oxygen consumption and cellular NAD/NADH ratios. Cells that do not increase oxygen consumption in response to serine deprivation do not increase NAD/NADH ratios and cannot grow well without serine. The authors go on to show that in cells with the ability to increase oxygen consumption upon serine deprivation, PHGDH expression alone is sufficient to fully restore growth-serine; in cells that cannot increase oxygen consumption, both PHGDH expression and interventions to increase NAD/NADH ratios are required to increase growth. Thus, cells need both PHGDH and NAD/NADH increases to maximize serine synthesis in response to serine deprivation. The authors previously showed that lipid synthesis likewise requires NAD regeneration. Interestingly, one cell line that does not increase oxygen consumption in response to serine limitation tends to increase oxygen consumption in response to lipid deprivation; accordingly, depriving this cell line of lipids increases the synthesis of serine. 
Together, these findings show that how cells respond to nutrient deprivation is highly variable and that the response to nutrient deprivation (for example, whether or not oxygen consumption is increased) will determine how well cells tolerate depletion of nutrients with related biosynthetic constraints. This work sheds light on the complexity of cancer cell metabolism and helps to explain why it is difficult to predict which nutrients will be limiting to any cancer cell type or environment.

      Strengths:

      (1) The authors use multiple interventions to manipulate NAD/NADH ratios in cells.

      (2) Experiments are well controlled and appropriately interpreted.

      Weaknesses:

      Overall, the data support the conclusions of the manuscript. I have only two minor comments and suggestions:

      (1) Figure 2B/C: data are presented as relative to +serine, which shows how some cells respond to -serine, but may also be of interest to see how absolute (not relative) NAD/NADH levels correlate with serine synthesis and serine-independent proliferation. In other words, is it the dynamic increase in the ratio that is most important, or the absolute level of the ratio?

      (2) Line 177-178: the authors write, "We hypothesized that the elevated NAD+/NADH ratio represented a cellular response to make the NAD+/NADH ratio more oxidized to enable serine synthesis". I recommend modest edits to avoid anthropomorphizing. It is possible that the ratio responds for reasons yet to be determined and not necessarily because the cell is deliberately trying to enable serine synthesis.

    3. Reviewer #2 (Public review):

      In the manuscript "Cancer cells differentially modulate mitochondrial respiration to alter redox state and enable biomass synthesis in nutrient-limited environments", Chang et al investigate how cancer cells respond to the limitation of certain environmental nutrients by regulating the cellular NAD+/NADH ratio. They focus on serine and lipid metabolism, pathways known to be controlled by the NAD+/NADH ratio, and propose that changes in mitochondrial respiration in response to deprivation of these nutrients can influence the NAD+/NADH ratio, thereby impacting biomass synthesis.

      While the study is descriptive in nature and does not investigate specific molecular mechanisms that explain the crosstalk between nutrient availability and mitochondrial redox changes, the experimental component is robust, and the conclusions are well supported by the results. Some suggestions could further refine the conclusions and enhance the quality of the manuscript.

      Main critiques:

      (1) Throughout the manuscript, the authors utilise the number of cell doublings per day as an endpoint readout of cell proliferation. It would be advisable to include a quantification of the cell number and to display the proliferation rate over time. This would provide valuable insights into the timeline of cellular responses and avoid potential confounding effects associated with the use of Sulforhodamine B dye, an indirect measure of cell proliferation based on protein content, which may be influenced by some of the interventions. Furthermore, it will help determine whether specific treatments reduce cellular doublings resulting from cell death. This concern is particularly evident in treatments with rotenone, e.g., Fig. 1G, where the increase in doublings could be attributed to cell death.

      (2) The authors propose a model in which the deprivation of extracellular nutrients impacts mitochondrial respiration, which in turn increases the NAD+/NADH ratio and ultimately affects metabolic biosynthetic pathways that occur in the cytosol, such as serine biosynthesis. The mechanism by which nutrient availability is sensed and transmitted across different cellular compartments to regulate mitochondrial redox status remains unclear. This concern is particularly relevant for serine metabolism, as its synthesis occurs in the cytosol, but the authors connect it to mitochondrial respiration. Compartment-specific measurements of NAD+/NADH ratio would help to understand to what extent the redox state is affected by nutrients in the mitochondria and in the cytoplasm (see also minor critiques point 2). Moreover, the use of the genetic tool LbNox could be employed to manipulate the NAD+/NADH ratio in a compartment-specific manner, while also avoiding the toxicity of certain compounds, such as rotenone. This set of experiments would add depth to the investigation, which might otherwise appear too descriptive.
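
      The proliferation readout discussed in critique (1) can be made concrete with a short sketch. It assumes the standard definition of doublings per day from start and end cell counts (or a proxy signal proportional to cell number); the review does not spell out the manuscript's exact calculation, so the function name and inputs here are illustrative only.

```python
import math


def doublings_per_day(n_start: float, n_end: float, days: float) -> float:
    """Average population doublings per day between two measurements.

    Assumes exponential growth between the two time points:
    n_end = n_start * 2 ** (rate * days).
    """
    if n_start <= 0 or n_end <= 0 or days <= 0:
        raise ValueError("counts and duration must be positive")
    return math.log2(n_end / n_start) / days


# An eightfold increase over three days is three doublings, i.e. one per day.
growth = doublings_per_day(10_000, 80_000, 3.0)

# A halving of the population over one day yields a negative rate.
loss = doublings_per_day(10_000, 5_000, 1.0)

print(growth, loss)
```

      Because an endpoint value averages over the whole interval, it cannot distinguish slow proliferation from proliferation offset by cell death, which is why the time-resolved cell counts requested above would be informative.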

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang and colleagues provides new insights into how cancer cells adapt their metabolism under nutrient-deprived conditions. They find cells respond differentially to serine and lipid deprivation via oxidising the cell redox state, which enables biomass synthesis and cell proliferation. They identified mitochondrial respiration as the major mechanism that dictates the endogenous NAD+/NADH ratio. By incorporating a dual stress paradigm, serine and lipid deprivation, the study further suggests that the NAD+/NADH ratio can serve as a link to orchestrate the complex interplay between multiple nutrient changes in the tumour microenvironment.

      Strengths:

      A novel aspect of this study is the idea that cancer cells are not uniformly passive victims of nutrient limitation; some can actively invoke endogenous NAD+ regeneration to combat nutrient stress. The conclusion is well-supported by comparing multiple cell lines from different tissues and genetic backgrounds, which improves generalizability. While most of the smaller conclusions align with common reasoning and expectations, the step-by-step deduction that leads to a novel 'big picture' is commendable. Another notable strength is the integration of dual stress (lipid and serine deprivation), which better mimics the complex tumor microenvironment with multiple nutrient fluctuations, raising the translational potential of these findings. The observation that lipid-deprived cells can stimulate serine synthesis and support proliferation in a subset of cancer cell lines offers a novel perspective on metabolic plasticity under starvation conditions.

      Weaknesses:

      Although the authors derive a novel and valuable overarching concept, the presentation of this "big picture" is not clearly articulated, making it less accessible to readers outside the immediate field. It would greatly enhance the manuscript to include a clearer summary of the overarching model and its implications. Additionally, discussing the potential clinical significance and applications of the findings would increase the relevance and broader impact of the work. Finally, the manuscript's clarity and credibility are undermined by inconsistent figure labeling and the lack of statistical analysis, particularly for the Western blot data.

      While this study identifies changes in serine synthesis, mitochondrial respiration, PHGDH protein levels, and NAD+/NADH ratio in different cell lines, some of these relationships appear correlative rather than causally established (Figure 2; Figure 5; Figure 6). Some claims are thus overinterpreted. For example, the co-occurrence of increased NAD+/NADH ratio and citrate levels under lipid deprivation in A549 cells does not establish causality (Figure 5). Direct perturbation experiments that manipulate NAD+/NADH and assess downstream effects on citrate synthesis would substantially strengthen the conclusions.

      The study focuses predominantly on mitochondrial respiration as a source of NAD+ regeneration. However, it will also be interesting to check other significant pathways, such as NAD+ salvage, which have been implicated in supporting serine biosynthesis. In addition, the subcellular distribution of NAD+ may distinguish whether some cells are truly redox-unresponsive. Mitochondrial NAD+ regeneration might counteract the cytosolic NAD+ consumption, rendering a relatively stable intracellular NAD+/NADH ratio. The malate-aspartate shuttle can be an interesting aspect.

      The authors should acknowledge the limitations of short-term isotope tracing in their experimental design. Differences in metabolic rates across cell lines can affect the kinetics of metabolite labeling, limiting the direct comparability of metabolic fluxes between them. As a result, observed changes may reflect transient adaptations rather than stable metabolic reprogramming. It is important to clarify that the study primarily captures short-term responses, and the conclusions may not extrapolate to longer-term adaptations or protein-level changes under sustained nutrient stress.

    1. eLife Assessment

      Weiss et al. provide important new insights and convincing evidence to further our mechanistic understanding of how antigen presentation shapes skin persistence of CD8+ TRM. Using a mouse model for inducible genetic ablation of transforming growth factor beta receptor 3 (TGFBR3) in CD8+ T cells, they demonstrate TGFBR3's role in regulating CD8+ TRM persistence in skin. Furthermore, they show that the strength of T cell receptor (TCR) engagement upon initial CD8+ TRM skin seeding has a positive influence on subsequent TRM expansion following a secondary antigen-reencounter. Together, these mechanisms add to our understanding of how the skin CD8+ T cell repertoire is dynamically responsive to topical antigen.

    2. Reviewer #1 (Public review):

      Summary:

      Weiss et al. seek to delineate the mechanisms by which antigen-specific CD8+ T cells outcompete bystanders in the epidermis when active TGF-b is limiting, resulting in selective retention of these cells and more complete differentiation into the TRM phenotype.

      Strengths:

      They begin by demonstrating that at tissue sites where cognate antigen was expressed, CD8+ T cells adopt a more mature TRM transcriptome than cells at tissue sites where cognate antigen was never expressed. By integrating their scRNA-Seq data on TRM with the much more comprehensive ImmGenT atlas, the authors provide a very useful resource for future studies in the field. Furthermore, they conclusively show that these "local antigen-experienced" TRM have increased proliferative capacity and that TCR avidity during TRM formation positively correlates with their future fitness. Finally, using an elegant experimental strategy, they establish that TCR signaling in CD8+ T cells in epidermis induces TGFBRIII expression, which likely contributes to endowing them with a competitive advantage over antigen-inexperienced TRM.

      Weaknesses:

      The main weakness in this paper lies in the authors' reliance on a single model to derive conclusions on the role of local antigen during the acute phase of the response by comparing T cells in model antigen-vaccinia virus (VV-OVA) exposed skin to T cells in contralateral skin exposed to DNFB 5 days after the VV-OVA exposure. In this setting, antigen-independent factors may contribute to the difference in CD8+ T cell number and phenotype at the two sites. For example, it was recently shown that very early memory precursors (formed 2 days after exposure) are more efficient at seeding the epithelial TRM compartment than those recruited to skin at later times (Silva et al, Sci Immunol, 2023). DNFB-treated skin may therefore recruit precursors with reduced TRM potential. In addition, TRM-skewed circulating memory precursors have been identified (Kok et al, JEM, 2020), and perhaps VV-OVA exposed skin more readily recruits this subset compared to DNFB-exposed skin. Therefore, when the DNFB challenge is performed 5 days after vaccinia virus, the DNFB site may already be at a disadvantage in the recruitment of CD8+ T cells that can efficiently form TRM. In addition, CD8+ T cell-extrinsic mechanisms may be at play, such as differences in myeloid cell recruitment and differentiation or local cytokine and chemokine levels in VV-infected and DNFB-treated skin that could account for differences seen in TRM phenotype and function between these two sites. Although the authors do show that providing exogenous peptide antigen at the DNFB-site rescues their phenotype in relation to the VV-OVA site, the potential antigen-independent factors distinguishing these two sites remain unaddressed. 
In addition, there is a possibility that peptide treatment of DNFB-treated skin initiates a second phase of priming of new circulatory effectors in the local draining lymph nodes that are then recruited to form TRM at the DNFB site, and that the effect does not solely rely on TRM precursors at the DNFB-treated skin site at the time of peptide treatment.

      Secondly, although the authors conclusively demonstrate that TGFBRIII is induced by TCR signals and required for conferring increased fitness to local-antigen-experienced CD8+ TRM compared to local antigen-inexperienced cells, this is done in only one experiment, albeit repeated 3 times. The data suggest that antigen encounter during TRM formation induces sustained TGFBRIII expression that persists during the antigen-independent memory phase. It remains unclear why only the antigen encounter in skin, but not already in the draining lymph nodes, induces sustained TGFBRIII expression. Further characterizing the dynamics of TGFBRIII expression on CD8+ T cells during priming in draining lymph nodes and over the course of TRM formation and persistence may shed more light on this question. Probing the role of this mechanism at other sites of TRM formation would also further strengthen their conclusions and enhance the significance of this finding.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to dissect the mechanistic basis of their previously published finding that encountering cutaneous antigen augments the persistence of CD8+ memory T cells that enter skin (TRM) (Hirai et al., 2021, Immunity). Here they use the same murine model to study the fate of CD8+ T cells after antigen-priming in the lymph nodes, (1) those that re-encounter antigen in the skin via vaccinia virus (VV) versus (2) those that do not encounter antigen in skin but rather are recruited via topical dinitrofluorobenzene (DNFB) (so-called "bystander TRM"). The authors' previous publication establishes that this first group of CD8+ TRM has a persistence advantage over bystander TRM under TGFb-limiting conditions. The current paper advances this finding by elucidating the role of TGFBR3 in regulating CD8+ TRM skin persistence upon topical antigen exposure. Key novelty of the work lies in the generation and use of the CD8+ T cell-specific TGFBR3 knockout model, which allows them to demonstrate the role of TGFBR3 in fine-tuning the degree of CD8+ T cell skin persistence and that TGFBR3 expression is promoted by CD8+ TRM encountering their cognate antigen upon initial skin entry. Future work directly measuring active TGFb in the skin under different conditions would help identify physiologic scenarios that yield active TGFb-limiting conditions, thus establishing physiologic relevance.

      Strengths:

      Technical strengths of the paper include (1) complementary imaging and flow cytometry analyses, (2) integration of their scRNA-seq data with the existing CD8+ TRM literature via pathway analysis, and (3) use of orthogonal models where possible. Using a vaccinia virus (VV) model, with and without ovalbumin (OVA), the authors investigate how topical antigen exposure and TCR strength regulate CD8+ TRM skin recruitment and retention. The authors use both FTY720 and a Thy1.1 depleting antibody to demonstrate that skin CD8+ TRM expand locally following both a primary and secondary recall response to topical OVA application.

      A conceptual strength of the paper is the authors' observation that TCR signal strength upon initial TRM tissue entry helps regulate the extent of their local re-expansion on subsequent antigen re-exposure. They achieved this by applying peptides of varying affinity for the OT-I TCR on the DNFB-exposed flank in tandem with initial VV-OVA + DNFB treatment. They then measured TRM expansion after OVA peptide rechallenge, revealing that encountering a higher-affinity peptide upon skin entry leads to greater subsequent re-expansion. Additionally, by generating an OT-I Thy1.1+ E8i-creERT2 huNGFR Tgfbr3fl/fl (Tgfbr3∆CD8) mouse, the authors were able to elucidate a unique role for TGFBR3 in CD8+ TRM persistence when active TGFb in skin is limited.

      Weaknesses:

      Overall, the authors' conclusions are well supported, although there are some instances where additional controls, experiments, or clarifications would add rigor. The conclusions regarding skin-localized TCR signaling leading to increased skin CD8+ TRM proliferation in-situ and increased TGFBR3 expression would be strengthened by assessing skin CD8+ TRM proliferation and TGFBR3 expression in models of high versus low avidity topical OVA-peptide exposure. The authors could further increase the novelty of the paper by exploring whether TGFBR3 is regulated at the RNA or protein level. To this end, they could perform analysis of their single-cell RNA sequencing data (Figure 1), comparing Tgfbr3 mRNA in DNFB versus VV-treated skin.

      For clarity, when discussing antigen exposure throughout the paper, it would be helpful for the authors to be more precise that they are referring to the antigen in the skin rather than in the draining lymph node. A more explicit summary of some of the lab's previous work focused on CD8+ TRM and the role of TGFb would also help readers better contextualize this work within the existing literature on which it builds.

      For rigor, it would be helpful where possible to pair flow cytometry quantification with the existing imaging data. Additional controls, namely enumerating TRM in the opposite, untreated flank skin of VV-only-treated mice and the treated flank skin of DNFB-only-treated mice, would help contextualize the results seen in dually treated mice in Figure 1. In figure legends, we suggest clearly reporting unpaired t-tests comparing relevant metrics within VV or DNFB-treated groups (for example, VV-OVA PBS vs VV-OVA FTY720 in Figure 3F). Finally, quantifying right and left skin draining lymph node CD8+ T cell numbers would clarify the skin specificity and cell trafficking dynamics of the authors' model.

    1. eLife Assessment

      This study presents a useful framework to extract the individuality index to predict subjects' behavior in the target tasks. However, the evidence supporting such a framework is somewhat incomplete and would benefit from overall framing and clarity on its approaches. Overall, this study would be of interest to cognitive and AI researchers who work on cognitive models in general.

    2. Reviewer #1 (Public review):

      Summary

      The manuscript presents EIDT, a framework that extracts an "individuality index" from a source task to predict a participant's behaviour in a related target task under different conditions. However, the evidence that it truly enables cross-task individuality transfer is not convincing.

      Strengths

      The EIDT framework is clearly explained, and the experimental design and results are generally well-described. The performance of the proposed method is tested on two distinct paradigms: a Markov Decision Process (MDP) task (comparing 2-step and 3-step versions) and a handwritten digit recognition (MNIST) task under various conditions of difficulty and speed pressure. The results indicate that the EIDT framework generally achieved lower prediction error compared to baseline models and that it was better at predicting a specific individual's behaviour when using their own individuality index compared to using indices from others.

      Furthermore, the individuality index appeared to form distinct clusters for different individuals.

      Weaknesses

      (1) Because the "source" and "target" tasks are merely parameter variations of the same paradigm, it is unclear whether EIDT achieves true cross-task transfer. The manuscript provides no measure of how consistent each participant's behaviour is across these variants (e.g., two- vs three-step MDP; easy vs difficult MNIST). Without this measure, the transfer results are hard to interpret. In fact, Figure 5 shows a notable drop in accuracy when transferring between the easy and difficult MNIST conditions, compared to transfers between accuracy-focused and speed-focused conditions. Does this discrepancy simply reflect larger within-participant behavioural differences between the easy and difficult settings? A direct analysis of intra-individual similarity for each task pair - and how that similarity is related to EIDT's transfer performance - is needed.

      (2) Related to the previous comment, the individuality index is central to the framework, yet remains hard to interpret. It shows much greater within-participant variability in the MNIST experiment (Figure S1) than in the MDP experiment (Figure 3). Is such a difference meaningful? It is hard to know whether it reflects noisier data, greater behavioural flexibility, or limitations of the model.

      (3) The authors suggest that the model's ability to generalize to new participants "likely relies on the fact that individuality indices form clusters and individuals similar to new participants exist in the training participant pool". It would be helpful to directly test this hypothesis by quantifying the similarity (or distance) of each test participant's individuality index to the individuals or identified clusters within the training set, and assessing whether greater similarity (or closer proximity) to the clusters in the training set is associated with higher prediction accuracy for those individuals in the test set.
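
      The analysis requested in point (3) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the array names, the two-dimensional index, the participant counts, and the random placeholder data are all assumptions made for the example. It computes each held-out participant's distance to the nearest training participant and rank-correlates that distance with per-participant prediction error.

```python
import numpy as np

def nearest_train_distance(test_idx, train_idx):
    """Euclidean distance from each test index to its nearest training index."""
    # Pairwise differences, shape (n_test, n_train, dim)
    diff = test_idx[:, None, :] - train_idx[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist.min(axis=1)

def spearman(x, y):
    """Spearman rank correlation (no tie correction; adequate for continuous data)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical placeholder data, NOT taken from the manuscript.
rng = np.random.default_rng(0)
train_idx = rng.normal(size=(80, 2))     # training participants' indices (assumed 2-D)
test_idx = rng.normal(size=(18, 2))      # held-out participants' indices
test_nll = rng.uniform(0.3, 0.7, 18)     # per-participant prediction error

d = nearest_train_distance(test_idx, train_idx)
rho = spearman(d, test_nll)
print(f"Spearman rho between distance and prediction error: {rho:.2f}")
```

      A clearly positive correlation on real data would be consistent with the clustering explanation; distance to a cluster centroid could be substituted for the nearest-neighbour distance used here.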

    3. Reviewer #2 (Public review):

      This paper introduces a framework for modeling individual differences in decision-making by learning a low-dimensional representation (the "individuality index") from one task and using it to predict behaviour in a different task. The approach is evaluated on two types of tasks: a sequential value-based decision-making task and a perceptual decision task (MNIST). The model shows improved prediction accuracy when incorporating this learned representation compared to baseline models.

      The motivation is solid, and the modelling approach is interesting, especially the use of individual embeddings to enable cross-task generalization. That said, several aspects of the evaluation and analysis could be strengthened.

      (1) The MNIST SX baseline appears weak. RTNet isn't directly comparable in structure or training. A stronger baseline would involve training the GRU directly on the task without using the individuality index, e.g., by fixing the decoder head. This would provide a clearer picture of what the index contributes.

      (2) Although the focus is on prediction, the framework could offer more insight into how behaviour in one task generalizes to another. For example, simulating predicted behaviours while varying the individuality index might help reveal what behavioural traits it encodes.

      (3) It's not clear whether the model can reproduce human behaviour when acting on-policy. Simulating behaviour using the trained task solver and comparing it with actual participant data would help assess how well the model captures individual decision tendencies.

      (4) Figures 3 and S1 aim to show that individuality indices from the same participant are closer together than those from different participants. However, this isn't fully convincing from the visualizations alone. Including a quantitative presentation would help support the claim.

      (5) The transfer scenarios are often between very similar task conditions (e.g., different versions of MNIST or two-step vs three-step MDP). This limits the strength of the generalization claims. In particular, the effects in the MNIST experiment appear relatively modest, and the transfer is between experimental conditions within the same perceptual task. To better support the idea of generalizing behavioural traits across tasks, it would be valuable to include transfers across more structurally distinct tasks.

      (6) For both experiments, it would help to show basic summaries of participants' behavioural performance. For example, in the MDP task, first-stage choice proportions based on transition types are commonly reported. These kinds of benchmarks provide useful context.

      (7) For the MDP task, consider reporting the number or proportion of correct choices in addition to negative log-likelihood. This would make the results more interpretable.

      (8) In Figure 5, what is the difference between the "% correct" and "% match to behaviour"? If the two measures differ, it would help to clarify the distinction in the text or figure captions.

      (9) For the cognitive model, it would be useful to report the fitted parameters (e.g., learning rate, inverse temperature) per individual. This can offer insight into what kinds of behavioural variability the individuality index might be capturing.

      (10) A few of the terms and labels in the paper could be made more intuitive. For example, the name "individuality index" might give the impression of a scalar value rather than a latent vector, and the labels "SX" and "SY" are somewhat arbitrary. You might consider whether clearer or more descriptive alternatives would help readers follow the paper more easily.

      (11) Please consider including training and validation curves for your models. These would help readers assess convergence, overfitting, and general training stability, especially given the complexity of the encoder-decoder architecture.
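
      One way to provide the quantitative presentation asked for in point (4) is sketched below. It is an illustrative assumption rather than the authors' analysis: it compares mean within-participant and between-participant distances of the indices and obtains a p-value by permuting participant labels. Function names and the toy input are hypothetical.

```python
import numpy as np

def within_between(indices, participant):
    """Mean pairwise distance within and between participants."""
    diff = indices[:, None, :] - indices[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = participant[:, None] == participant[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # count each pair once
    return dist[same & upper].mean(), dist[~same & upper].mean()

def permutation_p(indices, participant, n_perm=1000, seed=0):
    """One-sided p-value for (between - within) under shuffled participant labels."""
    rng = np.random.default_rng(seed)
    w, b = within_between(indices, participant)
    observed = b - w
    null = np.empty(n_perm)
    for i in range(n_perm):
        w_p, b_p = within_between(indices, rng.permutation(participant))
        null[i] = b_p - w_p
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Toy input: two participants, two index estimates each (hypothetical values).
indices = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
participant = np.array([0, 0, 1, 1])
w, b = within_between(indices, participant)
p = permutation_p(indices, participant)
print(f"within={w:.2f}, between={b:.2f}, p={p:.3f}")
```

      Reporting the two means together with the permutation p-value would complement the scatter plots in Figures 3 and S1.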

    4. Reviewer #3 (Public review):

      Summary:

      This work presents a novel neural network-based framework for parameterizing individual differences in human behavior. Using two distinct decision-making experiments, the authors demonstrate the approach's potential and claim it can predict individual behavior (1) within the same task, (2) across different tasks, and (3) across individuals. While the goal of capturing individual variability is compelling and the potential applications are promising, the claims are weakly supported, and I find that the underlying problem is conceptually ill-defined.

      Strengths:

      The idea of using neural networks for parameterizing individual differences in human behavior is novel, and the potential applications can be impactful.

      Weaknesses:

      (1) To demonstrate the effectiveness of the approach, the authors compare a Q-learning cognitive model (for the MDP task) and RTNet (for the MNIST task) against the proposed framework. However, as I understand it, neither the cognitive model nor RTNet is designed to fit or account for individual variability. If that is the case, it is unclear why these models serve as appropriate baselines. Isn't it expected that a model explicitly fitted to individual data would outperform models that do not? If so, does the observed superiority of the proposed framework simply reflect the unsurprising benefit of fitting individual variability? I think the authors should either clarify why these models constitute fair control or validate the proposed approach against stronger and more appropriate baselines.

      (2) It's not very clear in the results section what it means by having a shorter within-individual distance than between-individual distances. Related to the comment above, is there any control analysis performed for this? Also, this analysis appears to have nothing to do with predicting individual behavior. Is this evidence toward successfully parameterizing individual differences? Could this be task-dependent, especially since the transfer is evaluated on exceedingly similar tasks in both experiments? I think a bit more discussion of the motivation and implications of these results will help the reader in making sense of this analysis.

      (3) The authors have to better define what exactly they mean by transferring across different "tasks" and testing the framework in "more distinctive tasks". All presented evidence, taken at face value, demonstrates transfer across different "conditions" of the same task within the same experiment. It is unclear to me how generalizable the framework will be when applied to different tasks.

      (4) Conceptually, it is also unclear to me how plausible it is that the framework could generalize across tasks spanning multiple cognitive domains (if that's what is meant by more distinctive). For instance, how can an individual's task performance on a Posner task predict task performance on the Cambridge face memory test? Which part of the framework could have enabled such a cross-domain prediction of task performance? I think these have to be at least discussed to some extent, since without it the future direction is meaningless.

      (5) How is the negative log-likelihood, which seems to be the main metric for comparison, computed? Is this based on trial-by-trial response prediction or on the probability of responses, as is usual in cognitive modelling?

      (6) None of the presented evidence is cross-validated. The authors should consider performing K-fold cross-validation on the train, test, and evaluation split of subjects to ensure robustness of the findings.

      (7) The authors excluded 25 subjects (20% of the data) for different reasons. This is a substantial proportion, especially by the standards of what is typically observed in behavioral experiments. The authors should provide a clear justification for these exclusion criteria and, if possible, cite relevant studies that support the use of such stringent thresholds.

      (8) The authors should do a better job of creating the figures and writing the figure captions. It is unclear which specific claim the authors are addressing with each figure. For example, what is the key message of Figure 2C regarding transfer within and across participants? Why is the presentation of statistics different between the Cognitive model and the EIDT framework plots? In Figure 3, it's unclear what these dots and clusters represent and how they support the authors' claim that the same individual forms clusters. Also, doesn't this experiment have 98 subjects after exclusion? This plot has far fewer than 98 dots, as far as I can tell. Furthermore, I find Figure 5 particularly confusing, as the underlying claim it is meant to illustrate is unclear. Clearer figures and more informative captions are needed to guide the reader effectively.

      (9) I also find the writing somewhat difficult to follow. The subheadings are confusing, and it's often unclear which specific claim the authors are addressing. The presentation of results feels disorganized, making it hard to trace the evidence supporting each claim. Also, the excessive use of acronyms (e.g., SX, SY, CG, EA, ES, DA, DS) makes the text harder to parse. I recommend restructuring the results section to be clearer and significantly reducing the use of unnecessary acronyms.
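
      For points (5) and (6), a conventional definition of a trial-wise negative log-likelihood and a participant-level K-fold split might look like the sketch below. Both functions are assumptions for illustration; the manuscript's actual metric and data split may differ, and the function and argument names are hypothetical.

```python
import numpy as np

def trialwise_nll(probs, choices):
    """Mean negative log-likelihood of the observed choices.

    probs:   (n_trials, n_actions) predicted choice probabilities
    choices: (n_trials,) integer index of the action actually taken
    """
    p = probs[np.arange(len(choices)), choices]
    return -np.mean(np.log(np.clip(p, 1e-12, 1.0)))  # clip avoids log(0)

def participant_kfold(n_participants, k=5, seed=0):
    """Yield (train_ids, test_ids) so that each participant is held out exactly once."""
    order = np.random.default_rng(seed).permutation(n_participants)
    folds = np.array_split(order, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Toy usage with made-up predictions for two trials.
probs = np.array([[0.5, 0.5], [0.25, 0.75]])
choices = np.array([0, 1])
print(trialwise_nll(probs, choices))
for train_ids, test_ids in participant_kfold(10, k=5):
    print(len(train_ids), len(test_ids))
```

      Splitting by participant rather than by trial keeps all of a held-out individual's data out of training, which is the property the cross-validation request depends on.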

    1. eLife Assessment

      This manuscript makes important contributions to the methodology commonly used to assess representational structures in human and animal brain activity recorded using various techniques (especially fMRI). The evidence in the form of mathematical analysis and simulations is solid. The impact of this contribution could be improved by extending the simulations to assess the effects of violations of explicit and implicit assumptions.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents a formalism for the relationship between neural signals and pooled signals (e.g., voxel estimates in fMRI) and explores why correlation-based and mean-removed Euclidean RDMs perform well in practice. The key assumption is that the pooled estimates are weighted averages, with i.i.d. non-negative weights. Two sets of simulations are used to support the theoretical findings: one based on fully simulated neural data and another that reverse-engineers neural data from an RDM estimated from real macaque data. The authors also discuss limitations of their simulations, particularly concerning the i.i.d. assumption of the weights.

      Strengths:

      The strengths of this work include its mathematical rigor and the clear connection that is drawn between the derivations and empirical observations. The simulations were well-designed and easy to follow. One small suggestion: a brief explanation of what is meant by "sparse" in Figure 3 would help orient the reader without requiring them to jump ahead to the methods. Overall, I found the work engaging and insightful.

      Weaknesses:

      Although I appreciate the effort to explore *why* certain dissimilarity measures perform well, it wasn't clear how these findings would inform the practical choices of researchers conducting RDM-based analyses. Many researchers likely already use correlation-based or mean-removed Euclidean distance measures, given their popularity. In that case, how do these results provide additional value or guidance beyond current practice?

      Another aspect that could benefit from further clarification is the core assumption underlying the work - that channel-based activity reflects a non-negative weighted average of neural activity. Is this widely accepted as the most plausible model, or are there alternative relationships that researchers should consider? While this may seem intuitive, it's not something I would expect all readers to be familiar with, and only a single reference was provided to support it (which I unfortunately didn't have time to read). That said, I did appreciate the discussion of the i.i.d. assumption in the discussion section. Can more be said to educate researchers as to when the i.i.d. assumption might be violated?

      I didn't find the "Simulations based on neural data" section added much, and it risks being misinterpreted. The main difference here is that neural data were reverse-engineered from a macaque RDM and then used in simulations similar to those in the previous section. What is the added value of using a real RDM to generate simulated data? Were the earlier simulations lacking in some way? There's also a risk of readers mistakenly inferring that human dissimilarities have been reconstructed from macaque data, an assumption that goes beyond the paper's core message, which focuses on linking neural and channel-based signals from the *same* source. If this section is retained, the motivation should be clarified, and the implied parallel in Figure 6, between the human data and simulated data, should be reconsidered.

    3. Reviewer #2 (Public review):

      Summary:

      The paper is a methodological contribution to multivariate pattern analysis and, in particular, the analysis of representational geometry via pairwise representational distances, sometimes called representational dissimilarity analysis (RDA). The authors investigate through theoretical analysis and simulations how true representational distances (defined on the neural level) give rise to representational distances estimated from neurophysiological data, including fMRI and cell recordings. They demonstrate that, due to the way measurements sample neural activity, the activity common to all sampled neurons can be amplified in the representational geometry derived from these measurements, and therefore, an empirical representational geometry may deviate substantially from the true representational geometry. The authors propose to modify the obtained representational structure by removing the dimension corresponding to that common activity, and argue that such a removal of a single dimension does not relevantly affect the representational structure, again underpinned by mathematical analysis and simulation.

      Importance:

      The paper may at first sight be tackling a specific problem within a specific subfield of cognitive neuroscience methods. However, understanding the structure of representations is a fundamental goal of cognitive psychology and cognitive neuroscience, and the fact that methods of representational geometry are not yet routinely used by the wider community may at least partially be due to uncertainty regarding the reliability of these methods. This paper is an important step towards clarifying and improving reliability, and therefore towards more widespread adoption of representational geometry methods.

      Strengths:

      The paper makes its argument generally well, relying on previous work by the authors as well as others to support assumptions about neural sampling by neurophysiological measurements. Their main points are underpinned by both detailed mathematical analysis and simulations, and the latter also produces intuitively accessible illustrations of the authors' argument. The authors discuss in detail under which exact circumstances common neural activity distorts the representational geometry, and therefore, when exactly the removal of the common dimension is necessary to minimize that distortion.

      Weaknesses:

      (1) The argument around the Johnson-Lindenstrauss lemma on pages 5 & 6 is somewhat confused, and also not really convincing.

      First, the correct reference for the lemma seems to be not [20] = Johnson et al. (1986), but Johnson & Lindenstrauss (1984). Moreover, as far as I can tell, Johnson et al. (1986) do not discuss random projections, and while they play a role in Johnson & Lindenstrauss (1984), that is only as a proof device. The paper text suggests that the lemma itself is probabilistic, while actually it is a statement of existence.

      Second, the authors correctly state that the lemma implies that "the number of measurement channels required for a good approximation does not depend on the number of neurons and grows only logarithmically with the number of stimuli", but it is not clear what the relevance of this statement for this paper is, considering that distances between N points can be exactly preserved within an N − 1 dimensional subspace, irrespective of the number of dimensions of the original space, and since in cognitive neuroscience the number of measurement channels is usually (much) larger than the number of experimental stimuli.

      The actually centrally important statement is not the Johnson-Lindenstrauss lemma, but one about the metric-preserving properties of random projections with zero-mean weights. It is this statement that needs to be backed up by the correct references, which, as far as I can tell, are neither the cited Johnson et al. (1986) nor even Johnson & Lindenstrauss (1984) for the lemma.

      (2) The detailed mathematical analyses and simulations focus on the effect of non-zero-mean sampling weights, and that is justified by the result that such sampling leads to a distorted representational geometry. However, there is another assumption which seems to be used almost everywhere in both mathematical analyses and simulations, and which I suspect may have a relevant effect on the observed representational geometry: statistical independence between weights. In particular, in fMRI, the existence of a naturally limited spatial resolution (due to MRI technology or vasculature) makes it unlikely that the weights with which a given neuron affects different voxels are independent.
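
      For reference, the two kinds of statement contrasted in point (1) can be written out as follows. The first is the usual existence form of the Johnson-Lindenstrauss lemma; the second is a standard second-moment identity for random projections with i.i.d. weights, given here as an assumption about the kind of statement intended rather than a quotation from the manuscript. The symbols (n, d, k, ε, W, μ, σ², Δ) are chosen for illustration.

```latex
% Johnson-Lindenstrauss lemma (existence form).
% There is an absolute constant C such that, for any 0 < \varepsilon < 1,
% any points x_1, \dots, x_n \in \mathbb{R}^d, and any k \ge C \varepsilon^{-2} \log n,
% there exists a linear map f : \mathbb{R}^d \to \mathbb{R}^k with
\[
(1-\varepsilon)\,\lVert x_a - x_b \rVert^2
\;\le\; \lVert f(x_a) - f(x_b) \rVert^2
\;\le\; (1+\varepsilon)\,\lVert x_a - x_b \rVert^2
\quad \text{for all } a, b .
\]

% Random projection W \in \mathbb{R}^{k \times d} with i.i.d. entries,
% \mathbb{E}[W_{ji}] = \mu, \ \operatorname{Var}[W_{ji}] = \sigma^2, \ \Delta = x_a - x_b.
% Since \mathbb{E}[w w^\top] = \sigma^2 I + \mu^2 \mathbf{1}\mathbf{1}^\top for each row w:
\[
\mathbb{E}\,\lVert W \Delta \rVert^2
= k \left( \sigma^2 \lVert \Delta \rVert^2 + \mu^2 \, (\mathbf{1}^\top \Delta)^2 \right).
\]
```

      With μ = 0 the expected squared distance is proportional to the true one. With μ ≠ 0 the additional term depends only on the difference in summed activity, which corresponds to the inflation along the population-mean dimension discussed in these reviews.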

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the conditions under which representational distances estimated from brain-activity measurements accurately mirror the true geometry of the underlying neural representations. Using a theoretical framework and simulations, the authors show that (i) random weighted sampling of individual neurons preserves representational distances; (ii) the non-negative pooling characteristic of fMRI stretches the geometry along the population-mean dimension; and (iii) subtracting the across-channel mean from each activity pattern removes this distortion, explaining the well-known success of correlation-based RSA. They further argue that a mean-centred, squared Euclidean (or Mahalanobis) distance retains this corrective benefit while avoiding some pitfalls of variance normalisation.
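
      Claims (ii) and (iii) can be illustrated with a small simulation. The sketch below is not the authors' code: the dimensions, the uniform weight distribution, and the way the ground-truth patterns are generated are all assumptions chosen for illustration. It pools simulated neural patterns with non-negative i.i.d. weights and compares how well the measured squared Euclidean distances track the true ones before and after subtracting the across-channel mean.

```python
import numpy as np

def sq_euclid_rdm(P):
    """Squared Euclidean distances between rows of P (upper triangle, flattened)."""
    g = P @ P.T
    sq = np.diag(g)[:, None] + np.diag(g)[None, :] - 2 * g
    return sq[np.triu_indices(len(P), k=1)]

rng = np.random.default_rng(0)
n_stim, n_latent, n_neurons, n_channels = 20, 5, 400, 1000

# Hypothetical ground truth: low-dimensional structure plus a small
# stimulus-specific shift of the population mean.
X = rng.normal(size=(n_stim, n_latent)) @ rng.normal(size=(n_latent, n_neurons))
X += rng.normal(scale=0.2, size=(n_stim, 1))

W = rng.uniform(0.0, 1.0, size=(n_channels, n_neurons))  # non-negative i.i.d. weights
Y = X @ W.T                                              # pooled measurements
Y_centred = Y - Y.mean(axis=1, keepdims=True)            # subtract across-channel mean

true_rdm = sq_euclid_rdm(X)
r_raw = np.corrcoef(true_rdm, sq_euclid_rdm(Y))[0, 1]
r_centred = np.corrcoef(true_rdm, sq_euclid_rdm(Y_centred))[0, 1]
print(f"raw: {r_raw:.3f}  mean-centred: {r_centred:.3f}")
```

      In this idealised setting the mean-centred distances should follow the true geometry much more closely than the raw ones. The weaknesses listed below concern how far that holds once the i.i.d. and noise-free assumptions are relaxed.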

      Strengths:

      (1) Theoretical clarity and novelty: The paper offers an elegant and convincing proof of how linear measurement models affect representational geometry and pinpoints the specific condition (non-zero-mean sampling weights) under which voxel pooling introduces a systematic bias. This quantitative explanation of why mean removal is effective in RSA is new and valuable.

      (2) Simulations: Experiments on both synthetic high-dimensional fMRI data and macaque-IT-inspired embeddings corroborate the mathematics, providing practical insights into the theoretical reasoning outlined by the authors.

      (3) Actionable recommendations: The work summarises the results into clear guidelines: random single-unit sampling is "safe" (the estimated geometry is undistorted); fMRI voxel data with unstructured or single-scale codes should be mean-centred; and multi-scale cortical maps require explicit forward modelling. These guidelines are clear and useful for future research.

      Weaknesses:

      (1) Simplistic assumptions: The assumption that measurement-channel weights are drawn independently and identically distributed (i.i.d.) from a univariate distribution is a significant idealisation for fMRI data. Voxels have spatially structured responses (and noise), meaning they do not sample neurons with i.i.d. weights. The extent to which the conclusions (especially the "exact recovery" with mean centring) hold when this assumption is violated needs more discussion. While the paper states that the non-negative IWLCS model is a best-case scenario, the implications of deviations from this best case could be elaborated.

      (2) Random-subpopulation model for electrophysiology: Similarly, the "random subpopulation model" is presented as an idealisation of single-cell recordings. In reality, electrophysiological sampling is often biased (e.g., towards larger, more active neurons or neurons in accessible locations). The paper acknowledges biased sampling as a challenge that requires separate modelling, but the gap between this idealised model and actual practice should be highlighted more strongly when interpreting the optimistic results.

      (3) Noise as an "orthogonal issue": The theoretical derivations largely ignore measurement noise, treating it as an orthogonal problem solvable by cross-validation. Although bias from noise is a well-known problem, interactions between noise and sampling-induced distortions (especially the down-scaling of orthogonal dimensions) could complicate the picture. For instance, if a dimension is already heavily down-scaled by averaging, it might become more susceptible to being obscured by noise. Addressing or highlighting these points more explicitly would make the limitations of this theoretical framework more transparent.

      (4) Simulation parameters and generalizability: The random ground-truth geometries were generated from a Gaussian mixture in 5-D and then embedded into 1,024-D, with ≈25 % of the variance coming from the mean dimension. The sensitivity of the findings to these specific parameters (initial dimensionality, geometry complexity, proportion of mean variance, and sample size) could be discussed. How would the results change if the true neural geometry had a much higher or lower intrinsic dimensionality, or if the population-mean component were substantially smaller or larger? If the authors' claims are to generalise, more scenarios should be considered.

      (5) Mean addition to the neural-data simulation: In simulations based on neural data from Kiani et al., a random mean was added to each pattern to introduce variation along the mean dimension. This was necessary because the original patterns had identical mean activation. However, the procedure might oversimplify how population means vary naturally and could influence the conclusions, particularly regarding the impact of the population-mean dimension. While precisely modelling how the mean varies across conditions is beyond the manuscript's scope, this point should be stated and discussed more clearly.

      (6) Effect of mean removal on representational geometry: As noted, the benefits of mean removal hold "under ideal conditions". Real data often violates these assumptions. A critical reader might ask: What if conditions differ in overall activation and in more complex ways (e.g., differing correlation structures across neurons)? Is it always desirable to remove population-mean differences? For example, if a stimulus truly causes a global increase in firing across the entire population (perhaps reflecting arousal or salience), subtracting the mean would treat this genuine effect as a nuisance and eliminate it from the geometry. Prior literature has cautioned that one should interpret RSA results after demeaning carefully. For instance, Ramírez (2017) dubbed this problem "representational confusion", showing that subtracting the mean pattern can change the relationships between conditions in non-intuitive ways. These potential issues and previous results should be discussed and properly referenced by the authors.

      Appraisal, Impact, and Utility:

      The authors set out to identify principled conditions under which measured representational distances faithfully reflect the underlying neural geometry and to provide practical guidance for RSA across modalities. Overall, I believe they achieved their goals. Theoretical derivations identify the bias-inducing factors in linear measurement models, and the simulations verify the analytic claims, demonstrating that mean-pattern subtraction can indeed correct some mean-related geometric distortions. These conclusions strongly rely on idealised assumptions (e.g., i.i.d. sampling weights and negligible noise), but the manuscript is explicit about them, and the reasoning from evidence to claim is sound. A deeper exploration of how robust each conclusion is to violations of these assumptions, particularly correlated voxel weights and realistic noise, would make the argument even stronger.

      Beyond their immediate aims, the authors offer contributions likely to shape future work. The manuscript is likely to influence both analysis decisions and the design of future studies exploring the geometry of brain representations. By clarifying why correlation-based RSA seems to work so robustly, they help demystify a practice that has so far been adopted heuristically. Their proposal to adopt mean-centred Euclidean or Mahalanobis distances promises a straightforward alternative that better aligns representational geometry with decoding-based interpretations.

      In sum, I see this manuscript as a significant and insightful contribution to the field. The theoretical work clarifying the impact of sampling schemes and the role of mean removal is highly valuable. However, the identified concerns, primarily regarding the idealized nature of the models (especially for fMRI), the treatment of noise, and the need for more nuanced claims, suggest that some revisions are necessary. Addressing these points would substantially strengthen the paper's conclusions and enhance its impact on the neuroscience community by ensuring the proposed methods are robustly understood and appropriately applied in real-world research settings.

    1. eLife Assessment

      This study makes an important contribution by showing that humans adapt learning rates rationally to environmental volatility yet systematically misattribute noise as volatility, demonstrating approximate rationality with simplified internal models. The evidence is compelling, encompassing a cleverly designed volatility-versus-noise paradigm, innovative lesion-based comparisons between reinforcement-learning and degraded Bayesian Observer Models, and convergent behavioural and pupillometric data. Expanding formal model comparisons (e.g., BIC/AIC) and directly contrasting RL and Bayesian fits to physiological markers would further enhance the work, but these are minor limitations that do not detract from the core findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      Weaknesses:

      The model space could be more extensive, although the authors have covered the most relevant models for the question at hand.

      Comments on revisions: I have no further recommendations for the authors, they have addressed my previous comments very well.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent; that is, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.
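
      The trade-off described in the two steps above can be illustrated with a minimal delta-rule sketch. This is not the authors' fitted model: the function name, the noise level, the block length, and the two learning rates are all assumptions chosen for illustration. It shows only that a high learning rate helps when the mean changes (volatility) and hurts when outcomes merely fluctuate around a fixed mean (noise).

```python
import random


def delta_rule_error(outcomes, true_means, alpha, v0=0.5):
    """Mean absolute error of a delta-rule estimate against the true mean.

    All names and values here are hypothetical and for illustration only.
    """
    v, total = v0, 0.0
    for outcome, mean in zip(outcomes, true_means):
        total += abs(v - mean)       # error of the estimate held before feedback
        v += alpha * (outcome - v)   # delta-rule update with learning rate alpha
    return total / len(outcomes)


rng = random.Random(0)
n_trials = 400

# Noisy but stable environment: the mean never changes, outcomes are jittered.
stable_means = [0.5] * n_trials
noisy_outcomes = [m + rng.gauss(0.0, 0.2) for m in stable_means]

# Volatile but noise-free environment: the mean switches every 20 trials.
volatile_means = [0.8 if (t // 20) % 2 == 0 else 0.2 for t in range(n_trials)]
volatile_outcomes = list(volatile_means)

for alpha in (0.1, 0.8):
    noisy_err = delta_rule_error(noisy_outcomes, stable_means, alpha)
    volatile_err = delta_rule_error(volatile_outcomes, volatile_means, alpha)
    print(f"alpha={alpha}: noisy error={noisy_err:.3f}, volatile error={volatile_err:.3f}")
```

      Under these assumptions the low learning rate gives the smaller error in the noisy block and the high learning rate gives the smaller error in the volatile block, which is why raising the learning rate in response to noise is suboptimal.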

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent; specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit to the participants' behavior. This suggests that participants are not fully distinguishing between noise and volatility, leading to misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.
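
      As a reminder of what such a comparison involves, the two criteria can be computed from each model's maximized log-likelihood as sketched below. The log-likelihoods, parameter counts, and trial count are made-up placeholders, not values from the study, and the function names are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: the richer model fits slightly better but uses more parameters.
n_obs = 480
simple = {"log_likelihood": -250.0, "n_params": 3}
complex_ = {"log_likelihood": -248.0, "n_params": 6}

for name, fit in (("simple", simple), ("complex", complex_)):
    print(
        name,
        "AIC =", round(aic(fit["log_likelihood"], fit["n_params"]), 2),
        "BIC =", round(bic(fit["log_likelihood"], fit["n_params"], n_obs), 2),
    )
```

      With these placeholder numbers both criteria favour the simpler model, because the small gain in likelihood does not offset the penalty for the three extra parameters.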

      Comments on revisions:

      The authors have addressed all my questions. Congratulations on the impressive work accomplished by the authors!

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that are able to capture changes of uncertainty (the Kalman filter, and the model described by Cochran and Cisler, Plos Comp Biol 2019). We will assess the ability of the models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, assess their ability to account for the pupillometry response.
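
      For orientation, the way a Kalman filter ties its effective learning rate to the two forms of uncertainty can be sketched for the simplest case: a scalar random-walk mean observed with Gaussian noise. This is a textbook illustration under those assumptions, not the model fitted in the paper, and the function name is hypothetical.

```python
import math


def steady_state_learning_rate(volatility_var, noise_var):
    """Steady-state Kalman gain for a scalar random walk observed with noise.

    volatility_var: variance of the random-walk step (process variance).
    noise_var: variance of the observation noise.
    The gain plays the role of the learning rate in a delta-rule update.
    """
    # Steady-state prior variance m solves m**2 - volatility_var * m - volatility_var * noise_var = 0.
    prior_var = (
        volatility_var + math.sqrt(volatility_var**2 + 4 * volatility_var * noise_var)
    ) / 2
    return prior_var / (prior_var + noise_var)


for volatility_var, noise_var in ((1.0, 1.0), (4.0, 1.0), (1.0, 4.0)):
    gain = steady_state_learning_rate(volatility_var, noise_var)
    print(f"volatility={volatility_var}, noise={noise_var}: learning rate={gain:.3f}")
```

      In this sketch the learning rate rises with volatility and falls with noise, which is the normative pattern against which participants' noise insensitivity is judged.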

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent; specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may more directly be compared to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response as suggested by the Reviewer.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.

      Thank you, we will add this.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      For clarity, the methods would benefit from further detail of task framing to participants. I.e. were there explicit instructions regarding volatility/task contingencies? Or were participants told nothing?

      We have added in the following explanatory text to the methods section (page 20), clarifying the limited instructions provided to participants:

      “Participants were informed that the task would be split into 6 blocks, that they had to learn which was the best option to choose, and that this option may change over time. They were not informed about the different forms of uncertainty we were investigating or of the underlying structure of the task (that uncertainty varied between blocks).”

      In the results, it would be useful to report the general task behavior of participants to get a sense of how they performed across different parts of the task. Also, were participants excluded if they didn't show evidence of learning adaptation to volatility?

      We have added the following text reporting overall performance to the results (page 6):

      “Participants were able to learn the best option to choose in the task, selecting the most highly rewarded option on an average of 71% of trials (range 65% - 74%).”

      And the following text to the methods, confirming that participants were not excluded if they didn’t respond to volatility/noise (the failure in this adaptation is the focus of the current study) (page 19):

      “No exclusion criteria related to task performance were used.”

      The results would benefit from a more intuitive explanation of what the lesioning is trying to recapitulate; this can get quite technical and the objective is not necessarily clear, especially for the less computationally-minded reader.

      We have amended the relevant section of the results to clarify this point (page 9):

      “Having shown that an optimal learner adjusts its learning rate to changes in volatility and noise as expected, we next sought to understand the relative noise insensitivity of participants. In these analyses we “lesion” the BOM, to reduce its performance in some way, and then assess whether doing so recapitulates the pattern of learning rate adaptation observed for participants (Fig 3e). In other words, we damage the model so it performs less well and then assess whether this damage makes the behaviour of the BOM (shown in Fig 3f) more closely resemble that seen in participants (Fig 3e).”

      The modelling might be improved by the inclusion of another class of model. Specifically, models that adapt learning rates in response to the estimation of latent states underlying the current task outcomes would be very interesting to see. In a sense, these are also estimating volatility through changeability of latent states, and it would be interesting to explore whether the findings could also be explained by an incorrect assumption that the latent state has changed when outcomes are noisy.

      Thank you for this suggestion. We have added additional sections to the supplementary materials in which we use a general latent state model and a simple RL model to try to recapitulate the behaviour of participants (and to compare with the BOM). These additional sections are extensive, so are not reproduced here. We have also added in a section to the discussion in the main paper covering this interesting question in which we confirm that we were unable to reproduce participant behaviour (or the normative effect of the lesioned BOMs) using these models but suggest that alternative latent state formulations would be interesting to explore in future work (page 18):

      “A related question is whether other, non-Bayesian model formulations may be able to account for participants’ learning adaptation in response to volatility and noise. Of note, the reinforcement learning model used to measure learning rates in separate blocks does not achieve this goal, as this model is fitted separately to each block rather than adapting between blocks (NB the simple reinforcement learning model that is fitted across all blocks does not capture participant behaviour, see supplementary information). One candidate class of model that has potential here is latent-state models (Cochran & Cisler, 2019), in which the variance and unexpected changes in the process being learned (which have a degree of similarity with noise and volatility respectively) are estimated and used to alter the model’s rates of updating as well as the estimated number of states being considered. Using the model described by Cochran and Cisler, we were unable to replicate the learning rate adaptation demonstrated by participants in the current study (see supplementary information) although it remains possible that other latent state formulations may be more successful.”

      The discussion may benefit from a little more discussion of where this work leads us - what is the next step?

      As above, we have added in a suggestion about future modelling work. We have also added in a section about the outstanding interesting questions concerning the neural representation of these quantities, reproduced in response to the suggestion by reviewer #2 below.

      Reviewer #2 (Recommendations for the authors):

      The study presents an opportunity to explore potential neural coding models that could account for the cognitive processes underlying the task. In the field of neural coding, noise correlation is often measured to understand how a population of neurons responds to the same stimulus, which could be related to the noise signal in this task. Since the brain likely treats the stimulus as the same, with noise representing minor changes, this aspect could be linked to the participants' difficulty distinguishing noise from volatility. On the other hand, signal correlation is used to understand how neurons respond to different stimuli, which can be mapped to the volatility signal in the task. It would be highly beneficial if the authors could discuss how these established concepts from neural population coding might relate to the Bayesian behavior model used in the study. For instance, how might neurons encode the distinction between noise and volatility at a population level? Could noise correlation lead to the misattribution of noise as volatility at a neural level, mirroring the behavioral findings? Discussing possible neural models that could explain the observed behavior and relating it to the existing literature on neural population coding would significantly enrich the discussion. It would also open up avenues for future research, linking these behavioral findings to potential neural mechanisms.

      We thank the reviewer for this interesting suggestion. We have added in the following paragraph to the discussion section which we hope does justice to this interesting question (page 18):

      “Previous work examining the neural representations of uncertainty has tended to report correlations between brain activity and some task-based estimate of one form of uncertainty at a time (Behrens et al., 2007; Walker et al., 2020, 2023). We are not aware of work that has, for example, systematically varied volatility and noise and reported distinct correlations for each. An interesting possibility as to how different forms of uncertainty may be encoded is suggested by parallels with the neuronal decoding literature. One question addressed by this literature is how the brain decodes changes in the world from the distributed, noisy neural responses to those changes, with a particular focus on the influence of different forms of between-neuron correlation (Averbeck et al., 2006; Kohn et al., 2016). Specifically, signal-correlation, the degree to which different neurons represent similar external quantities (required to track volatility), is distinguished from, and often limited by, noise-correlation, the degree to which the activity of different neurons covaries independently of these external quantities. One possibility relevant to the current study, which resembles the underlying logic of the BOM, is that a population of neurons represents the estimated mean of the generative process that produces task outcomes. In this case, volatility would be tracked as the signal-correlation across this population, whereas noise would be analogous to the noise-correlation and, crucially, misestimation of noise as volatility might arise as misestimation of these two forms of correlation. While the current study clearly cannot adjudicate on the neural representation of these processes, our finding of distinct behavioural and physiological responses to the two forms of uncertainty does suggest that separable neural representations of uncertainty are maintained.”

    1. eLife Assessment

      The authors provide compelling evidence that a chloride ion stabilizes the protonated Schiff base chromophore linkage in the animal rhodopsin Antho2a. This important finding is novel and of major interest to a broad audience, including optogenetics researchers, protein engineers, spectroscopists, and environmental biologists. The study combines state-of-the-art research methods, such as spectroscopic and mutational analyses, which are complemented by QM/MM calculations, and was further improved based on the comments from the reviewers.

    2. Reviewer #1 (Public review):

      The chromophore molecule of animal and microbial rhodopsins is retinal, which forms a Schiff base linkage with a lysine in the seventh transmembrane helix. In most cases, the chromophore is positively charged by protonation of the Schiff base, which is stabilized by a negatively charged counterion. In animal opsins, three sites have been experimentally identified, Glu94 in helix 2, Glu113 in helix 3, and Glu181 in extracellular loop 2, where a glutamate acts as the counterion by deprotonation. In this paper, Sakai et al. investigated the molecular properties of anthozoan-specific opsin II (ASO-II opsins), as they lack these glutamates. They found an alternative candidate, Glu292 in helix 7, from the sequences. Interestingly, the experimental data suggested that Glu292 is not the direct counterion in ASO-II opsins. Instead, they found that ASO-II opsins employ a chloride ion as the counterion. In the case of microbial rhodopsins, a chloride ion serves as the counterion of light-driven chloride pumps. This paper reports the first observation of a chloride ion as the counterion in an animal rhodopsin. Theoretical calculation using a QM/MM method supports their experimental data. The authors also revealed the role of Glu292, which serves as the counterion in the photoproduct and is involved in G protein activation.

      The conclusions of this paper are well supported by data.

    3. Reviewer #2 (Public review):

      Summary:

      This work reports the discovery of a new rhodopsin from reef-building corals that is characterized experimentally, spectroscopically, and by simulation. This rhodopsin lacks a carboxylate-based counterion, which is typical for this family of proteins. Instead, the authors find that a chloride ion stabilizes the protonated Schiff base and thus serves as a counterion.

      Strengths:

      This work focuses on the rhodopsin Antho2a, which absorbs in the visible spectrum with a maximum at 503 nm. Spectroscopic studies under different pH conditions, including the mutant E292A and different chloride concentrations, indicate that chloride acts as a counterion in the dark. In the photoproduct, however, the counterion is identified as E292.

      These results lead to a computational model of Antho2a in which the chloride is modeled in addition to the Schiff base. This model is improved using the hybrid QM/MM simulations. As a validation, the absorption maximum is calculated using the QM/MM approach for the protonated and deprotonated E292 residue as well as the E292A mutant. The results are in good agreement with the experiment. However, there is a larger deviation for ADC(2) than for sTD-DFT. Nevertheless, the trend is robust since the wt and E292A mutant models have similar excitation energies. The calculations are performed at a high level of theory that includes a large QM region.

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Sakai et al. studies the properties of anthozoan-specific opsins (ASO-II) from organisms found in reef-building coral. Their goal was to test if ASO-II opsins can absorb visible light, and if so, what the key factors involved are.

      The most exciting aspect of this work is their discovery that ASO-II opsins do not have a counterion residue (Asp or Glu) located at any of the previously known sites found in other animal opsins.

      This is very surprising. Opsins are only able to absorb visible (long-wavelength) light if the retinal Schiff base is protonated, and the latter requires (as the name implies) a "counter ion". However, the authors clearly show that some ASO-II opsins do absorb visible light.

      To address this conundrum, they tested if the counterion could be provided by exogenous chloride ions (Cl-). Their results provide compelling evidence supporting this idea, and their studies of ASO-II mutant E292A suggest E292 also plays a role in G protein activation and is a counterion for a protonated Schiff base in the light-activated form.

      Strengths:

      Overall, the methods are well described and carefully executed, and the results are very compelling.

      Their analysis of seven ASO-II opsin sequences undoubtedly shows they all lack a Glu or Asp residue at "normal" (previously established) counter-ion sites in mammalian opsins (typically found at positions 94, 113 or 181). The experimental studies clearly demonstrate the necessity of Cl- for visible light absorbance, as do their studies of the effect of altering the pH.

      Importantly, the authors also carried out careful QM/MM computational analysis (and corresponding calculation of the expected absorbance effects), thus providing compelling support for the Cl- acting directly as a counterion to the protonated retinal Schiff base, and thus limiting the possibility that the Cl- is simply altering the absorbance of ASO-II opsins through some indirect effect on the protein.

      Altogether, the authors clearly achieved their aims, and the results support their conclusions. The manuscript is carefully written, and refreshingly, the results and conclusions are not overstated.

      This study is impactful for several reasons. There is increasing interest in optogenetic tools, especially those that leverage G protein-coupled receptor systems. Thus, the authors' demonstration that ASO-II opsins could be useful for such studies is of interest.

      Moreover, the finding that visible light absorbance by an opsin does not absolutely require a negatively charged amino acid be placed at one of the expected sites (94, 113 or 181) typically found in animal opsins is very intriguing and will help future protein engineering efforts. The argument that the Cl- counterion system they discover here might have been a preliminary step in the evolution of amino acid based counterions used in animal opsins is also interesting.

      Finally, given the ongoing degradation of coral reefs worldwide, the focus on these curious opsins is very timely, as is the authors' proposal that the lower Schiff base pKa they discovered here for ASO-II opsins may cause them to change their spectral sensitivity and G protein activation due to changes in their environmental pH.

    1. eLife Assessment

      This valuable study employs transition-metal FRET (tmFRET) and time-correlated single-photon counting to investigate allosteric conformational changes in both isolated cyclic nucleotide-binding domains (CNBDs) and full-length bacterial CNG channels, demonstrating that transmembrane domains stabilize CNBDs in their active state. By comparing isolated CNBD constructs with full-length channels, the authors reveal how allosteric networks couple domain movements to gating energetics, providing insights into ion channel regulation mechanisms. The rigorous methodology and compelling quantitative analysis establish a framework for applying tmFRET to study conformational dynamics in diverse protein systems.

    2. Reviewer #1 (Public review):

      Summary:

      This useful work extends a prior study from the authors to observe distance changes within the CNBD domains of a full-length CNG channel based on changes in single photon lifetimes due to tmFRET between a metal at an introduced chelator site and a fluorescent non-canonical amino acid at another site. The data are excellent and convincingly support the authors' conclusions. In addition to the methodology being of general use for other proteins, the authors show that coupling of the CNBDs to the rest of the channel stabilizes the CNBDs in their active state relative to an isolated CNBD construct.

      Strengths:

      The manuscript is very well written and clear.

    3. Reviewer #2 (Public review):

      The manuscript by Eggan et al. investigates the energetics of conformational transitions in the cyclic nucleotide-gated (CNG) channel SthK. This lab pioneered transition metal FRET (tmFRET), which has previously provided detailed insights into ion channel conformational changes. Here, the authors analyze tmFRET fluorescence lifetime measurements in the time domain, yielding detailed insights into conformational transitions within the cyclic nucleotide binding domains (CNBDs) of the channel. The integration of tmFRET with time-correlated single-photon counting (TCSPC) represents an advancement of this technique.

    4. Reviewer #3 (Public review):

      Summary:

      This is a lucidly written manuscript describing the use of transition-metal FRET to assess distance changes during functional conformational changes in a CNG channel. The experiments were performed on an isolated C-terminal nucleotide binding domain (CNBD) and on a purified full-length channel, with FRET partners placed at two positions in the CNBD.

      The data and quantitative analysis are exemplary, and they provide a roadmap for the use of this powerful approach in other proteins. In particular, the use of the fluorescence-lifetime decay histograms to learn not just the mean distance reported by the FRET, but also the distribution of states with different distances, allows better refinement of hypotheses for the gating motions.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This useful work extends a prior study from the authors to observe distance changes within the CNBD domains of a full-length CNG channel based on changes in single photon lifetimes due to tmFRET between a metal at an introduced chelator site and a fluorescent non-canonical amino acid at another site. The data are excellent and convincingly support the authors' conclusions. The methodology is of general use for other proteins. The authors also show that coupling of the CNBDs to the rest of the channel stabilizes the CNBDs in their active state, relative to an isolated CNBD construct.

      Strengths:

      The manuscript is very well written and clear.

      Reviewer #2 (Public review):

      The manuscript "Domain Coupling in Allosteric Regulation of SthK Measured Using Time-Resolved Transition Metal Ion FRET" by Eggan et al. investigates the energetics of conformational transitions in the cyclic nucleotide-gated (CNG) channel SthK. This lab pioneered transition metal FRET (tmFRET), which has previously provided detailed insights into ion channel conformational changes. Here, the authors analyze tmFRET fluorescence lifetime measurements in the time domain, yielding detailed insights into conformational transitions within the cyclic nucleotide binding domains (CNBDs) of the channel. The integration of tmFRET with time-correlated single-photon counting (TCSPC) represents an advancement of this technique.

      The results summarize known conformational transitions of the C-helix and provide distance distributions that agree with predicted values based on available structures. The authors first validated their TCSPC approach using the isolated CNBD construct previously employed for similar experiments. They then study the more complex full-length SthK channel protein. The findings agree with earlier results from this group, demonstrating that the C-helix is more mobile in the closed state than static structures reflect. Upon adding the activating ligand cAMP, the C-helix moves closer to the bound ligand, as indicated by a reduced fluorescence lifetime, suggesting a shorter distance between the donor and acceptor. The observed effects depend on the cAMP concentration, with affinities comparable to functional measurements. Interestingly, a substantial amount of CNBDs appear to be in the activated state even in the absence of cAMP (Figure 6E and F, fA2 ~ 0.4).

      This may be attributed to cooperativity among the CNBDs, which the authors could elaborate on further. In this context, the major limitation of this study is that distance distributions are observed only in one domain. While inter-subunit FRET is detected and accounted for, the results focus exclusively on movements within one domain. Thus, the resulting energetic considerations must be assessed with caution. In the absence of the activator, the closed state is favored, while the presence of cAMP favors the open state. This quantifies the standard assumption; otherwise, an activator would not effectively activate the channel. However, the numerical values of approximately 3 kcal/mol are limited by the fact that only one domain is observed in the experiment, and only one distance (C-helix relative to the CNBD) is probed. Additional conformational changes leading to pore opening (including rotation and upward movement of the CNBD, and radial dilation of the tetrameric assembly) are not captured by the current experiments. These limitations should be taken into account when interpreting the results.

      We agree that these are important limitations to consider in interpreting our results. These limitations and future directions are now largely covered in our discussion. We believe measurements in individual domains provide unique insights into the contributions of different parts of the protein and future work will continue to address conformational energetics in other parts of the protein and subunit cooperativity. 

      Reviewer #3 (Public review):

      Summary:

      This is a lucidly written manuscript describing the use of transition-metal FRET to assess distance changes during functional conformational changes in a CNG channel. The experiments were performed on an isolated C-terminal nucleotide binding domain (CNBD) and on a purified full-length channel, with FRET partners placed at two positions in the CNBD.

      Strengths:

      The data and quantitative analysis are exemplary, and they provide a roadmap for use of this powerful approach in other proteins.

      Weaknesses/Comments:

      A ~3x lower Kd for nucleotide is seen for the detergent-solubilized full-length channel, compared to electrophysiological experiments. This is worth a comment in the Discussion, particularly in the context of the effect of the pore domain on the CNBD energetics.

      We are cautious about interpreting our Kd values given the high affinity for cAMP and the challenges of accurately determining the total protein concentrations in our experiments. We now state this explicitly in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript is very well written and clear. Congrats to the authors.

      Minor comment: In "Measuring tmFRET in Full-Length SthK", 3rd paragraph: "... FRET model with both intersubunit and intersubunit FRET." Should read "intersubunit and intrasubunit".

      Thank you for the comment, this is now corrected.  

      Reviewer #2 (Recommendations for the authors):

      Overall, the manuscript is well-written and clearly explained. However, I recommend that the authors discuss the limitations more critically.

      The revised manuscript now largely addresses these limitations. Additional comments are addressed in short below:  

      A) Only one distance is measured.

      We believe validating a single distance is an important first step in determining the use of this technique and beginning to quantify the allosteric mechanism in SthK. Future studies aim to make additional measurements.

      B) Measurements are confined to a single domain in the cooperative tetrameric assembly.

      Isolating conformational changes in individual domains allows us to determine how different parts of the protein contribute to the activation upon ligand binding.

      C) The change in distance upon activation mirrors what is observed in the closed state, which casts doubt on whether these conformational changes actually lead to channel opening or merely reflect the upward swinging of the C-helix that contributes to coordinating cAMP in the binding pocket.

      Future studies aim to detect conformational changes in the pore and other parts of the protein.

      D) Rigid body movements, rotations, and dilations are not captured by the measurements. 

      Our measurements combine energetic information with some, although more limited, structural information.   

      E) Cooperativity is not considered in the interpretation of the results.

      It is currently unclear where in SthK cooperativity arises upon ligand activation (i.e., at the level of the CNBD, C-linker, or pore). Our results do not provide evidence of cooperativity in the CNBD upon ligand binding.

      Additionally, the authors directly correlate their results with the functional states of SthK previously reported, but it remains open whether the modified protein for tmFRET behaves similarly to WT SthK. Functional experiments with the protein used for tmFRET, which demonstrate comparable open probabilities and cAMP potency, would considerably strengthen the manuscript.

      Further optimization is needed to express the full-length protein used in tmFRET experiments in spheroplasts to enable electrophysiological recordings from these constructs. 

      Reviewer #3 (Recommendations for the authors):

      In the final paragraph of the Discussion, the sentence "In our experiments, we assumed that deleting the pore and transmembrane domains eliminates the coupling of these regions to the CNBD" seems trivial. Perhaps it would help to add "simply" before eliminates?

      We have taken the advice and added ‘simply’ in this sentence.  

      Can a statement be made about the magnitude of the effect in the C-terminal deletion experiments in refs 27-29?

      Due to the different channels used in the C-terminal deletion experiments in refs 27-29 (HCN1 and spHCN), compared to the channel we used (SthK), it is challenging to compare the magnitude of energetic changes between these studies. Additionally, the HCN experiments measured changes in the pore domain, compared to the conformational changes in the CNBD domain measured here.

    1. eLife Assessment

      The authors provide a convincing summary of ten years of Brain Initiative funding including the historical development, the specific funding mechanisms, and examples of grants funded and work produced. It is particularly valuable at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

    2. Reviewer #1 (Public review):

      This is a convincing description of approximately ten years of funding from the NIH BRAIN initiative. It is of particular value at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

      The paper contains a fair bit of documentation so that the curious reader can actually parse what this BRAIN program funded. The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals. In revision, the paper has been improved with respect to clarity and by bringing together two separate papers into one stronger piece.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide an important summary of ten years of Brain Initiative funding including a description of the historical development of the initiative, the specific funding mechanisms utilized, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed.

      The authors have improved the presentation by integrating the weaker of the two manuscripts with the stronger, by clarifying terminology and by performing additional analyses.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.

      Strengths:

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals.

      Weaknesses:

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1.

      We respectfully disagree about the utility of the RCR, particularly because it is a metric that is normalized by both year and topical area. We have added a more detailed description of how the RCR is calculated on pages 6-7. Please note that Figure 1 aims to highlight the funding opportunities, investments and number of awards associated with small lab (exploratory) versus team (elaborated, mature) research rather than to describe publication metrics.

      Reviewer #2 (Public review):

      Summary:

      The authors review the history of the team projects within the Brain initiative and analyze their success in progression to additional rounds of funding and their bibliographic impact.

      Strengths:

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented.

      Weaknesses:

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping". I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic, and the Sankey plot could be simply summarized by rates of competing success.

      While we sincerely appreciate the feedback, we chose to retain these plots on domains and models to provide a sense of the broad spectrum of research topics contained in our TeamBCP awards. Further details on the awards can be derived from the award links provided in the text. Additionally, we retained the Sankey plots because these are a visual depiction of how awards transition from one mechanism to another, evolve in their funding sources, and advance in their research trajectories. The plot is an example of our continuity analysis which is only reported in the text and not visually shown for the remaining BCP programs.

      Recommendations for the authors:

      Editorial note:

      In the discussion, the reviewers agreed that the present manuscript does not make a sufficient independent contribution and so would be more profitably combined with the companion manuscript. Both reviewers noted that there was not much insight that relied on the single figure. Since neither manuscript is long, and they have overlapping authors (including the same first and last authors), this should not be a difficult merger to achieve.

      Thank you for the recommendation to merge. We have combined both manuscripts into one in this version.

      Reviewer #1 (Recommendations for the authors):

      The jargon of the grant programs could be described as a nightmare. Wellcome is spelled wrong.

      We have attempted to limit the use of jargon and to define acronyms in this version. We have corrected the spelling of Wellcome.

      Reviewer #2 (Recommendations for the authors):

      I suggest that the two manuscripts be combined into a single paper. Although the other manuscript could stand on its own, this one does not.

      The idea of culture change surrounding teams is useful but really forms more of a policy-focused opinion piece than a quantitative analysis of funding impact.

      If the authors insist on keeping these separate, it is critical to remove the team data from the other manuscript.

      We have combined both manuscripts and decided to retain the description of culture change but have edited and condensed this section and will use the supplemental report for qualitative assessments.

    1. Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study makes use of the step selection functions using telemetry data, which is a method to estimate how likely individuals move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, hence ensuring more targeted interventions.

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

      Weaknesses:

      (1) The sample size for the study was not calculated, although it was a nested cohort study.

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions.

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power.

      (5) Some participants with less than 50 relocations within the study area were excluded without clear justification, see line 149.

      (6) Some figures are not clear (see Figure 4 A & B).

      (7) No statement on conflict of interest was included, considering sponsorship of the study.

    1. eLife Assessment

      This manuscript provides valuable insights into the heterogeneity of hematopoietic stem cells and age-associated myeloid-biased hematopoiesis. While several aspects of the study are intriguing and merit further investigation, the current results remain incomplete and additional data are necessary to substantiate the conclusions. Some of the methods and data analyses partially support the claims.

    2. Reviewer #1 (Public review):

      In this study, Nishi et al. claim that the ratio of long-term hematopoietic stem cells (LT-HSCs) to short-term HSCs (ST-HSCs) determines the lineage output of HSCs and that a reduced ratio of ST-HSCs in aged mice causes myeloid-biased hematopoiesis. The authors used Hoxb5 reporter mice to isolate LT-HSCs and ST-HSCs and performed molecular analyses and transplantation assays to support their arguments. How the hematopoietic system becomes myeloid-biased upon aging is an important question with many implications in disease contexts as well. However, this study needs more definitive data.

      (1) Authors' experimental designs have some caveats to definitely support their claims. Authors claimed that aged LT-HSCs have no myeloid-biased clone expansion using transplantation assays. In these experiments, authors used 10 HSCs and young mice as recipients. Given the huge expansion of old HSC by number and known heterogeneity in immunophenotypically defined HSC populations, it is questionable how 10 out of so many old HSCs (an average of 300,000 up to 500,000 cells per mouse; Mitchell et al., Nature Cell Biology, 2023) can faithfully represent old HSC population. The Hoxb5+ old HSC primary and secondary recipient mice data (Fig. 2C and D) support this concern. In addition, they only used young recipients. Considering the importance of inflammatory aged niche in the myeloid-biased lineage output, transplanting young vs old LT-HSCs into aged mice will complete the whole picture.

      In response to the above comments, the authors calculated the required sample size as approximately 384 cells to represent 500,000 HSCs per old mouse. Based on the total of 1260 cells used throughout the whole manuscript (Figures 2, 3, 5, 6, S3, and S6), the authors claimed that the data reflect old HSC behavior. However, 384 cells represent HSCs from one old mouse. Following the authors' logic, the whole manuscript amounts to an experiment on only about 3.2 mice (1260/384). An N of 3 is not enough, especially for old-mouse experiments, considering the heterogeneity of aged mice. Also, they did not address the comment regarding inflammatory aged niche effects.

      (2) Authors' molecular data analyses need more rigor with unbiased approaches. They claimed that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment but aged bulk HSCs, which are just a sum of LT-HSCs and ST-HSCs by their gating scheme (Fig. 4A), showed the "tendency" of enrichment of myeloid-related genes based on the selected gene set (Fig. 4D). Although the proportion of ST-HSCs is reduced in bulk HSCs upon aging, since ST-HSCs do not exhibit lymphoid gene set enrichment based on their data, it is hard to understand how aged bulk HSCs have more myeloid gene set enrichment compared to young bulk HSCs. This bulk HSC data rather suggest that there could be a trend toward certain lineage bias (although not significant) in aged LT-HSCs or ST-HSCs. Authors need to verify the molecular lineage priming of LT-HSCs and ST-HSCs using another comprehensive dataset.

      (3) Although authors could not find any molecular evidence for myeloid-biased hematopoiesis from old HSCs (either LT or ST), they argued that the ratio between LT-HSC and ST-HSC causes myeloid-biased hematopoiesis upon aging based on young HSC experiments (Fig. 6). However, old ST-HSC functional data showed that they barely contribute to blood production unlike young Hoxb5- HSCs (ST-HSC) in the transplantation setting (Fig. 2). Is there any evidence that in unperturbed native old hematopoiesis, old Hoxb5- HSCs (ST-HSC) still contribute to blood production? To answer this question, authors performed additional experiments with increased cell number (Fig. S6). Although Fig. S6.D data has a statistical significance, it is questionable how biologically meaningful it is. More fundamental question is back to the representability. Can this cell number used in this experiment represent old HSC (either LT or ST) behavior?
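
      The sample-size arithmetic discussed under point (1) can be reproduced with a short sketch. It assumes that the figure of 384 cells comes from Cochran's formula with a finite-population correction, at 95% confidence, a 5% margin of error, and p = 0.5. The review does not state which formula the authors used, so this choice and the function name `cochran_sample_size` are illustrative assumptions, not the authors' method.

```python
import math


def cochran_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    Assumed parameters (not taken from the manuscript): z = 1.96 for 95%
    confidence, a 5% margin of error, and the most conservative p = 0.5.
    """
    # Cochran's formula for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite-population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# 500,000 HSCs per old mouse is the upper figure quoted in the review.
required_cells = cochran_sample_size(500_000)

# 1260 is the total number of transplanted cells quoted in the review.
mouse_equivalents = 1260 / required_cells

print(required_cells, round(mouse_equivalents, 2))
```

      Under these assumptions the required sample is per mouse, which is why the reviewer divides the pooled cell count by it: the pooled cells correspond to roughly three mouse-equivalents rather than to a large sample of the aged population.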

    3. Reviewer #2 (Public review):

      Summary:

      Nishi et al. investigate the well-known and previously described phenomenon of age-associated myeloid-biased hematopoiesis. Using a previously established HoxB5mCherry mouse model, they used HoxB5+ and HoxB5- HSCs to discriminate cells with long-term (LT-HSCs) and short-term (ST-HSCs) reconstitution potential and compared these populations to immunophenotypically defined 'bulk HSCs' that consist of a mixture of LT-HSCs and ST-HSCs. They then isolated these HSC populations from young and aged mice to test their function and myeloid bias in non-competitive and competitive transplants into young and aged recipients. Based on quantification of hematopoietic cell frequencies in the bone marrow, peripheral blood, and in some experiments the spleen and thymus, the authors argue against the currently held belief that myeloid-biased HSCs expand with age.

      While aspects of their work are fascinating and might have merit, several issues weaken the overall strength of the arguments and interpretation. Multiple experiments were done with a very low number of recipient mice, showed very large standard deviations, and had no statistically detectable difference between experimental groups. While the authors conclude that these experimental groups are not different, the displayed results seem too variable to conclude anything with certainty. The sensitivity of the performed experiments (e.g. Fig 3; Fig 6C, D) is too low to detect even reasonably strong differences between experimental groups and is thus inadequate to support the authors' claims. This weakness of the study is not acknowledged in the text and is also not discussed. To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section.
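
      To make the request for a power analysis concrete, the following is a minimal sketch of how the number of recipients per group scales with the size of the difference one wants to detect. It uses the standard normal approximation for a two-sided, two-sample comparison of means. The effect sizes, the significance level of 0.05, the power of 0.80, and the function name `n_per_group` are all illustrative assumptions and are not taken from the manuscript. The normal approximation slightly underestimates the requirement for small groups, so a t-based calculation would give somewhat larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided, two-sample comparison.

    effect_size is the standardized difference (difference in means divided
    by the common standard deviation). Normal approximation:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 / effect_size^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical standardized differences, from moderate to very large.
for d in (0.5, 1.0, 2.0):
    print(f"effect size {d}: about {n_per_group(d)} recipients per group")
```

      The point of the sketch is the scaling: halving the detectable difference roughly quadruples the number of recipients needed, so groups of only a few mice with large standard deviations can detect only very large effects.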

      As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided.

      It is also unclear why the authors believe that the observed reduction of ST-HSCs relative to LT-HSCs explains the myeloid-biased phenotype observed in the peripheral blood. This point seems counterintuitive and requires further explanation.

      Based on my understanding of the presented data, the authors argue that myeloid-biased HSCs do not exist, as:

      a) they detect no difference between young/aged HSCs after transplant (mind low n-numbers and large std);

      b) myeloid progenitors downstream of HSCs only show minor or no changes in frequency; and

      c) aged LT-HSCs do not outperform young LT-HSCs in myeloid output in competitive transplants (mind low n-numbers and large std!!!).

      However, given the low n-numbers and high variance of the results, the argument seems weak and the presented data does not support the claims sufficiently. That the number of downstream progenitors does not change could be explained by other mechanisms, for instance, the frequently reported differentiation short-cuts of HSCs and/or changes in the microenvironment.

      Strengths:

      The authors present an interesting observation and offer an alternative explanation of the origins of age-associated myeloid-biased hematopoiesis. Their data regarding the role of the microenvironment in the spleen and thymus appears to be convincing.

      Weaknesses:

      "Then, we found that the myeloid lineage proportions from young and aged LT-HSCs were nearly comparable during the observation period after transplantation (Fig. 3, B and C)."

      [Comment to the authors]: Given the large standard deviation and low n-numbers, the power of the analysis to detect differences between experimental groups is very low. Experimental groups with too large standard deviations (as displayed here) are difficult to interpret and might be inconclusive. The absence of clearly detectable differences between young and aged transplanted HSCs could thus simply be a false-negative result. The shown experimental results hence do not provide strong evidence for the authors' interpretation of the data. The authors should add additional transplants and include a detailed power analysis to be able to detect differences between experimental groups with reasonable sensitivity.

      Line 293: "Based on these findings, we concluded that myeloid-biased hematopoiesis observed following transplantation of aged HSCs was caused by a relative decrease in ST-HSC in the bulk-HSC compartment in aged mice rather than the selective expansion of myeloid-biased HSC clones."

      [Comment to the authors]: Couldn't that also be explained by an increase in myeloid-biased HSCs, as repeatedly reported and seen in the expansion of CD150+ HSCs? It is not intuitively clear why a reduction of ST-HSC clones would lead to a myeloid bias. The authors should try to explain more clearly where they believe the increased number of myeloid cells comes from. What is the source of myeloid cells if the authors believe they are not derived from the expanded population of myeloid-biased HSCs?

      New comment for the authors:

      While the authors provide new evidence, clarify the text, and adjust their interpretation, the presented data remain weak and do not convincingly challenge the current paradigm. As myeloid-biased HSC expansion with age has been observed and published by many different groups, the authors need to provide much stronger evidence to challenge the observations of others. Key experiments that might support their claims had been suggested, but as indicated, the authors plan to provide these much more rigorous experiments in future studies. As it stands, the overall conclusions of this manuscript thus remain weak and preliminary.

      In an attempt to quantify the absolute cell number of HSPC subpopulations, the authors use an unusual readout and quantify "Number of cells per minute of analysis time". This appears to be a quick and dirty reanalysis of already existing flow cytometry data. Unfortunately, this quantification cannot count the absolute number of cells reliably, as the number of cells per minute recorded is heavily influenced by the abundance of other cell populations. Instead, the authors should have counted the absolute number of HSCs, MPPs, GMPs, etc. per femur, which is typically done to address this question.

      At this point, as authors are seemingly not willing to provide additional hard evidence to support their claims in this study and are instead in the process of preparing additional data for a future manuscript, I believe this study, as it stands (although weak), suggests an interesting alternative model. Despite being highly controversial, this alternative model warrants future investigations and discussions in the field. As always, it will also be important to reproduce these findings independently in other labs. As my concerns and the concerns of the other reviewers are documented and available to read by others, I believe the manuscript should be published in its current form to stimulate critical discussion and future investigations of the current model.

    4. Reviewer #3 (Public review):

      In this manuscript, Nishi et al. propose a new model to explain the previously reported myeloid-biased hematopoiesis associated with aging. Traditionally, this phenotype has been explained by the expansion of myeloid-biased hematopoietic stem cell (HSC) clones during aging. Here, the authors question this idea, show how their Hoxb5 reporter model can discriminate long-term (LT) and short-term (ST) HSCs, and characterize their lineage output after transplant. From these analyses, the authors conclude that changes during aging in the LT/ST HSC proportion explain the myeloid bias observed.

      Comments on revisions:

      I appreciate the authors' reply to some of my comments. However, there are some key aspects that remain unresolved. Please see below.

      - The authors propose a critical change in the way we consider the mechanisms leading to lineage biased hematopoiesis during aging. As Reviewer 2 mentioned, such a strong claim needs to be supported by solid experimental data. Unfortunately, the level of variability in key in vivo experiments (Figure 2 and 3) diminishes the robustness of these results.

      The authors argue that even with the low number of mice used in some of these experiments and the high level of variability, differences still reach (or not) statistical significance according to their analysis. I am not an expert on statistics, but the only test that is mentioned in their methodology is a Welch's t test, which is only appropriate for data following a normal distribution. A more rigorous statistical analysis should be performed to sustain the claims included in the current manuscript.

      - The chosen irradiation regimen might contribute to the uncertainty of the data and influence their interpretation. As the authors show in their response to my "comment to our #3-4 response", there is a considerable (and variable) amount of "radioresistant" CD45.1+CD45.2- cells in their primary recipients, which become concerningly high in the secondary transplant. This is not found in previous publications focused on this topic and, therefore, it makes it difficult to compare those studies with the present manuscript. The inclusion of this aspect in the text is appreciated but definitely reduces the impact of their claims.

      - The correction introduced in the main text as an answer to the original comment #3-6 is still misleading. There is an assumption for GMP, CMP and MEP to increase with age if myeloid-biased HSC clones increase with age ("in contrast to what we anticipated"). Again, the link between these two changes could be more complex than just a direct correlation.
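
      The statistical concern raised above (small groups, high variability, and a test that assumes normality) can be illustrated with a distribution-free alternative. The following is a minimal sketch of an exact two-sided permutation test on the difference in group means; it is not the authors' analysis, the function name is made up for illustration, and the example values are hypothetical placeholders rather than data from the manuscript.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of relabelling the pooled values into groups of the
    original sizes, so no normality assumption is needed. Only practical for
    small samples, because the number of relabellings grows combinatorially.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical myeloid output (%) per recipient mouse; NOT data from the study.
    young_donor = [12.0, 35.0, 8.0, 41.0]
    aged_donor = [55.0, 62.0, 30.0, 71.0]
    print(f"exact permutation p = {exact_permutation_p(young_donor, aged_donor):.3f}")
```

      With three or four animals per group, the smallest p-value such a test can return is bounded by the number of possible relabellings, which is one way to see why very small cohorts limit the strength of any significance claim.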

    1. eLife Assessment

      In this valuable study, Taber et al used a battery of biophysical and structural approaches to characterize the impact of erythrocytosis-related mutations in prolyl hydroxylase domain protein 2 (PHD2). The authors show that PHD2 mutant proteins are destabilized, thus supporting the tenet that dysregulation of the PHD2/hypoxia-inducible factor (HIF) axis underpins erythrocytosis, while providing incomplete evidence that N-terminal ODD prolyl hydroxylation of HIF is indispensable for these phenotypes. Notwithstanding that this study was found to be of broad interest for a variety of fields focusing on oxygen sensing in homeostasis and pathological states, resolving inconsistencies in the biophysical analysis (e.g., NMR, SEC, and BLI/MST) was thought to be warranted to further corroborate the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis. Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO cells to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also show impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.

      Strengths:

      (1) Simple, easy-to-follow manuscript. Generally well-written.

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action.

      (3) Good, well-researched background section.

      Weaknesses:

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as analysis by Chowdhury et al, 2016) on the structure and interactions between PHD2 and HIF1a peptides shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein.

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation.

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation.
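
      The concentration argument in weakness (4) can be made concrete with a short calculation. This is a minimal sketch, assuming simple 1:1 binding with the ODD substrate in excess over enzyme, so that occupancy is L / (L + Kd). The 180 uM concentration, the approximate wild-type Kd values, and the 40-fold loss are the round numbers used in the comment above, not measured values, and the function and constant names are made up for illustration.

```python
# Illustrative sketch only: simple 1:1 binding, substrate in excess over enzyme.
# The Kd values and the 40-fold loss are round numbers from the review comment,
# not measurements.

ODD_CONCENTRATION_UM = 180.0              # assumed ODD concentration in the NMR assay
WT_KD_UM = {"NODD": 10.0, "CODD": 1.0}    # approximate wild-type affinities cited in the review
P317R_FOLD_LOSS = 40.0                    # hypothetical fold loss in affinity for P317R


def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Return the occupancy L / (L + Kd) for a single 1:1 binding site."""
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_um / (ligand_um + kd_um)


def occupancy_by_site(fold_loss: float = 1.0) -> dict[str, float]:
    """Occupancy for each ODD site after weakening both Kd values by fold_loss."""
    return {
        site: fraction_bound(ODD_CONCENTRATION_UM, kd * fold_loss)
        for site, kd in WT_KD_UM.items()
    }


if __name__ == "__main__":
    for label, fold in (("WT", 1.0), ("P317R", P317R_FOLD_LOSS)):
        for site, occupancy in occupancy_by_site(fold).items():
            print(f"{label:6s} {site}: {occupancy:.2f}")
```

      Under these assumptions, a 40-fold loss leaves CODD occupancy at roughly 0.82 while NODD occupancy falls to roughly 0.31, which is the asymmetry the comment describes. The sketch says nothing about catalytic rates, which would require a kinetic model.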

    3. Reviewer #2 (Public review):

      Summary:

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, no justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly, and no reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the experimental descriptions for several experiments are missing descriptions of controls or validation, which limits their strength in supporting the claims of the authors.

      Strengths:

      (1) This manuscript is well-written and clear.

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims.

      (3) The identification of P317R as a mutant that loses activity specifically against NODD, which could be a useful tool for further studies in cells.

      Weaknesses:

      Major:

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods.

      (2) The NMR hydroxylation assay.

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec?

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this?

      (3) Data validating the CRISPR KO HEK293A cells is missing.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data.

      Minor:

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl hydroxylation; however, additional follow-up studies have not been performed to test this in cells or to validate the mutant in another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity?

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway.

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought.

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases.

      Strengths:

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis.

      Weaknesses:

      There is a lack of in vivo/ex vivo validation. This is actually required to confirm whether the observed defects in hydroxylation, especially the selective NODD impairment in P317R, are sufficient to drive disease phenotypes such as erythrocytosis.

      The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling.

      The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM).

      Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions.

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis.

      Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO cells to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also show impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.

      Strengths: 

      (1) Simple, easy-to-follow manuscript. Generally well-written. 

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action. 

      (3) Good, well-researched background section. 

      Weaknesses: 

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as analysis by Chowdhury et al, 2016) on the structure and interactions between PHD2 and HIF1a peptides shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein.

      Thank you for the comment.  We will further analyze the mutations on the available PHD2 crystal structures in complex with HIFa to discern how these substitution mutations may impact PHD2 structure and function.  

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable. 

      Agreed.  We will perform additional experiments as suggested by the reviewer to further assess aggregation and hydrodynamic size.  The colors used in the graph will be changed for a clearer differentiation between samples.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation. 

      We agree with the reviewer that the causal mechanism for some of the tested disease-causing mutants remains unclear.  The negative findings also raise the notion, perhaps considered controversial, that there may be other substrates of PHD2 that are impacted by certain mutations, which contribute to disease pathogenesis.  We will expand our discussion accordingly.

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.

      The HIF1α concentration was indeed an oversight, which will be corrected to 0.18 mM.  The study by Flashman et al.[1] showing PHD2 having a lower affinity for the NODD than the CODD likely contributes to the differential hydroxylation rates via PHD2 WT.  We showed here via MST that PHD2 P317R had a Kd of 320 ± 20 uM for HIF1αCODD, which should have led to a severe enzymatic defect, even at the high concentrations used for NMR (180 uM).  However, we observed only a subtle reduction in hydroxylation efficiency in comparison to PHD2 WT.  Thus, we performed another binding method using BLI that showed a mild binding defect on CODD by PHD2 P317R, consistent with NMR data.  The perplexing result is the WT-like binding to the NODD by PHD2 P317R, which appears inconsistent with the severe defect in NODD hydroxylation via PHD2 P317R as measured via NMR.  These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon PHD2 P317R mutation. We will perform additional binding experiments to further interrogate and validate the binding affinity of PHD2 P317R to NODD and CODD.

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD. 

      The reviewer’s structural prediction that the P317R mutation should cause a major binding defect, while in agreement with our MST data, is incongruent with our NMR data and the data from Chowdhury et al.[2] that showed efficient hydroxylation of CODD via PHD2 P317R.  Moreover, we have attempted to model NODD and CODD on the apo PHD2 P317R structure and found that the mutation had no major impact on CODD while the mutated residue could clash with NODD, causing a shift in peptide positioning on the protein.  However, these modeling predictions, like any in silico projections, would need experimental validation.  As mentioned in our preceding response, we also performed BLI, which showed that PHD2 P317R had a minor binding defect for CODD, consistent with the NMR results and findings by Chowdhury et al.[2]  NODD binding was also measured with BLI, as purified NODD peptides were not amenable to the solution-based MST assay; BLI showed similar Kd values for PHD2 WT and P317R.  Considering the absence of NODD hydroxylation via PHD2 P317R as measured by NMR and the modeling on apo PHD2 P317R, we posit that P317R causes NODD to deviate from its original orientation in a way that may not affect binding, owing to other interactions with surrounding elements, but prevents NODD turnover.  Further study would be required to validate such a notion, which we feel is beyond the scope of this manuscript.  However, we will perform additional binding experiments to further interrogate PHD2 P317R binding to NODD.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation. 

      We thank the reviewer for raising these and other limitations.  We will expand on the shortcomings of the present study but would like to underscore that the current work using the recently described NMR assay along with other biophysical analyses suggests a previously under-appreciated role of NODD hydroxylation in the normal oxygen-sensing pathway.  

      Reviewer #2 (Public review): 

      Summary: 

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, no justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly, and no reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the experimental descriptions for several experiments are missing descriptions of controls or validation, which limits their strength in supporting the claims of the authors.

      Strengths: 

      (1) This manuscript is well-written and clear. 

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims. 

      (3) The identification of P317R as a mutant that loses activity specifically against NODD, which could be a useful tool for further studies in cells. 

      Weaknesses: 

      Major: 

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods.

      Clinical and patient information on disease-causing PHD2 mutants was compiled from various case reports and summarized in an Excel sheet found in the Supplementary Information.  The case reports are cited in this Excel file.  A reference to the supplementary data will be added to the Figure 1 legend and in the introduction.

      (2) The NMR hydroxylation assay. 

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.

      As the reviewer mentioned, the assay that we developed directly measures the target proline residues.  This assay is ideal when mutations near the prolines are studied, such as A403, Y565 (He et al[3]).  In this previous work, we observed that the shifting of the target proline cross-peaks due to change in electronegativity on the pyrrolidine ring of proline in turn impacted the neighboring residues[3], which meant that the neighboring residues can be used as reporter residues for certain purposes.  In this study, we focused on investigating the mutations on PHD2 while leaving the sequence of the HIF-1α unchanged by using solely 15N-HSQC-based experiments without the need for double-labeled samples.  Nonetheless, we thank the reviewer for pointing out the confusion in the text and we will correct and clarify our description of this assay.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec? 

      For previous studies, we performed intercalated 15N-HSQC and 13C-CON experiments for the kinetic measurements of wild-type HIF-1α and mutants.  We observed that the shifting pattern of A403 and I566 in the 15N-HSQC spectra aligned well with the ones of P402 and P564, respectively, in the 13C-CON spectra.  Representative data will be added to Supplemental Data.

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this? 

      This is an astute observation by the reviewer.  We checked and confirmed that for all kinetic datasets, the peak intensities of the end point residue are always slightly lower than those of the starting residue.  This includes the cases for PHD2 A228S and P317R in 5B, although not as obviously as for PHD2 WT.  We agree with the reviewer that sample dilution is a factor, as a total volume of 16 microliters of reaction components was added to the solution to trigger the reaction after the first spectrum was acquired.  It is also likely that the rate of prolyl hydroxylation becomes extremely slow with only a low amount of substrate available in the system.  Therefore, the reaction would not be 100% complete, which was detected by the sensitive NMR experimentation.

      (3) Data validating the CRISPR KO HEK293A cells is missing. 

      We thank the reviewer for noting this oversight.  Western blots validating PHD2 KO in HEK293A cells will be added to the Supplementary Data file.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data. 

      Agreed.  We will perform additional experiments as suggested with equal volume and concentration of each PHD2 construct loaded onto the SEC column for better assessment of aggregation.

      Minor: 

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided. 

      Additional justification for the selection of the mutants will be added to the ‘Mutations across the PHD2 enzyme induce erythrocytosis’ section.  Briefly, some mutants were chosen based on their frequency in the clinical data and their presence in potential mutational hot spots.  Various mutations were noted at W334 and R371, while F366L was identified in multiple individuals.  Additionally, 9 cases of PHD2-driven disease were reported to be caused by mutations located between residues 200 and 210, while 13 cases were reported between residues 369 and 379, so G206C and R371H were chosen to represent potential hot spots.  To examine a potential genotype-phenotype relationship, two of the mutants responsible for neuroendocrine tumor development, A228S and H374R, were also selected.  Finally, mutations located close to or on catalytic core residues (P317R, R371H, and H374R) were chosen to test for suspected defects.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl hydroxylation; however, additional follow-up studies have not been performed to test this in cells or to validate the mutant in another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity?

      This is the very question that we are currently addressing but as a part of a follow-up study.  Indeed, one thought is that the preferential defect observed could be the result of the loss of proline, an exceptionally rigid amino acid that makes contact with the backbone twice, or the addition of a specific amino acid, namely arginine, a flexible amino acid with an added charge at this site.  Although beyond the scope of this manuscript, we will investigate whether such and other characteristics in this region of PHD2/HIF1α interface contribute to the differential hydroxylation. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway. 

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought. 

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases. 

      Strengths: 

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis. 

      Weaknesses: 

      (1) There is a lack of in vivo/ex vivo validation. This is actually required to confirm whether the observed defects in hydroxylation, especially the selective NODD impairment in P317R, are sufficient to drive disease phenotypes such as erythrocytosis.

      We thank the reviewer for this comment, and while we agree with this statement, the objective of this study per se was to elucidate the structural and/or functional defect caused by the various diseaseassociated mutations on PHD2. The subsequent study would be to validate whether the identified defects, in particular the selective NODD impairment, would lead to erythrocytosis in vivo.  However, we feel that such study would be beyond the scope of this manuscript.

      (2) The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling. 

      Agreed. All experimental assays and systems have limitations. The HRE-luciferase assay used in the present manuscript also has limitations, such as the continuous expression of exogenous PHD2 mutants driven by a CMV promoter. Thus, we applied several additional biophysical methodologies to interrogate the disease-causing PHD2 mutants. The discussion of the limitations of the luciferase assay will be expanded in the revised manuscript.

      (3) The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM). 

      We thank the reviewer for the comment. While solving the structure of PHD2 P317R in complex with HIFα substrate is beyond the scope of this study, a structure of PHD2 P317R in complex with a clinically used inhibitor has been solved (PDB:5LAT). In analyzing this structure and that of PHD2 WT in complex with NODD, Chowdhury et al.[2] stated that P317 makes hydrophobic contacts with the LXXLAP motif on HIFα and R317 is predicted to interact differently with this motif. While this analysis does not directly elucidate the reason for the preferential NODD defect, it supports the possibility that the P317R substitution may be more detrimental for enzymatic activity on NODD than on CODD. We will discuss this notion in the revised manuscript.

      (4) Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions. 

      We thank the reviewer for this comment, but we feel that such a study would be beyond the scope of the present study. We observed that the PHD2 binding patterns to HIF1α and HIF2α were similar, and we have previously assigned >95% of the amino acids in HIF1α ODD for NMR study[3]. Thus, we first focused on the elucidation of possible defects in disease-associated PHD2 mutants using HIF1α as the substrate, with the supposition that an identified deregulation on HIF1α could be extended to the HIF2α paralog.

      However, we agree with the reviewer that future studies should examine the impact of PHD2 mutants directly on HIF2α.  

      References:

      (1) Flashman, E. et al. Kinetic rationale for selectivity toward N- and C-terminal oxygen-dependent degradation domain substrates mediated by a loop region of hypoxia-inducible factor prolyl hydroxylases. J Biol Chem 283, 3808-3815 (2008).

      (2) Chowdhury, R. et al. Structural basis for oxygen degradation domain selectivity of the HIF prolyl hydroxylases. Nat Commun 7, 12673 (2016).

      (3) He, W., Gasmi-Seabrook, G.M.C., Ikura, M., Lee, J.E. & Ohh, M. Time-resolved NMR detection of prolyl-hydroxylation in intrinsically disordered region of HIF-1alpha. Proc Natl Acad Sci U S A 121, e2408104121 (2024).

    1. eLife Assessment

      Based on several lines of interesting data, the authors conclude that FMRP, though associated with stalled ribosomes, does not determine the position on the mRNAs at which ribosomes stall. Although this conclusion would be valuable if clearly established, the current set of data are incomplete and it is unclear if the methodologies applied in this paper are fully adequate to address this gap.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of FMRP in the formation and function of RNA granules in mouse brain/cultured hippocampal neurons. Most of their results indicate that FMRP does not have a role in the formation or function of RNA granules with specific mRNAs, but may have some role in distal RNA granules in neurons and their response to synaptic stimulation. This is an important work (though the results are mostly negative) in understanding the composition and function of neuronal RNA granules. The last part of the work in cultured neurons is disjointed from the rest of the manuscript, and the results are neither convincing nor provide any mechanistic insight.

      Strengths:

      (1) The study is quite thorough, the methods and analysis used are robust, and the conclusion and interpretation are diligent.

      (2) The comparative study of Rat and Mouse RNA granules is very helpful for future studies.

      (3) The conclusion that the absence of FMRP does not affect the RNA granule composition and many of its properties in the system the authors have chosen to study is well supported by the results.

      (4) The difference in the response to DHPG stimulation concerning RNA granules described here is very interesting and could provide a basis for further studies, though it has some serious technical issues.

      Weaknesses:

      (1) The system used for the study (P5 mouse brain or DIV 8-10 cultured neuron) is surprising, as the majority of defects in the absence of FMRP are reported in later stages (P30+ brain and DIV 14+ neurons). It is important to test if the conclusions drawn here hold good at different developmental stages.

      (2) The term 'distal granules' is very vague. Since there is no structural or biochemical characterization of these granules, it is difficult to understand how they are different from the proximal granules and why FMRP has an effect only on these granules.

      (3) Since the manuscript does not find any effect of FMRP on neuronal RNA granules, it does not provide any new molecular insight with respect to the function of FMRP.

    3. Reviewer #2 (Public review):

      In the present manuscript, Li et al. use biochemical fractionation of "RNA granules" from P5 wildtype and FMR1 knock-out mouse brains to analyze their protein/RNA content, determine a single particle cryo-EM structure of contained ribosomes, and perform ribo-seq analysis of ribosome-protected RNA fragments (RPFs). The authors conclude from these that neither the composition of the ribosome granules, nor the state of their contained ribosomes, nor the mRNA positions with high ribosome occupancy change significantly. Besides minor changes in mRNA occupancy, the one change the authors identified is a decrease in puromycylated punctae in distal neurites of cultured primary neurons of the same mice, and their enhanced resistance to different pharmacological treatments. These results directly build on their earlier work (Anadolu et al., 2023) using analogous preparations of rat brains; the authors now perform a very similar study using WT and FMR1-KO mouse brains. This is an important topic, aiming to identify the molecular underpinnings of the FMRP protein, which is the basis of a major neurological disease. Unfortunately, several limitations of this study prevent it from being more convincing in its present form.

      In order to improve this study, our main suggestions are as follows:

      (1) The authors equate their biochemically purified "RG" fraction with their imaging-based detection of puromycin-positive punctae. They claim essentially no differences in RGs, but detect differences in the latter (mostly their abundance and sensitivity to DHPG/HHT/Aniso). In the discussion the authors acknowledge the inconsistency between these two modalities: "An inconsistency in our findings is the loss of distal RPM puncta coupled with an increase in the immunoreactivity for S6 in the RG." and "Thus, it may be that the RG is not simply made up of ribosomes from the large liquid-liquid phase RNA granules."

      How can the authors be sure that they are analysing the same entities in both modalities? A more parsimonious explanation of their results would be that, while there might be some overlap, two different entities are analyzed. Much of the main message rests on this equivalence, and I believe the authors should show its validity.

      (2) The authors show that increased nuclease digestion (and magnesium concentration) led to a reduction of their RPF sizes down to levels also seen by other researchers. Analyzing these now properly digested RPFs, the authors state that the CDS coverage and periodicity drastically improved, and that spurious enrichments of secretory mRNAs, which made up one of the major fractions in their previous work, are now reduced. In my opinion, this would be more appropriately communicated as a correction to their previous work, not as a main Figure in another manuscript.

      (3) The fold changes reported in Figure 7 (ranging between log2(-0.2) and log2(+0.25)) are all extremely small and in my opinion should not be used to derive claims such as "The loss of FMRP significantly affected the abundance and occupancy of FMRP-Clipped mRNAs in WT and FMR1-KO RG (Fig 7A, 7B), but not their enrichment between RG and RCs".

      (4) Figure 8 / S8-1 - The authors show that ~2/3 of their reads stem from PCR duplicates, but that even after removing those, the majority of peaks remain unaltered. At the same time, Figure S8-1 shows the total number of peaks to be 615 compared with 1392 before duplicate removal. Can the authors comment on this discrepancy? In addition, the dataset with properly removed artefacts should be used for their main display item instead of the current Figure 8.

      (5) Figure 9 / S9-1: the density of punctae in both WT and FMR1-KO actually increases after treatment with HHT or anisomycin (Figure S9-1 B-C). Even if a large fraction would now be "resistant to run-off", there should not be an increase. While this effect is deemed not significant, a much smaller effect in Figure 9C is deemed significant. Can the authors explain this? Given how vastly different the sample sizes are (ranging from 23 neurites in Figure S9-1 to 5,171 neurites in Figure 9), the authors should (randomly) sample to the same size and repeat their statistical analysis, to improve their credibility.
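
      The equal-size resampling suggested in point (5) could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names (`subsample`, `permutation_p_value`, `equal_size_p_values`), the choice of a permutation test on the difference in medians, and the synthetic exponential data are all hypothetical and are not taken from the manuscript or its analysis code.

```python
import random
import statistics


def subsample(values, size, rng):
    """Draw `size` values from `values` without replacement."""
    return rng.sample(values, size)


def permutation_p_value(a, b, rng, n_perm=1000):
    """Two-sided permutation test on the absolute difference in medians."""
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:len(a)])
                   - statistics.median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def equal_size_p_values(small, large, n_draws=20, seed=0, n_perm=1000):
    """Repeatedly cut the larger group down to the smaller group's size
    and rerun the test, returning one p-value per random draw."""
    rng = random.Random(seed)
    n = len(small)
    return [
        permutation_p_value(small, subsample(large, n, rng), rng, n_perm)
        for _ in range(n_draws)
    ]


# Synthetic, skewed "puncta densities" (NOT data from the study).
demo_rng = random.Random(1)
few_neurites = [demo_rng.expovariate(1.0) for _ in range(23)]
many_neurites = [demo_rng.expovariate(1.0) for _ in range(500)]

p_values = equal_size_p_values(few_neurites, many_neurites)
print("median p-value across equal-size draws:", statistics.median(p_values))
```

      Reporting the spread of p-values over many random draws, rather than a single draw, would show whether a significance call depends on which neurites happened to be sampled.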

    4. Reviewer #3 (Public review):

      Summary: Li et al. describe a set of experiments to probe the role of FMRP in ribosome stalling and RNA granule composition. The authors are able to recapitulate findings from a previous study performed in rats (this one is in mice).

      Strengths:

      1) The work addresses an important and challenging issue, investigating mechanisms that regulate stalled ribosomes, focusing on the role of FMRP. This is a complicated problem, given the heterogeneity of the granules and the challenges related to their purification. This work is a solid attempt at addressing this issue, which is widely understudied.

      2) The interpretation of the results could be interesting, if supported by solid data. The idea that FMRP could control the formation and release of RNA granules, rather than the elongation by stalled ribosomes is of high importance to the field, offering a fresh perspective into translational regulation by FMRP.

      3) The authors focused on recapitulating previous findings, published elsewhere (Anadolu et al., 2023) by the same group, but using rat tissue, rather than mouse tissue. Overall, they succeeded in doing so, demonstrating, among other findings, that stalled ribosomes are enriched in consensus mRNA motifs that are linked to FMRP. These interesting findings reinforce the role of FMRP in formation and stabilization of RNA granules. It would be nice to see extensive characterization of the mouse granules as performed in Figure 1 of Anadolu and colleagues, 2023.

      4) Some of the techniques incorporated aid in creating novel hypotheses, such as the ribopuromycylation assay and the cryo-EM of granule ribosomes.

      Weaknesses:

      1) The RNA granule characterization needs to be more rigorous. Coomassie is not proper for this type of characterization, simply because protein weight says little about its nature. The enrichment of key proteins is not robust and seems to not reach significance in multiple instances, including S6 and UPF1. Furthermore, S6 is the only proxy used for ribosome quantification. Could the authors include at least 3 other ribosomal proteins (2 from small, 2 from large subunit)?

      2) Page 12-13 - The Gene Ontology analysis is performed incorrectly. First, one should not rank genes by their RPKM levels. It is well known that housekeeping genes such as those related to actin dynamics, molecular transport and translation are highly enriched in sequencing datasets. It is usually more informative when significantly different genes are ranked by adjusted p-value or log2 fold change, then compared against a background to verify enrichment of specific processes. However, the authors found no DEGs. I would suggest the removal of this analysis and the incorporation of a gene set enrichment analysis (ranked by adjusted p-value). I further suggest that the authors incorporate a dimensionality reduction analysis to demonstrate that the lack of significance stems from biology and not experimental artifacts, such as poor reproducibility across biological replicates.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of FMRP in the formation and function of RNA granules in mouse brain/cultured hippocampal neurons. Most of their results indicate that FMRP does not have a role in the formation or function of RNA granules with specific mRNAs, but may have some role in distal RNA granules in neurons and their response to synaptic stimulation. This is an important work (though the results are mostly negative) in understanding the composition and function of neuronal RNA granules. The last part of the work in cultured neurons is disjointed from the rest of the manuscript, and the results are neither convincing nor provide any mechanistic insight.

      Strengths:

      (1) The study is quite thorough, the methods and analysis used are robust, and the conclusion and interpretation are diligent.

      (2) The comparative study of Rat and Mouse RNA granules is very helpful for future studies.

      (3) The conclusion that the absence of FMRP does not affect the RNA granule composition and many of its properties in the system the authors have chosen to study is well supported by the results.

      (4) The difference in the response to DHPG stimulation concerning RNA granules described here is very interesting and could provide a basis for further studies, though it has some serious technical issues.

      Weaknesses:

      (1) The system used for the study (P5 mouse brain or DIV 8-10 cultured neuron) is surprising, as the majority of defects in the absence of FMRP are reported in later stages (P30+ brain and DIV 14+ neurons). It is important to test if the conclusions drawn here hold good at different developmental stages.

      (2) The term 'distal granules' is very vague. Since there is no structural or biochemical characterization of these granules, it is difficult to understand how they are different from the proximal granules and why FMRP has an effect only on these granules.

      (3) Since the manuscript does not find any effect of FMRP on neuronal RNA granules, it does not provide any new molecular insight with respect to the function of FMRP.

      Thank you for your comments and for pointing out the strengths of the manuscript. Unfortunately, we will not be able to respond to point #1. The protocol for purification of the ribosomes from RNA granules does not work in older brains (see Khandjian et al., 2004, PNAS 101:13357), presumably due to the presence of large concentrations of myelin. While it would be possible to repeat our results later in culture, we have no expectation that they would be different, since we do observe DHPG induction of elongation-dependent, initiation-independent mGluR-LTD in later cultures (Graber et al., 2017, J. Neuroscience 37:9116). We will strengthen the caveat in the discussion that our results are only a snapshot of development and that it is certainly possible that different results may be seen at different times. We agree with point #2 that ‘distal granules’ is a vague term. We will remove the term and clarify that we only quantified granules more than 50 microns from the cell soma. We do not know if these granules are distinct. We would respectfully disagree with point #3 that the study does not provide molecular insight into the function of FMRP, as disproving that FMRP is important for stalling and determining the position of stalling removes a major hypothesis about the function of FMRP, and showing that something is not true is, at least to me, providing insight.

      Reviewer #2 (Public review):

      In the present manuscript, Li et al. use biochemical fractionation of "RNA granules" from P5 wildtype and FMR1 knock-out mouse brains to analyze their protein/RNA content, determine a single particle cryo-EM structure of contained ribosomes, and perform ribo-seq analysis of ribosome-protected RNA fragments (RPFs). The authors conclude from these that neither the composition of the ribosome granules, nor the state of their contained ribosomes, nor the mRNA positions with high ribosome occupancy change significantly. Besides minor changes in mRNA occupancy, the one change the authors identified is a decrease in puromycylated punctae in distal neurites of cultured primary neurons of the same mice, and their enhanced resistance to different pharmacological treatments. These results directly build on their earlier work (Anadolu et al., 2023) using analogous preparations of rat brains; the authors now perform a very similar study using WT and FMR1-KO mouse brains. This is an important topic, aiming to identify the molecular underpinnings of the FMRP protein, which is the basis of a major neurological disease. Unfortunately, several limitations of this study prevent it from being more convincing in its present form.

      In order to improve this study, our main suggestions are as follows:

      (1) The authors equate their biochemically purified "RG" fraction with their imaging-based detection of puromycin-positive punctae. They claim essentially no differences in RGs, but detect differences in the latter (mostly their abundance and sensitivity to DHPG/HHT/Aniso). In the discussion the authors acknowledge the inconsistency between these two modalities: "An inconsistency in our findings is the loss of distal RPM puncta coupled with an increase in the immunoreactivity for S6 in the RG." and "Thus, it may be that the RG is not simply made up of ribosomes from the large liquid-liquid phase RNA granules."

      How can the authors be sure that they are analysing the same entities in both modalities? A more parsimonious explanation of their results would be that, while there might be some overlap, two different entities are analyzed. Much of the main message rests on this equivalence, and I believe the authors should show its validity.

      (2) The authors show that increased nuclease digestion (and magnesium concentration) led to a reduction of their RPF sizes down to levels also seen by other researchers. Analyzing these now properly digested RPFs, the authors state that the CDS coverage and periodicity drastically improved, and that spurious enrichments of secretory mRNAs, which made up one of the major fractions in their previous work, are now reduced. In my opinion, this would be more appropriately communicated as a correction to their previous work, not as a main Figure in another manuscript.

      (3) The fold changes reported in Figure 7 (ranging between log2(-0.2) and log2(+0.25)) are all extremely small and in my opinion should not be used to derive claims such as "The loss of FMRP significantly affected the abundance and occupancy of FMRP-Clipped mRNAs in WT and FMR1-KO RG (Fig 7A, 7B), but not their enrichment between RG and RCs".

      (4) Figure 8 / S8-1 - The authors show that ~2/3 of their reads stem from PCR duplicates, but that even after removing those, the majority of peaks remain unaltered. At the same time, Figure S8-1 shows the total number of peaks to be 615 compared with 1392 before duplicate removal. Can the authors comment on this discrepancy? In addition, the dataset with properly removed artefacts should be used for their main display item instead of the current Figure 8.

      (5) Figure 9 / S9-1: the density of punctae in both WT and FMR1-KO actually increases after treatment with HHT or anisomycin (Figure S9-1 B-C). Even if a large fraction would now be "resistant to run-off", there should not be an increase. While this effect is deemed not significant, a much smaller effect in Figure 9C is deemed significant. Can the authors explain this? Given how vastly different the sample sizes are (ranging from 23 neurites in Figure S9-1 to 5,171 neurites in Figure 9), the authors should (randomly) sample to the same size and repeat their statistical analysis, to improve their credibility.

      Thank you for your comments. We agree with point #1 that the equivalence of RPM puncta with the RG fraction is an issue, and while we believe that we show in a number of ways that the two are related (anisomycin-resistant puromycylation, puromycylation only at high concentrations consistent with the hybrid state, etc.), we would respectfully disagree that our main message depends on the equivalence of the RPM-labeled RNA granules in neurites and the ribosomes isolated by sedimentation. We will make this point clearer in our revision. For point #2, we agree that the changes with increased nuclease digestion are somewhat out of place in a narrative sense, but they are clearly relevant to this work. Whether or not one sees this as a ‘correction’ or an interesting point will depend on a better characterization of the structures of the stalled polysomes. My personal view is that the nuclease resistance of cleavage near the RNA entrance site is quite interesting. Since we reproduce our results with a similar nuclease treatment in mice, as reported in our previous publication, I believe the comparison could be of interest in the future and would like to retain it. We agree with point #3 and will temper these claims in our revised version. For point #4, we will determine more carefully why the number of peaks differs and switch the main and supplemental figures. For point #5, we apologize for the typo in the Figure 9 legend: the number of neurites is 171, not 5,171. The box plot line shows the median, not the average, and the data are clearly skewed such that the median and average differ (i.e., there is a two-fold decrease in the average density of distal puncta between WT and FMRP, but the average density is actually slightly decreased with HHT and anisomycin, although the median increases slightly). We will now report the results in distinct modalities to clarify this, and we will reexamine the statistics to better address the skewed distribution of values in the revised version.
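
      The point about skewed data can be illustrated with a small numerical sketch. The values below are invented purely for illustration (they are not measurements from the study), and the variable names are hypothetical; the sketch only shows that, in a right-skewed sample, the mean can fall while the median rises.

```python
import statistics

# Invented puncta densities per neurite (NOT data from the study).
# A single extreme value dominates the mean of the untreated group.
untreated = [0, 0, 1, 1, 2, 2, 3, 30]
treated = [0, 1, 2, 2, 3, 3, 4, 12]

for label, values in (("untreated", untreated), ("treated", treated)):
    print(
        f"{label:>9}: mean = {statistics.mean(values):.3f}, "
        f"median = {statistics.median(values):.1f}"
    )

# The two summary statistics move in opposite directions here.
mean_change = statistics.mean(treated) - statistics.mean(untreated)
median_change = statistics.median(treated) - statistics.median(untreated)
print(f"change in mean: {mean_change:+.3f}, change in median: {median_change:+.1f}")
```

      With these invented numbers the mean drops from 4.875 to 3.375 while the median rises from 1.5 to 2.5, which is why a box plot (median) and a bar of averages can appear to disagree for the same data.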

      Reviewer #3 (Public review):

      Summary:

      Li et al. describe a set of experiments to probe the role of FMRP in ribosome stalling and RNA granule composition. The authors are able to recapitulate findings from a previous study performed in rats (this one is in mice).

      Strengths:

      (1) The work addresses an important and challenging issue, investigating mechanisms that regulate stalled ribosomes, focusing on the role of FMRP. This is a complicated problem, given the heterogeneity of the granules and the challenges related to their purification. This work is a solid attempt at addressing this issue, which is widely understudied.

      (2) The interpretation of the results could be interesting, if supported by solid data. The idea that FMRP could control the formation and release of RNA granules, rather than the elongation by stalled ribosomes is of high importance to the field, offering a fresh perspective into translational regulation by FMRP.

      (3) The authors focused on recapitulating previous findings, published elsewhere (Anadolu et al., 2023) by the same group, but using rat tissue, rather than mouse tissue. Overall, they succeeded in doing so, demonstrating, among other findings, that stalled ribosomes are enriched in consensus mRNA motifs that are linked to FMRP. These interesting findings reinforce the role of FMRP in formation and stabilization of RNA granules. It would be nice to see extensive characterization of the mouse granules as performed in Figure 1 of Anadolu and colleagues, 2023.

      (4) Some of the techniques incorporated aid in creating novel hypotheses, such as the ribopuromycylation assay and the cryo-EM of granule ribosomes.

      Weaknesses:

      (1) The RNA granule characterization needs to be more rigorous. Coomassie is not proper for this type of characterization, simply because protein weight says little about its nature. The enrichment of key proteins is not robust and seems to not reach significance in multiple instances, including S6 and UPF1. Furthermore, S6 is the only proxy used for ribosome quantification. Could the authors include at least 3 other ribosomal proteins (2 from small, 2 from large subunit)?

      (2) Page 12-13 - The Gene Ontology analysis is performed incorrectly. First, one should not rank genes by their RPKM levels. It is well known that housekeeping genes such as those related to actin dynamics, molecular transport and translation are highly enriched in sequencing datasets. It is usually more informative when significantly different genes are ranked by adjusted p-value or log2 fold change, then compared against a background to verify enrichment of specific processes. However, the authors found no DEGs. I would suggest the removal of this analysis and the incorporation of a gene set enrichment analysis (ranked by adjusted p-value). I further suggest that the authors incorporate a dimensionality reduction analysis to demonstrate that the lack of significance stems from biology and not experimental artifacts, such as poor reproducibility across biological replicates.

      Thank you for your comments on the strengths of the manuscript. We agree with point #1 that the mouse RNA granule characterization needs to be more rigorous and we plan to accomplish this in our revised version. Similarly, we will incorporate the additional statistical analysis suggested by the reviewer in a revised version.

    1. eLife Assessment

      In this study, the authors investigate the role of ZMAT3, a p53 target gene, in tumor suppression and RNA splicing regulation. Using quantitative proteomics, the authors uncover that ZMAT3 knockout leads to upregulation of HKDC1, a gene linked to mitochondrial respiration, and that ZMAT3 suppresses HKDC1 expression by inhibiting c-JUN-mediated transcription. This set of convincing evidence reveals a fundamental mechanism by which ZMAT3 contributes to p53-driven tumor suppression by regulating mitochondrial respiration.

    2. Reviewer #1 (Public review):

      Summary:

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration.

      Strengths:

      The authors use multiple orthogonal approaches to test the majority of their findings.

      The authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration.

      Weaknesses:

      Some indication as to whether other c-JUN target genes are also regulated by ZMAT3 would improve the broad relevance of the authors' findings.

    3. Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.

      Strengths:

      Mechanistically, ZMAT3 suppresses HKDC1 transcription by sequestering JUN and preventing its binding to the HKDC1 promoter, resulting in reduced HKDC1 expression. Conversely, p53 mutation leads to ZMAT3 downregulation and HKDC1 overexpression, thereby promoting increased mitochondrial respiration and proliferation. This mechanism is novel; however, the authors should address several points.

      Weaknesses:

      The authors conduct mechanistic experiments (e.g., transcript and protein quantification, luciferase assays) to demonstrate regulatory interactions between p53, ZMAT3, JUN, and HKDC1. These findings should be supported with functional assays, such as proliferation, apoptosis, or mitochondrial respiration analyses.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kumar et al. investigate the mechanisms underlying the tumor suppressive function of the RNA binding protein ZMAT3, a previously described tumor suppressor in the p53 pathway. To this end, they use RNA-sequencing and proteomics to characterize changes in ZMAT3-deficient cells, leading them to identify the hexokinase HKDC1 as upregulated with ZMAT3 deficiency first in colorectal cancer cells, then in other cell types of both mouse and human origin. This increase in HKDC1 is associated with increased mitochondrial respiration. As ZMAT3 has been reported as an RNA-binding and DNA-binding protein, the authors investigated this via PAR-CLIP and ChIP-seq but did not observe ZMAT3 binding to HKDC1 pre-mRNA or DNA. Thus, to better understand how ZMAT3 regulates HKDC1, the authors used quantitative proteomics to identify ZMAT3-interacting proteins. They identified the transcription factor JUN as a ZMAT3-interacting protein and showed that JUN promotes the increased HKDC1 RNA expression seen with ZMAT3 inactivation. They propose that ZMAT3 inhibits JUN-mediated transcriptional induction of HKDC1 as a mechanism of tumor suppression. This work uncovers novel aspects of the p53 tumor suppressor pathway.

      Strengths:

      This novel work sheds light on one of the most well-established yet understudied p53 target genes, ZMAT3, and how it contributes to p53's tumor suppressive functions. Overall, this story establishes a p53-ZMAT3-HKDC1 tumor suppressive axis, which has been strongly substantiated using a variety of orthogonal approaches, in different cell lines and with different data sets.

      Weaknesses:

      While the role of p53 and ZMAT3 in repressing HKDC1 is well substantiated, there is a gap in understanding how ZMAT3 acts to repress JUN-driven activation of the HKDC1 locus. How does ZMAT3 inhibit JUN binding to HKDC1? Can targeted ChIP experiments or RIP experiments be used to make a more definitive model? Can ZMAT3 mutants help to understand the mechanisms? Future work can further establish the mechanisms underlying how ZMAT3 represses JUN activity.

    1. eLife Assessment

      In their study, Neiswender et al. provide important insights into how BicD2 variants linked to spinal muscular atrophy alter dynein activity and cargo specificity. While the findings suggest disease-relevant changes in BicD2's binding partners, the evidence connecting these changes to disease mechanisms remains incomplete and would benefit from further experimental validation. The work lays a strong foundation for future research, but could be strengthened by deeper functional analysis of key interactions, such as the BicD2/HOPS complex.

    2. Reviewer #1 (Public review):

      In this work, Neiswender and colleagues test the hypothesis that mutations in BicD2 that are associated with SMALED alter BicD2-cargo interactions. To do this, they first establish the WT BicD2 cargo interactome (using a proximity-dependent biotin ligase screen with Turbo-ID on the BicD2 C-terminus). In addition to known cargo interactors, they also identified many proteins in the HOPS complex. Interestingly, they find that the HOPS complex may interact with BicD2 in a different manner than other known cargos. The authors also show that while BicD2 is required for HOPS complex localization, on average, depletion of BicD2 from HeLa and Cos7 cells causes HOPS and lysosome mislocalization that is consistent with Kinesin-1 trafficking defects, rather than dynein. The authors also use proximity biotin ligase approaches to define the cargo interactome of three BicD2 variants associated with SMALED. One variant (R747C) has the most altered cargo interactome. The authors highlight one protein in particular, GRAMD1A, that is only found in the R747C dataset and mislocalizes specifically when R747C is expressed.

      The work in this manuscript is of a very high quality and contributes important findings to the field. I have a few questions that, if answered, could increase the impact of this work.

      (1) I was surprised at the effect of BicD2 knockdown on LAMP (and VPS41) localization, which really suggests that in HeLa and Cos7 cells, BicD2 regulation of Kinesin-1 (rather than dynein) is the primary driver of lysosome localization. The KIF5B-knockout rescue of the BicD2-overexpression phenotype was a very powerful result that supports this conclusion. Have the authors looked at other cargos, e.g., Golgi or centrosomes in G2? Can the authors include more discussion about what this result means or how they imagine dynein and kinesin-1's interaction with BicD2 is regulated?

      (2) Have the authors examined if the SMALED mutants show diminished or increased binding to KIF5B? While the authors are correct that the mutations could hyperactivate dynein because they reduce BicD2 autoinhibition, it is possible that the SMALED mutants hyperactivate dynein because they no longer bind kinesin. This would be particularly interesting, given the complex relationship between BicD2 regulation of dynein and kinesin that the authors show in Figure 3.

      (3) What is already known about the protein GRAMD1A? Did the authors choose to focus on GRAMD1A because it was the only novel interaction found in the SMALED mutant interactomes, or was this protein interesting for a different reason? Does the known function of GRAMD1A explain the potential dysfunction of cells expressing BICD2_R747C or patients who have this mutation? More discussion of this protein and why the authors focused on it would really strengthen the manuscript.

    3. Reviewer #2 (Public review):

      Neiswender et al. investigated the interactomes of wild-type BICD2 and of BICD2 mutants that are associated with Spinal Muscular Atrophy with Lower Extremity Predominance (SMALED2). Although BICD2 has previously been implicated in SMALED2, it is unclear how mutations in BICD2 may contribute to disease symptoms. In this study, the authors characterize the interactome of wild-type BICD2 and identify potential new cargos, including the HOPS complex. The authors then chose three SMALED2-associated BICD2 mutants and compared each mutant interactome to that of wild-type BICD2. Each mutant had a change in the interactome, with the most drastic being BICD2_R747C, a mutation in the cargo binding domain of BICD2. This mutant displayed less interaction with a potential new BICD2 cargo, the HOPS complex. Additionally, it displayed more interaction with an ER protein, GRAMD1A.

      The data in the paper is generally strong, but the major conclusions of this paper need more evidence to be better supported.

      (1) The authors use cells that have been engineered to express the different BICD2 constructs. As shown in Figure 4B, the authors see wide expression of BICD2_WT throughout the cell. However, WT BICD2 usually localizes to the TGN. This widespread localization introduces some uncertainty about the interactome data. The authors should either try to verify the interaction data (specifically with the HOPS complex and GRAMD1A) by immunoprecipitating endogenous BICD2 or by repeating their interactome experiment in Figure 1 using BICD2 knockout cells that express the BICD2_WT construct. This should also be done to verify the immunoprecipitation and microscopy data shown in Figure 7.

      (2) The authors conclude that cargo transport defects resulting from BICD2 mutations may contribute to SMALED2 symptoms. However, the authors are unable to determine if BICD2 directly binds to the potential new cargo, the HOPS complex. To address this, the authors could purify full-length WT BICD2 and perform in vitro experiments. Furthermore, the authors were unable to identify the minimal region of BICD2 needed for HOPS interaction. The authors could expand on the experiment attempted with the extended BICD2 C-terminal using a deltaCC1 construct, which could also be used for in vitro experiments.

      (3) Again, the authors conclude that BICD2 mutants cause cargo transport defects that are likely to lead to SMALED2 symptoms. This would be better supported if the authors are able to find a protein relevant to SMALED2 and examine if/how its localization is changed under expression of the BICD2 mutants. The authors currently use the HOPS complex and GRAMD1A as indicators of cargo transport defects, but it is unclear if these are relevant to SMALED2 symptoms.

    4. Reviewer #3 (Public review):

      Summary:

      BicD2 is a motor adapter protein that facilitates cellular transport pathways, which are impacted by human disease mutations of BicD2, causing spinal muscular atrophy with lower extremity predominance (SMALED2). The authors provide evidence that some of these mutations result in interactome changes, which may be the underlying cause of the disease. This is supported by proximity biotin ligation screens, immunoprecipitation, and cell biology assays. The authors identify several novel BicD2 interactions, such as the HOPS complex that participates in the fusion of late endosomes and autophagosomes with lysosomes, which could have important functions. Three BicD2 disease mutants studied had changes in the interactome, which could be an underlying cause for SMALED2. The study extends our understanding of the BicD2 interactome under physiological conditions, as well as of the changes in cellular transport pathways that result in SMALED2. It will be of great interest for the BicD2 and dynein fields.

      Strengths:

      Extensive interactomes are presented for both WT BicD2 and the disease mutants, which will be valuable for the community. The HOPS complex, which is important for fusion of late endosomes and lysosomes, was identified as a novel interactor of BicD2; this is of interest because some of the BicD2 disease mutations result in Golgi-fragmentation phenotypes. The interaction with the HOPS complex is affected by the R747C mutation, which also results in a gain-of-function interaction with GRAMD1A.

      Weaknesses:

      The manuscript should be strengthened by further evidence of the BicD2/HOPS complex interaction and the functional implications for spinal muscular atrophy by changes in the interactome through mutations. Which functional implications does the loss of the BicD2/HOPS complex interaction and the gain of function interaction with GRAMD1A have in the context of the R747C mutant?

      Major points:

      (1) In the biotin proximity ligation assay, a large number of targets were identified, but it is not clear why only the HOPS complex was chosen for further verification. Immunoprecipitation was used for target verification, but due to the very high number of targets identified in the screen, and the fact that the HOPS complex is a membrane protein that could potentially be immunoprecipitated along with lysosomes or dynein, additional experiments to verify the interaction of BicD2 with the HOPS complex (reconstitution of a complex in vitro, GST-pull down of a complex from cell extracts or other approaches) are needed to strengthen the manuscript.

      (2) In the biotin proximity ligation assay, a large number of BicD2 interactions were identified that are distinct between the mutant and the WT, but it is not clear why GRAMD1A in particular was chosen as a gain-of-function interaction, or what the functional role of a BicD2/GRAMD1A interaction may be. A Western blot shows a strengthened interaction with the R747C mutant, but GRAMD1A also interacts with WT BicD2.

      (3) Furthermore, the functional implications of changed interactions with HOPS and GRAMD1A in the R747C mutant are unclear. Additional experiments are needed to establish the functional implication of the loss of the BicD2/HOPS interaction in the BicD2/R747C mutant. For the GRAMD1A gain of function interaction, according to the authors, a significant amount of the protein localized with BicD2/R747C at the centrosomal region. This changed localization is not very clear from the presented images (no centrosomal or other markers were used, and the changed localization could also be an effect of dynein hyperactivation in the mutant). Furthermore, the functional implication of a changed localization of GRAMD1A is unclear from the presented data.

    1. eLife Assessment

      This valuable study identifies asymmetric dimethylarginine (ADMA) histones as potential determinants of the initial genomic binding of Rhino, a Drosophila-specific chromatin protein essential for piRNA cluster specification. The authors provide correlative genomic and imaging data to support their model, although functional validation of the proposed mechanism remains incomplete. The authors could revise the manuscript to reflect that they have uncovered a small subset of piRNA clusters dependent on ADMA-histones, which may not be the general rule.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand how Rhino, a chromatin protein essential for small RNA production in fruit flies, is initially recruited to specific regions of the genome. They propose that asymmetric arginine methylation of histones, particularly mediated by the enzyme DART4, plays a key role in defining the first genomic sites of Rhino localization. Using a combination of inducible expression systems, chromatin immunoprecipitation, and genetic knockdowns, the authors identify a new class of Rhino-bound loci, termed DART4 clusters, that may represent nascent or transitional piRNA clusters.

      Strengths:

      One of the main strengths of this work lies in its comprehensive use of genomic data to reveal a correlation between ADMA histones and Rhino enrichment at the border of known piRNA clusters. The use of both cultured cells and ovaries adds robustness to this observation. The knockdown of DART4 supports a role for H3R17me2a in shaping Rhino binding at a subset of genomic regions.

      Weaknesses:

      However, Rhino binding at, and piRNA production from, canonical piRNA clusters appears largely unaffected by DART4 depletion, and spreading of Rhino from ADMA-rich boundaries was not directly demonstrated. Therefore, while the correlation is clearly documented, further investigation would be needed to determine the functional requirement of these histone marks in piRNA cluster specification.

      The study identifies piRNA cluster-like regions called DART4 clusters. While the model proposes that DART4 clusters represent evolutionary precursors of mature piRNA clusters, the functional output of these clusters remains limited. Additional experiments could help clarify whether low-level piRNA production from these loci is sufficient to guide Piwi-dependent silencing.

      In summary, the authors present a well-executed study that raises intriguing hypotheses about the early chromatin context of piRNA cluster formation. The work will be of interest to researchers studying genome regulation, small RNA pathways, and the chromatin mechanisms of transposon control. It provides useful resources and new candidate loci for follow-up studies, while also highlighting the need for further functional validation to fully support the proposed model.

    3. Reviewer #2 (Public review):

      This study seeks to understand how the Rhino factor knows how to localize to specific transposon loci and to specific piRNA clusters to direct the correct formation of specialized heterochromatin that promotes piRNA biogenesis in the fly germline. In particular, these dual-strand piRNA clusters with names like 42AB, 38C, 80F, and 102F generate the bulk of ovarian piRNAs in the nurse cells of the fly ovary, but the evolutionary significance of these dual-strand piRNA clusters remains mysterious since triple null mutants of these dual-strand piRNA clusters still allow fly ovaries to develop and remain fertile. Nevertheless, mutants of Rhino and its interactors Deadlock, Cutoff, Kipferl, Moonshiner, etc., cause more piRNA loss beyond these dual-strand clusters and exhibit the phenotype of major female infertility, so the impact of proper assembly of Rhino, the RDC, Kipferl, etc., onto proper piRNA chromatin is an important and interesting biological question that is not fully understood.

      This study first tests ectopic expression of Rhino by engineering a Dox-inducible Rhino transgene in the OSC line, which expresses only the primary Piwi pathway. This reflects the natural single-pathway expression in the follicle cells and is quite distinct from the nurse cell germline piRNA pathway that is promoted by Rhino, Moonshiner, etc. The authors present some compelling evidence that this ectopic Rhino expression in OSCs may reveal how Rhino can initiate de novo binding via ADMA histone marks, a feat that would be much more challenging to demonstrate in the germline, where this epigenetically naïve state cannot be modeled since germ cell collapse would likely ensue. In the OSC, the authors have tested the knockdown of four of the 11 known Drosophila PRMTs (DARTs), and, comparing to the ectopic Rhino foci that they observe in HP1a knockdown (KD), they conclude that DART1 and DART4 are the prime factors to study further in looking for disruption of ADMA histone marks. The authors also test KD of DART8 and CG17726 in OSCs, but in the fly they test germline KD (GLKD) of DART4 only. They do not explain why the other DARTs are not tested in GLKD; the UAS-RNAi resources in Drosophila strain repositories should be very complete and should make reagents for these knockdowns accessible.

      The authors only characterize some particular ADMA marks of H3R17me2a as showing strong decrease after DART4 GLKD, and then they see some small subset of piRNA clusters go down in piRNA production as shown in Figure 6B and Figure 6F and Supplementary Figure 7. This small subset of DART4-dependent piRNA clusters does lose Rhino and Kipferl recruitment, which is an interesting result.

      However, the biggest issue with this study is the mystery that the most prominent dual-strand piRNA clusters, 42AB, 38C, 80F, and 102F, are the prime genomic loci subjected to Rhino regulation, yet they do not show any change in piRNA production in the GLKD of DART4. The authors bury this surprising negative result in Supplementary Figure 5E, but it is also evident in the lack of any decrease (actually an n.s. increase) in Rhino association in Figure 5D. Since these main piRNA clusters involve the RDC, Kipferl, Moonshiner, etc., and they show no change in ADMA status and no piRNA loss after DART4 GLKD, this poses a problem for the model in Figure 7C. In this study, there is only a GLKD of DART4 and no GLKD of the other DARTs in fly ovaries.

      One way the authors rationalize this peculiar exception is the argument that DART4 is only acting on evolutionarily "young" piRNA clusters like bx, CG14629, and CG31612, but the lack of any change in the majority of other piRNA clusters in Figure 6F leaves open the unsatisfying concern that there is much functional redundancy remaining with other DARTs, not tested by GLKD in the fly, that would have a bigger impact on the other main dual-strand piRNA clusters being regulated by Rhino and ADMA-histone marks.

      Also, the current data does not provide convincing enough support for the model in Figure 7C and the paper title of ADMA-histones being the key determinant in the fly ovary for Rhino recognition of the dual-strand piRNA clusters. Although much of this study's data is well constructed and presented, there remains a large gap in that no other DARTs were tested in GLKD that would show a big loss of piRNAs from the main dual-strand piRNA clusters of 42AB, 38C, 80F, and 102F, where Rhino has prominent spreading.

      As the manuscript currently stands, I do not think the authors present enough data to conclude that "ADMA-histones [As a Major new histone mark class] does play a crucial role in the initial recognition of dual-strand piRNA cluster regions by Rhino" because the data here mainly just show that a small subset of evolutionarily young piRNA clusters have a strong effect from GLKD of DART4. The authors could extensively revise the study to be much more specific in the title and conclusion that they have uncovered this very unique niche of a small subset of DART4-dependent piRNA clusters, but this niche finding may dampen the impact and significance of this study since other major dual-strand piRNA clusters do not change during DART4 GLKD, and the authors do not show GLKD data for any other DARTs. The niche finding of just a small subset of DART4-dependent piRNA clusters might make another specialized genetics forum a more appropriate venue.

    1. eLife Assessment

      This is a useful study of the role of CHI3L1 in Kupffer cells, the macrophages of the liver, showing that CHI3L1 alters glucose regulation in obesity. Specifically, Chi3l1 protects glucose-dependent Kupffer cells during metabolic dysfunction-associated steatotic liver disease (MASLD) by inhibiting glucose uptake, preventing metabolic stress and death. These data are compelling, yet require further validation.

    2. Reviewer #1 (Public review):

      The manuscript by Shan et al. seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq, they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective, they provide evidence that CHI3L1 serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs, they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, there are some concerns about the current data that limit my enthusiasm for the study in its current form. Please see my specific comments below.

      Major:

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. For example, in Figure 2 the authors present data that knockout of Chil1 in KCs using Clec4f Cre produces worse liver steatosis and insulin resistance. However, in supplemental Figure 4, they perform the same experiment in LysM-Cre mice and find a somewhat different phenotype. The authors appear to be under the impression that LysM-Cre does not cause recombination in KCs and therefore interpret this data to mean that Chil1 is relevant in KCs and not MdMs. However, LysM-Cre DOES lead to efficient recombination in KCs and therefore Chil1 expression will be decreased in both KCs and MdM (along with PMNs) in this line.

      Therefore, a phenotype observed with KC-KO should also be present in this model unless the authors argue that loss of Chil1 from the MdMs has the opposite phenotype of KCs and therefore attenuates the phenotype. The Cx3Cr1 CreER tamoxifen inducible system is currently the only macrophage Cre strategy that will avoid KC recombination. The authors need to rethink their results with the understanding that Chil1 is deleted from KCs in the LysM-Cre experiment. In addition, it appears that only one experiment was performed, with only 5 mice in each group for both the Clec4f and LysM-Cre data. This is generally not enough to make a firm conclusion for MASH diet experiments.

      (2) The mouse weight gain is missing from Figure 2 and Supplementary Figure 4. This data is critical to interpret the changes in liver pathology, especially since they have worse insulin resistance.

      (3) Figure 4 suggests that KC death is increased with KO of Chil1. However, this cannot be concluded from the plots shown. In Supplementary Figure 6 the authors provide a more appropriate gating scheme to quantify resident KCs that includes TIM4. The TIM4 data needs to be shown and quantified in Figure 4. As shown in Supplementary Figure 6, the F4/80 hi population is predominantly KCs at baseline; however, this is not true with MASH diets. Most of the recruited MoMFs also reside in the F4/80 hi gate, where they can be identified by their lower expression of TIM4. The MoMF gate shown in this figure is incorrect. The CD11b hi population is predominantly PMNs, monocytes, and cDC2, not MoMFs (PMID:33997821). In addition, the authors should stain the tissue for TIM4, which would also be expected to reveal a decrease in the number of resident KCs.

      (4) While the Clec4F Cre is specific to KCs, there is also less data about the impact of the Cre system on KC biology. Therefore, when looking at cell death, the authors need to include some mice that express Clec4F Cre without the floxed allele to rule out any effects of the Cre itself. In addition, if the cell death phenotype is real, it should also be present in the LysM-Cre system for the reasons described above. Therefore, the authors should quantify the KC number and dying KCs in this mouse line as well.

      (5) I am somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. Looking at our own data and those from the Liver Atlas it appears that this gene is primarily expressed in neutrophils. At a minimum, the authors should address the expression of Chil1 in macrophage populations from other publicly available datasets in mouse MASH to validate their findings (several options include - PMID: 33440159, 32888418, 32362324). If expression of Chil1 is not present in these other data sets, perhaps an environmental/microbiome difference may account for the distinct expression pattern observed. Either way, it is important to address this issue.

    3. Reviewer #2 (Public review):

      The manuscript from Shan et al. sets out to investigate the role of Chi3l1 in different hepatic macrophage subsets (KCs and moMFs) in MASLD following their identification that KCs highly express this gene. To this end, they utilise Chi3l1KO, Clec4f-CrexChi3l1fl, and Lyz2-CrexChi3l1fl mice and WT controls fed an HFHC diet for different periods of time.

      Firstly, the authors perform scRNA-seq, which led to the identification of Chi3l1 (encoded by Chil1) in macrophages. However, this is on a limited number of cells (especially in the HFHC context), and hence it would also be important to validate this finding in other publicly available MASLD/Fibrosis scRNA-seq datasets. Similarly, it would be important to examine if cells other than monocytes/macrophages also express this gene, given the use of the full KO in the manuscript. Along these lines, utilisation of publicly available human MASLD scRNA-seq datasets would also be important to understand where the increased expression observed in patients comes from and the overall relevance of macrophages in this finding.

      Next, the authors use two different Cre lines (Clec4f-Cre and Lyz2-Cre) to target KCs and moMFs respectively. However, no evidence is provided to demonstrate that Chil1 is only deleted from the respective cells in the two CRE lines. Thus, KCs and moMFs should be sorted from both lines, and a qPCR performed to check the deletion of Chil1. This is especially important for the Lyz2-Cre, which has been routinely used in the literature to target KCs (as well as moMFs) and has (at least partial) penetrance in KCs (depending on the gene to be floxed). Also, while the Clec4f-Cre mice show an exacerbated MASLD phenotype, there is currently no baseline phenotype of these animals (or the Lyz2Cre) in steady state in relation to the same readouts provided in MASLD and the macrophage compartment. This is critical to understand if the phenotype is MASLD-specific or if loss of Chi3l1 already affects the macrophages under homeostatic conditions.

      Next, the authors suggest that loss of Chi3l1 promotes KC death. However, to examine this, they use Chi3l1 full KO mice instead of the Clec4f-Cre line. The reason for this is not clear, because in this regard, it is now not clear whether the effects are regulated by loss of Chi3l1 from KCs or from other hepatic cells (see point above). The authors mention that Chi3l1 is a secreted protein, so does this mean other cells are also secreting it, and are these needed for KC death? In that case, this would not explain the phenotype in the CLEC4F-Cre mice. Here, the authors do perform a basic immunophenotyping of the macrophage populations; however, the markers used are outdated, making it difficult to interpret the findings. Instead of F4/80 and CD11b, which do not allow a perfect discrimination of KCs and moMFs, especially in HFHC diet-fed mice, more robust and specific markers of KCs should be used, including CLEC4F, VSIG4, and TIM4.

      Additionally, while the authors report a reduction of KCs in terms of absolute numbers, there are no differences in proportions. This, coupled with a decrease also in moMF numbers at 16 weeks (when one would expect an increase if KCs are decreased, based on previous literature), suggests that the differences in KC numbers may be due to differences in total cell counts obtained from the obese livers compared with controls. To rule this out, total cell counts and total live CD45+ cell counts should be provided. Here, the authors also provide TUNEL staining in situ to demonstrate increased KC death, but as it is notoriously difficult to visualise dying KCs in MASLD models, it would be important to provide more images. Similarly, there appear to be many more TUNEL+ cells in the KO that are not KCs; thus, it would be important to examine this in the CLEC4F-Cre line to ascertain direct versus indirect effects on cell survival.

      Finally, the authors suggest that Chi3l1 exerts its effects through binding glucose and preventing its uptake. They use ex vivo/in vitro models to assess this with rChi3l1; however, the key in vivo experiment is missing: using the CLEC4F-Cre mice to prove that this mechanism in KCs is sufficient for the phenotype. This is critical to confirm the take-home message of the manuscript.

    4. Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      Here are my comments:

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

    1. eLife Assessment

      This study provides valuable insights into a new toxin-antidote element in C. elegans, the first naturally occurring unlinked toxin-antidote system where endogenous small RNA pathways post-transcriptionally suppress the toxin. The strength of evidence is solid, using a combination of genomic and experimental methods. Enthusiasm, however, is tempered by its reliance on meta-analysis of existing data sets and limited experimental evaluation.

    2. Reviewer #1 (Public review):

      Summary:

      The article by Zdraljevic et al. reports the discovery of a third toxin-antidote (TA) element in C. elegans, composed of the genes mll-1 (toxin) and smll-1 (antidote). Unlike previously characterized TA systems in C. elegans, this element induces larval arrest rather than embryonic lethality. The study identifies three distinct haplotypes at the TA locus, including a hyper-divergent version in the standard laboratory strain N2, which retains a functional toxin but lacks a functional antidote. The authors propose that small RNA-mediated silencing mechanisms, dependent on MUT-16 and PRG-1, suppress the toxicity of the divergent toxin allele. This work provides insights into the evolutionary dynamics of TA elements and their regulation through RNA interference (RNAi).

      Overall, there are many things to like about this paper and only a few small quibbles, which will not require more than a little rewriting or relatively minor analyses.

      Strengths:

      (1) The discovery of a maternally deposited TA element with delayed toxicity due to delayed mRNA translation of the maternally deposited toxin mRNA is a significant addition to the literature on selfish genetic elements in metazoans.

      (2) Identifying three haplotypes at the TA locus provides a snapshot of potential evolutionary trajectories for these elements, which are often inferred but rarely demonstrated in naturally occurring strains. The genomic analysis of 550 wild isolates contextualizes the findings within natural populations, revealing geographic clustering and evolutionary pressures acting on the TA locus.

      (3) The study employs various techniques, including CRISPR/Cas9 knockouts, FISH, long-read RNA sequencing, and population genomics. The use of inducible systems to confirm toxicity and antidote functionality is particularly robust. This multifaceted approach strengthens the validity of the findings.

      (4) The authors provide compelling evidence that small RNA pathways suppress toxin activity in strains lacking a functional antidote. This highlights an alternative mechanism for neutralizing selfish genetic elements.

      Weaknesses:

      (1) The introduction focuses strongly (for good reason) on bacterial TA systems and then jumps to TA systems in C. elegans. It's unclear why TA systems in other eukaryotes are not discussed.

      (2) Similarly, there is a missed opportunity to discuss an analogy between the suppressor mechanism discovered here and the hairpin RNA suppressors of meiotic drive identified by Eric Lai and colleagues. Discussing these will provide a fuller context of the present study's findings and will not affect their novelty.

      (3) While the evidence for RNAi-mediated suppression is strong, the claim that positive selection drove diversification at piRNA binding sites requires further discussion and clarification. The elevated dN and dS are unusual (how unusual relative to other genes in vicinity? What is hyper-divergent statistically speaking?), but there is no a priori reason that there would be selection on piRNA binding sites within the mll-1 transcript to facilitate its recognition by endogenous RNAi machinery; what is the selective pressure for mll-1 to do so? Most TA systems would like to avoid being suppressed by the host. One cannot make the argument that this was motivated by the loss of the antidote because the loss of the antidote would be instantly suicidal, so the cadence of events described requiring hypermutation of the mll-1 transcript does not work.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript by Walter-McNeill, Kruglyak, and team, the authors provide solid evidence of another toxin-antidote (TA) system in C. elegans. Generally, TA systems involve selfish and linked genetic elements, one encoding a toxin that kills progeny inheriting it, unless an antidote (the second element) is also present. Currently, only two TA systems have been characterized in this species, pointing to the importance of identifying new instances of such systems to understand their transmission dynamics, prevalence, and functions in shaping worm populations.

      Strengths:

      This novel TA system (mll-1/smll-1) was identified on LGV in wild C. elegans isolates from the Hawaiian islands, by crossing divergent strains and observing allele frequency distortions by high-throughput genome sequencing after 10 generations. These allele frequency distortions were subsequently confirmed in another set of crosses with a separate divergent strain, and crosses of heterozygous males or hermaphrodites resulted in a pattern of L1 lethality in progeny (with a rod arrest phenotype) that suggested the maternal transmission of this TA system from the XZ1516 genetic background. By elegantly combining the use of near-isogenic lines, CRISPR editing to generate knock-outs, and a transgene rescue of the antidote gene, the authors identified the genes encoding the toxin and the antidote, which they refer to as mll-1 and smll-1. Moreover, the specific mll-1 isoform responsible for the production of the toxin was identified and mll-1 transcripts were observed by FISH in early and late embryos, as well as in larvae. Inducible expression of the toxin in various strains resulted in larval arrest and rod phenotypes. The authors then characterized the genetic variation of 550 wild isolates at the toxin/antidote region on LGV and distinguished three clades: (1) one with the conserved TA system, (2) one having lost the toxin and retaining a mostly functional antidote, and (3) one having lost the antidote and retaining a divergent yet coding toxin (this includes the reference strain Bristol N2, in which the homologous toxin gene has acquired mutations and is known as B0250.8). Further, the authors show that this region is under positive selection. These data are compelling and provide very strong evidence of a new TA system in this species.

      Weaknesses:

      The question remained as to how one clade, including N2, could retain the toxin gene but not possess a functional antidote. In the second part of the manuscript, the authors hypothesized that small RNA targeting (RNAi) of the toxin transcript could provide the necessary repression to allow worms to survive without the antidote. Through a meta-analysis of multiple small RNA datasets from the literature, the authors found evidence to support this idea, in which the toxin transcript is targeted by 22G siRNAs whose biogenesis is dependent on the Mutator foci protein, MUT-16. They note that from previous studies, mut-16 null mutants displayed a varied penetrance of larval arrest. In their own hands, mut-16 mutants displayed 15% varied larval arrest and 2% rod phenotypes. In an attempt to link B0250.8 to mut-16/siRNAs, they made a double mutant and examined body length as a proxy for developmental stage. Here, they observed a partial rescue of the mut-16 size defect by B0250.8 mutation. Finally, the authors also highlight data from further meta-analysis, which predicts the recognition of B0250.8 by several piRNAs. Also based on existing data from the literature, the authors link loss of Piwi (PRG-1), which binds piRNAs, to a depletion of 22G-RNAs targeting B0250.8 and an upregulation of B0250.8 expression in gonads, suggesting that piRNAs are the primary small RNAs that target B0250.8 for downregulation. The data in this portion of the manuscript are intriguing, but somewhat preliminary and incomplete, as they are based on little primary experimentation and a collection of different datasets (which have been acquired by slightly different methods in most cases). This portion of the study would require subsequent experimentation to firmly establish this mechanistic link. 

      For example, to be able to claim that "the N2 toxin allele has acquired mutations that enable piRNA binding to initiate MUT-16-dependent 22G small RNA amplification that targets the transcript for degradation" the identified piRNA sites should be mutated and protein and transcript levels analysed in wild-type and in the strain with mutated piRNA sites. At a minimum, the protein levels in wild-type and mut-16, prg-1, and/or wago-1 mutants should be measured by western blot and/or by live imaging (introducing a GFP or some other tag to the endogenous protein via CRISPR editing) to show that the toxin is not accumulated as a protein in wt, but increases in levels in these mutants. mRNA levels in Figure S5A suggest there is still some expression of the B0250.8 transcript in a wild-type situation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors report a study on how stimulation of the receptive-field surround of V1 and LGN neurons affects their firing rates. Specifically, they examine stimuli in which a grey patch covers the classical RF of the cell and a stimulus appears in the surround. Using a number of different stimulus paradigms, they find a long-latency response in V1 (but not the LGN) which does not depend strongly on the characteristics of the surround grating (drifting vs static, continuous vs discontinuous, predictable grating vs unpredictable pink noise). They find that population responses to simple achromatic stimuli have a different structure that does not distinguish so clearly between the grey patch and other conditions, and the latency of the response was similar regardless of whether the center or surround was stimulated by the achromatic surface. Taken together, they propose that the surround-response is related to the representation of the grey surface itself. They relate their findings to previous studies that have put forward the concept of an ‘inverse RF’ based on strong responses to small grey patches on a full-screen grating. They also discuss their results in the context of studies that suggest that surround responses are related to predictions of the RF content or figure-ground segregation.

      Strengths:

      I find the study to be an interesting extension of the work on surround stimulation and the addition of the LGN data is useful showing that the surround-induced responses are not present in the feedforward path. The conclusions appear solid, being based on large numbers of neurons obtained through Neuropixels recordings. The use of many different stimulus combinations provides a rich view of the nature of the surround-induced responses.

      Weaknesses:

      The statistics are pooled across animals, which is less appropriate for hierarchical data. There is no histological confirmation of placement of the electrode in the LGN and there is no analysis of eye or face movements which may have contributed to the surround-induced responses. There are also some missing statistics and methods details which make interpretation more difficult.

      We thank the reviewer for their positive and constructive comments, and have addressed these specific issues in response to the minor comments. For the statistics across animals, we refer to “Reviewer 1 recommendations”, point 1. For the histological analysis, we refer to “Reviewer 1 recommendations”, point 2. For the eye and facial movements, we refer to “Reviewer 1 recommendations”, point 5. Concerning missing statistics and methods details, we refer to various responses to “Reviewer 1 recommendations”. We thoroughly reviewed the manuscript and included all missing statistical and methodological details.

      Reviewer #2 (Public review):

      Cuevas et al. investigate the stimulus selectivity of surround-induced responses in the mouse primary visual cortex (V1). While classical experiments in non-human primates and cats have generally demonstrated that stimuli in the surround receptive field (RF) of V1 neurons only modulate activity to stimuli presented in the center RF, without eliciting responses when presented in isolation, recent studies in mouse V1 have indicated the presence of purely surround-induced responses. These have been linked to prediction error signals. In this study, the authors build on these previous findings by systematically examining the stimulus selectivity of surround-induced responses.

      Using Neuropixels recordings in V1 and the dorsal lateral geniculate nucleus (dLGN) of head-fixed, awake mice, the authors presented various stimulus types (gratings, noise, surfaces) to the center and surround, as well as to the surround only, while also varying the size of the stimuli. Their results confirm the existence of surround-induced responses in mouse V1 neurons, demonstrating that these responses do not require spatial or temporal coherence across the surround, as would be expected if they were linked to prediction error signals. Instead, they suggest that surround-induced responses primarily reflect the representation of the achromatic surface itself.

      The literature on center-surround effects in V1 is extensive and sometimes confusing, likely due to the use of different species, stimulus configurations, contrast levels, and stimulus sizes across different studies. It is plausible that surround modulation serves multiple functions depending on these parameters. Within this context, the study by Cuevas et al. makes a significant contribution by exploring the relationship between surround-induced responses in mouse V1 and stimulus statistics. The research is meticulously conducted and incorporates a wide range of experimental stimulus conditions, providing valuable new insights regarding center-surround interactions.

      However, the current manuscript presents challenges in readability for both non-experts and experts. Some conclusions are difficult to follow or not clearly justified.

      I recommend the following improvements to enhance clarity and comprehension:

      (1) Clearly state the hypotheses being tested at the beginning of the manuscript.

      (2) Always specify the species used in referenced studies to avoid confusion (esp. Introduction and Discussion).

      (3) Briefly summarize the main findings at the beginning of each section to provide context.

      (4) Clearly define important terms such as “surface stimulus” and “early vs. late stimulus period” to ensure understanding.

      (5) Provide a rationale for each result section, explaining the significance of the findings.

      (6) Offer a detailed explanation of why the results do not support the prediction error signal hypothesis but instead suggest an encoding of the achromatic surface.

      These adjustments will help make the manuscript more accessible and its conclusions more compelling.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion section, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      • We explicitly mentioned the species used in the referenced studies.

      • We provided a clearer rationale for each experiment in the Results section.

      We have also always clearly stated the species that previous studies used, both in the Introduction and Discussion section.

      Reviewer #3 (Public review):

      Summary:

      This paper explores the phenomenon whereby some V1 neurons can respond to stimuli presented far outside their receptive field. It introduces three possible explanations for this phenomenon and it presents experiments that it argues favor the third explanation, based on figure/ground segregation.

      Strengths:

      I found it useful to see that there are three possible interpretations of this finding (prediction error, interpolation, and figure/ground). I also found it useful to see a comparison with LGN responses and to see that the effect there is not only absent but actually the opposite: stimuli presented far outside the receptive field suppress rather than drive the neurons. Other experiments presented here may also be of interest to the field.

      Weaknesses:

      The paper is not particularly clear. I came out of it rather confused as to which hypotheses were still standing and which hypotheses were ruled out. There are numerous ways to make it clearer.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion section, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) Given the data is hierarchical with neurons clustered within 6 mice (how many recording sessions per animal?) I would recommend the use of Linear Mixed Effects models. Simply pooling all neurons increases the risk of false alarms.

      To clarify: We used the standard method for analyzing single-unit recordings, by comparing the responses of a population of single neurons between two different conditions. This means that the responses of each single neuron were measured in the different conditions, and the statistics were therefore based on the pairwise differences computed for each neuron separately. This is a common and standard procedure in systems neuroscience, and was also used in the previous studies on this topic (Keller et al., 2020; Kirchberger et al., 2023). We were not concerned with comparing two groups of animals, for which hierarchical analyses are recommended. To address the reviewer’s concern, we did examine whether differences between baseline and the gray/drift condition, as well as the gray/drift compared to the grating condition, were consistent across sessions, which was indeed the case. These findings are presented in Supplementary Figure 6.
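
      As an illustration only (not the authors' analysis code), the paired, per-neuron comparison described above can be sketched as follows. The sketch assumes a Wilcoxon signed-rank test on the per-neuron differences, in line with the signed rank test named elsewhere in this response; the function name `paired_condition_test` and the synthetic firing rates are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def paired_condition_test(rates_a, rates_b):
    """Signed-rank test on per-neuron differences between two conditions.

    rates_a, rates_b: one mean firing rate per neuron, same neuron order.
    Returns the median difference (a minus b) and the two-sided p-value.
    """
    rates_a = np.asarray(rates_a, dtype=float)
    rates_b = np.asarray(rates_b, dtype=float)
    diffs = rates_a - rates_b          # each neuron serves as its own control
    _, p_value = wilcoxon(diffs)       # tests whether differences are centered on 0
    return float(np.median(diffs)), float(p_value)


# Synthetic example (assumed values): 200 neurons, each measured in both conditions.
rng = np.random.default_rng(0)
baseline = rng.gamma(shape=2.0, scale=2.0, size=200)
gray_drift = baseline + rng.normal(loc=1.0, scale=0.5, size=200)
median_diff, p = paired_condition_test(gray_drift, baseline)
```

      Because every neuron contributes one paired difference, the test asks whether the population of differences is shifted away from zero. It does not model clustering of neurons within animals, which is the point raised by the reviewer.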

      (2) Line 432: “The study utilized three to eight-month-old mice of both genders”. This is confusing, I assume they mean six mice in total, please restate. What about the LGN recordings, were these done in the same mice? Can the authors please clarify how many animals, how many total units, how many included units, how many recording sessions per animal, and whether the same units were recorded in all experiments?

      We have now clarified the information regarding the animals used in the Methods section.

      • We state that “We included female and male mice (C57BL/6), a total of six animals for V1 recordings between three and eight months old. In two of those animals, we recorded simultaneously from LGN and V1.”

      • We state that “For each animal, we recorded around 2-3 sessions from each hemisphere, and we recorded from both hemispheres.”

      • We noted that the number of neurons was not mentioned for each figure caption. We apologize for this omission. We have now added the number for all of the figures and protocols to the revised manuscript. We note that the same neurons were recorded for the different conditions within each protocol, however because a few sessions were short we recorded more units for the grating protocol. Note that we did not make statistical comparisons between protocols.

      (3) I see no histology for confirmation of placement of the electrode in the LGN, how can they be sure they were recording from the LGN? There is also little description of the LGN experiments in the methods.

      For better clarity, we have included a reconstruction of the electrode track from histological sections of one animal post-experiment (Figure S4). The LGN was targeted via stereotactical surgery, and the visual responses in this area are highly distinct. In addition, we used a flash protocol to identify the early-latency responses typical for the LGN, which is described in the Methods section: “A flash stimulus was employed to confirm the locations of LGN at the beginning of the recording sessions, similar to our previous work in which we recorded from LGN and V1 simultaneously (Schneider et al., 2023). This stimulus consisted of a 100 ms white screen and a 2 s gray screen as the inter-stimulus interval, designed to identify visually responsive areas. The responses of multi-unit activity (MUA) to the flash stimulus were extracted and a CSD analysis was then performed on the MUA, sampling every two channels. The resulting CSD profiles were plotted to identify channels corresponding to the LGN. During LGN recordings, simultaneous recordings were made from V1, revealing visually responsive areas interspersed with non-responsive channels.”

      (4) Many statements are not backed up by statistics; for example, each time the authors report that the response at 90° is higher than baseline (Line 121, amongst other places), there is no test to support this. See also Line 140 (negative correlation), Line 145, and Line 180.

      For comparison purposes, we only presented statistical analyses across conditions. However, we have now added information to the figure captions stating that all conditions show values higher than the baseline.

      (5) As far as I can see there is no analysis of eye movements or facial movements. This could be an issue, for example, if the onset of the far surround stimuli induces movements this may lead to spurious activations in V1 that would be interpreted as surround-induced responses.

      To address this point, we have included a supplementary figure analyzing facial movements across different sessions and comparing them between conditions (Supplementary Figure 5). A detailed explanation of this analysis has been added to the Methods section. Overall, we observed no significant differences in face movements between trials with gratings, trials with the gray patch, and trials with the gray screen presented during baseline. Animals exhibited similar face movements across all three conditions, supporting the conclusion that the observed neural firing rate increases for the gray-patch condition are not related to face movements.

      (6) The experiments with the rectangular patch (Figure 3) seem to give a slightly different result as the responses for large sizes (75, 90) don’t appear to be above baseline. This condition is also perceptually the least consistent with a grey surface in the RF, the grey patch doesn’t appear to occlude the surface in this condition. I think this is largely consistent with their conclusions and it could merit some discussion in the results/discussion section.

      While the effect is maybe a bit weaker, the total surround stimulated also covers a smaller area because of the large rectangular gray patch. Furthermore, the early responses are clearly elevated above baseline, and the responses up to 70 degrees are still higher than baseline. Hence we think this data point for 90 degrees does not warrant a strong interpretation.

      Minor points:

      (1) Figure 1h: What is the statistical test reported in the panel (I guess a signed rank based on later figures)? Figure 4d doesn’t appear to be significantly different but is reported as so. Perhaps the median can be indicated on the distribution?

      We explained that we used a signed rank test for Figure 1h and now included the median of the distributions in Figure 4d.

      (2) What was the reason for having the gratings only extend to half the x-axis of the screen, rather than being full-screen? This creates a percept (in humans at least) that is more consistent with the grey patch being a hole in the grating as the grey patch has the same luminance as the background outside the grating.

      We explained in the Methods section that “We presented only half of the x-axis due to the large size of our monitor, in order to avoid over-stimulation of the animals with very large grating stimuli.” Perceptually speaking, the gray patch appears as something occluding the grating, not as a “hole”.

      (3) Line 103: “and, importantly, had less than 10° (absolute) distance to the grating stimulus’ RF center.” Re-phrase; a stimulus doesn’t have an RF center.

      We corrected this to “We included only single units into the analysis that met several criteria in terms of visual responses (see Methods) and, importantly, the RF center had less than 10° (absolute) distance to the grating stimulus’ center.”

      (4) Line 143: “We recorded single neurons LGN” - should be “single LGN neurons”.

      We corrected this to “we recorded single LGN neurons”.

      (5) Line 200: They could spell out here that the latency is consistent with the latency observed for the grey patch conditions in the previous experiments.

      (6) Line 465: This is very brief. What criteria did they use for single-unit assignation? Were all units well-isolated or were multi-units included?

      We clarified in the Methods section that “We isolated single units with Kilosort 2.5 (Steinmetz et al., 2021) and manually curated them with Phy2 (Rossant et al., 2021). We included only single units with a maximum contamination of 10 percent.”

      (7) Line 469: “The experiment was run on a Windows 10”. Typo.

      We corrected this to “The experiment was run on Windows 10”.

      (8) Line 481: “We averaged the response over all trials and positions of the screen”. What do they mean by ‘positions of the screen’?

      We changed this to “We computed the response for each position separately, by averaging the response across all the trials where a square was presented at a given position.”

      (9) Line 483: “We fitted an ellipse in the center of the response”. How?

      We additionally explain how we performed the detection of the RF using an ellipse fit: “A heatmap of the response was computed. This heatmap was then smoothed, and we calculated the location of the peak response. From the heatmap we calculated the centroid of the response using the function regionprops.m, which finds unique objects; we then selected the largest detected area and used its centroid, provided as output. We then fitted an ellipse centered on this peak response location to the smoothed heatmap using the MATLAB function ellipse.m.”
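
      The first steps of this procedure (smoothing, peak location, centroid of the largest connected region) could be sketched in Python roughly as below. This is a hedged illustration using `scipy.ndimage` in place of the MATLAB functions named above; the threshold fraction, the smoothing width, and the function name `estimate_rf_center` are assumptions, and the final ellipse fit is omitted.

```python
import numpy as np
from scipy import ndimage


def estimate_rf_center(response_map, sigma=1.0, threshold_fraction=0.5):
    """Return (peak, centroid) of a 2D response heatmap in (row, col) coordinates."""
    smoothed = ndimage.gaussian_filter(np.asarray(response_map, dtype=float), sigma=sigma)
    # Location of the maximum of the smoothed map.
    peak = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    # Connected regions above an (assumed) fraction of the peak response.
    mask = smoothed >= threshold_fraction * smoothed.max()
    labeled, _ = ndimage.label(mask)
    sizes = np.bincount(labeled.ravel())[1:]       # pixel count per region, skipping background
    largest = int(np.argmax(sizes)) + 1            # label of the biggest region
    centroid = ndimage.center_of_mass(smoothed, labeled, largest)
    return peak, centroid


# Synthetic example (assumed values): a Gaussian blob centered at row 12, col 20.
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:30, 0:40]
blob = np.exp(-((rows - 12) ** 2 + (cols - 20) ** 2) / (2 * 3.0 ** 2))
heatmap = blob + rng.normal(scale=0.02, size=blob.shape)
peak, centroid = estimate_rf_center(heatmap)
```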

      (10) Line 485: “...and positioned the stimulus at the response peak previously found”. Unclear wording; do you mean the center of the ellipse fit to the MUA response averaged across channels, or something else?

      (11) Line 487: “We performed a permutation test of the responses inside the RF detected vs a circle from the same area where the screen was gray for the same trials.” The wording is a bit unclear here; can they clarify what they mean by the ‘same trials’, and what is being compared to what here?

      We used a permutation test to compare the neuron’s responses to black and white squares inside the RF to the condition where there was no square in the RF (i.e. the RF was covered by the gray background).
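
      A minimal sketch of such a comparison is given below, assuming a label-shuffling permutation test on the difference in mean response between trials with a square inside the RF and trials where the RF was covered by the gray background. The function name, the number of permutations, and the synthetic spike counts are hypothetical and not taken from the manuscript.

```python
import numpy as np


def permutation_test(in_rf, no_stim, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of mean responses."""
    rng = np.random.default_rng(seed)
    in_rf = np.asarray(in_rf, dtype=float)
    no_stim = np.asarray(no_stim, dtype=float)
    observed = in_rf.mean() - no_stim.mean()
    pooled = np.concatenate([in_rf, no_stim])
    n = in_rf.size
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)         # reassign trials to conditions at random
        diff = shuffled[:n].mean() - shuffled[n:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (count + 1) / (n_permutations + 1)
    return float(observed), float(p_value)


# Synthetic example (assumed values): spike counts per trial for one neuron.
rng = np.random.default_rng(1)
square_in_rf = rng.poisson(8, size=60)
gray_in_rf = rng.poisson(3, size=60)
observed_diff, p_perm = permutation_test(square_in_rf, gray_in_rf)
```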

      (12) Was the pink noise background regenerated on each trial or as the same noise pattern shown on each trial?

      We explain that “We randomly presented one of two different pink noise images”

      (13) Line 552: “...used a time window of the Gaussian smoothing kernel from-.05 to .05”. Missing units.

      We explained that “we used a time window of the Gaussian smoothing kernel from -.05 s to .05 s, with a standard deviation of 0.0125 s.”
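
      For concreteness, a kernel with these parameters could be built and applied as in the sketch below. The 1 ms bin width and the function names are assumptions made for illustration; only the window (-0.05 s to 0.05 s) and the standard deviation (0.0125 s) come from the text above.

```python
import numpy as np


def gaussian_kernel(dt=0.001, half_width=0.05, sd=0.0125):
    """Gaussian kernel on [-half_width, half_width] seconds, normalized to sum to 1."""
    n = int(round(half_width / dt))
    t = np.arange(-n, n + 1) * dt
    k = np.exp(-0.5 * (t / sd) ** 2)
    return t, k / k.sum()


def smooth_rate(spike_counts, dt=0.001, half_width=0.05, sd=0.0125):
    """Convert binned spike counts to a smoothed firing rate in spikes/s."""
    _, k = gaussian_kernel(dt=dt, half_width=half_width, sd=sd)
    return np.convolve(np.asarray(spike_counts, dtype=float) / dt, k, mode="same")


# Example: a single spike in the middle of a 1 s trial with 1 ms bins (assumed).
t, kernel = gaussian_kernel()
spikes = np.zeros(1000)
spikes[500] = 1
rate = smooth_rate(spikes)
```

      Normalizing the kernel to unit sum keeps the total spike count unchanged, so the smoothed trace still integrates to one spike in this example.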

      (14) Line 565: “Additionally, for the occluded stimulus, we included patch sizes of 70° and larger.” Not sure what they’re referring to here.

      We changed this to: “For the population analyses, we analyzed the conditions in which the gray patch sizes were 70 degrees and 90 degrees”.

      (15) Line 569: What is perplexity, and how does changing it affect the t-SNE embeddings?

      Note that t-SNE is only used for visualization purposes. In the revised manuscript, we have expanded our explanation regarding the use of t-SNE and the choice of perplexity values. Specifically, we have clarified that we used a perplexity value of 20 for the Gratings with circular and rectangular occluders and 100 for the black-and-white condition. These values were empirically selected to ensure that the groups in the data were clearly separable while maintaining the balance between local and global relationships in the projected space. This choice allowed us to visually distinguish the different groups while preserving the meaningful structure encoded in the dissimilarity matrices. In particular, varying the perplexity values would not alter the conclusions drawn from the visualization, as t-SNE does not affect the underlying analytical steps of our study.
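
      A visualization of this kind could be produced along the following lines. The sketch assumes that the trial-by-trial dissimilarity matrix is passed to t-SNE as a precomputed distance matrix, which the text does not state explicitly; the data are synthetic and the function name is hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances


def embed_dissimilarity(dissimilarity, perplexity=20, seed=0):
    """2D t-SNE embedding of a square, precomputed dissimilarity matrix."""
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,     # must be smaller than the number of trials
        metric="precomputed",
        init="random",             # required by scikit-learn for precomputed distances
        random_state=seed,
    )
    return tsne.fit_transform(dissimilarity)


# Synthetic example (assumed values): 80 trials x 30 neurons, two stimulus conditions.
rng = np.random.default_rng(0)
tsne_rates = np.vstack([
    rng.normal(5.0, 1.0, size=(40, 30)),
    rng.normal(8.0, 1.0, size=(40, 30)),
])
tsne_dissimilarity = pairwise_distances(tsne_rates, metric="euclidean")
embedding = embed_dissimilarity(tsne_dissimilarity, perplexity=20)
```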

      (16) Line 572: “We trained a C-Support Vector Classifier based on dissimilarity matrices”. This is overly brief, please describe the construction of the dissimilarity matrices and how the training was implemented. Was this binary, multi-class? What conditions were compared exactly?

      In the revised manuscript, we have expanded our explanation regarding the construction of the dissimilarity matrices and the implementation of the C-Support Vector Classification (C-SVC) model (See Methods section).

      The dissimilarity matrices were calculated using the Euclidean distance between firing rate vectors for all pairs of trials (as shown in Figure 6a-b). These matrices were used directly as input for the classifier. It is important to note that t-SNE was not used for classification but only for visualization purposes. The classifier was binary, distinguishing between two classes (e.g., Dr vs St). We trained the model using 60% of the data for training and used 40% for testing. The C-SVC was implemented using sklearn, and the classification score corresponds to the average accuracy across 20 repetitions.
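
      The decoding procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each trial is represented by its Euclidean distances to the training trials, scikit-learn's `SVC` is used with default settings, and the data, the stratified split, and the function name `decode_from_dissimilarity` are hypothetical. Only the 60/40 split, the binary setting, and the 20 repetitions come from the text.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def decode_from_dissimilarity(rates, labels, n_repeats=20, test_size=0.4, seed=0):
    """Mean test accuracy of a binary C-SVC trained on dissimilarity features."""
    labels = np.asarray(labels)
    dissimilarity = pairwise_distances(rates, metric="euclidean")  # trials x trials
    indices = np.arange(len(labels))
    scores = []
    for rep in range(n_repeats):
        train, test = train_test_split(
            indices, test_size=test_size, stratify=labels, random_state=seed + rep
        )
        # Features: distances to the training trials only (assumption),
        # so that test trials never enter the feature space used for fitting.
        clf = SVC(C=1.0)
        clf.fit(dissimilarity[np.ix_(train, train)], labels[train])
        scores.append(clf.score(dissimilarity[np.ix_(test, train)], labels[test]))
    return float(np.mean(scores))


# Synthetic example (assumed values): two conditions, 40 trials each, 30 neurons.
rng = np.random.default_rng(0)
n_per_class, n_neurons = 40, 30
rates = np.vstack([
    rng.normal(5.0, 1.0, size=(n_per_class, n_neurons)),
    rng.normal(8.0, 1.0, size=(n_per_class, n_neurons)),
])
labels = np.array([0] * n_per_class + [1] * n_per_class)
accuracy = decode_from_dissimilarity(rates, labels)
```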

      Reviewer #2 (Recommendations for the Authors):

      The relationship between the current paper and Keller et al. is challenging to understand. It seems like the study is critiquing the previous study but rather implicitly and not directly. I would suggest either directly stating the criticism or presenting the current study as a follow-up investigation that further explores the observed effect or provides an alternative function. Additionally, defining the inverse RF versus surround-induced responses earlier than in the discussion would be beneficial. Some suggestions:

      (1) The introduction is well-written, but it would be helpful to clearly define the hypotheses regarding the function of surround-induced responses and revisit these hypotheses one by one in the results section.

      Indeed, we have generally improved the Introduction of the manuscript, and stated the hypotheses and their relationships to the Experiments more clearly.

      (2) Explicitly mention how you compare classic grating stimuli of varying sizes with gray patch stimuli. Do the patch stimuli all come with a full-field grating? For the full-field grating, you have one size parameter, while for the patch stimuli, you have two (size of the patch and size of the grating).

      We now clearly describe how we compare grating stimuli of varying sizes with gray patch stimuli.

      (3) The third paragraph in the introduction reads more like a discussion and might be better placed there.

      We have moved content from the third paragraph of the Introduction to the Discussion, where it fits more naturally.

      (4) Include 1-2 sentences explaining how you center RFs and detail the resolution of your method.

      We have added an explanation to the Methods: “To center the visual stimuli during the recording session, we averaged the multiunit activity across the responsive channels and positioned the stimulus at the center of the ellipse fit to the MUA response averaged across channels.”.

      (5) Motivate the use of achromatic stimuli. This section is generally quite hard to understand, so try to simplify it.

      We explained better in the Introduction why we performed this particular experiment.

      (6) The decoding analysis is great, but it is somewhat difficult to understand the most important results. Consider summarizing the key findings at the beginning of this section.

      We now provide a clearer motivation at the start of the Decoding section.

      Reviewer #3 (Recommendations for the Authors):

      I have a few suggestions to improve the clarity of the presentation.

      Abstract: it lists a series of observations and it ends with a conclusion (“based on these findings...”). However, it provides little explanation for how this conclusion would arise from the observations. It would be more helpful to introduce the reasoning at the top and show what is consistent with it.

      We have improved the abstract of the paper incorporating this feedback.

      To some extent, this applies to Results too. Sometimes we are shown the results of some experiment just because others have done a similar experiment. Would it be better to tell us which hypotheses it tests and whether the results are consistent with all 3 hypotheses or might rule one or more out? I came out of the paper rather confused as to which hypotheses were still standing and which hypotheses were ruled out.

      We have strongly improved our explanation of the hypotheses and the relationships to the experiments in the Introduction.

      It would be best if the Results section focused on the results of the study, without much emphasis on what previous studies did or did not measure. Here, instead, in the middle of Results we are told multiple times what Keller et al. (2020) did or did not measure, and what they did or did not find. Please focus on the questions and on the results. Where they agree or disagree with previous papers, tell us briefly that this is the case.

      We have revised the Results section so that it focuses much less on what previous studies did. Differences from previous work are now discussed in the Discussion section.

      The notation is extremely awkward. For instance “Gc” stands for two words (Gray center) but “Gr” stands for a single word (Grating). The double meaning of G is one of many sources of confusion.

      This notation needs to be revised. Here is one way to make it simpler: choose one word for each type of stimulus (e.g. Gray, White, Black, Drift, Stat, Noise) and use it without abbreviations. To indicate the configuration, combine two of those words (e.g. Gray/Drift for Gray in the center and Drift in the surround).

      We have corrected the notation in the figures and text to enhance readability and improve the reader’s understanding.

      Figure 1e and many subsequent ones: it is not clear why the firing rate is shown in a logarithmic scale. Why not show it in a linear scale? Anyway, if the logarithmic scale is preferred for some reason, then please give us ticks at numbers that we can interpret, like 0.1,1,10,100... or 0.5,1,2,4... Also, please use the same y-scale across figures so we can compare.

      To clarify: the firing rates must be normalized relative to baseline in order to pool across neurons. However, such a divisive normalization is problematic on its own: on a linear scale, a doubling (ratio 2) and a halving (ratio 0.5) do not lie symmetrically around 1, although they are changes of equal relative magnitude. Furthermore, such a ratio is highly sensitive to outliers. For these reasons, taking the logarithm (base 10) of the ratio is an appropriate transformation. We changed the tick labels to 1, 2, 4, as the reviewer suggested.
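
      The symmetry argument can be made concrete with a short sketch. It assumes strictly positive firing rates (how zero rates are handled is not stated here), and the function names are hypothetical.

```python
import math


def log_ratio(rate, baseline):
    """log10 of the evoked rate relative to baseline (both must be > 0)."""
    if rate <= 0 or baseline <= 0:
        raise ValueError("rates must be positive to take a log ratio")
    return math.log10(rate / baseline)


def tick_positions(fold_changes):
    """Map fold-change tick labels (e.g. 1, 2, 4) to positions on the log10 axis."""
    return [math.log10(f) for f in fold_changes]


# A doubling and a halving relative to a baseline of 1 spike/s.
doubling = log_ratio(2.0, 1.0)
halving = log_ratio(0.5, 1.0)

# On the linear ratio scale the two changes are asymmetric around 1
# (distances of 1.0 and 0.5); on the log scale they are equal and opposite.
linear_asymmetry = (2.0 - 1.0) - (1.0 - 0.5)
log_asymmetry = doubling + halving

# Tick labels 1, 2, 4 are evenly spaced on the log axis.
ticks = tick_positions([1, 2, 4])
```

      With this transformation, ticks labeled 1, 2 and 4 are equally spaced, which is why fold-change labels remain interpretable on a logarithmic axis.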

      Figure 3: it is not clear what “size” refers to in the stimuli where there is no gray center. Is it the horizontal size of the overall stimulus? Some cartoons might help. Or just some words to explain.

      Figure 3: if my understanding of “size” above is correct, the results are remarkable: there is no effect whatsoever of replacing the center stimulus with a gray rectangle. Shouldn’t this be remarked upon?

      We have added a paragraph under figure 3 and in the Methods section explaining that the sizes represent the varying horizontal dimensions of the rectangular patch. In this protocol, the classical condition (i.e. without gray patch) was shown only as full-field gratings, which is depicted in the plot as size 0, indicating no rectangular patch was present.

      DETAILS The word “achromatic” appears many times in the paper and is essentially uninformative (all stimuli in this study are achromatic, including the gratings). It could be removed in most places except a few, where it is actually used to mean “uniform”. In those cases, it should be replaced by “uniform”.

      Ditto for the word “luminous”, which appears twice and has no apparent meaning. Please replace it with “uniform”.

      We have replaced the words “achromatic” and “luminous” with “uniform” to improve clarity where we refer to only black or white stimuli.

      Page 3, line 70: “We raise some important factors to consider when describing responses to only surround stimulation.” This sentence might belong in the Discussion but not in the middle of a paragraph of Results.

      We removed this sentence.

      Neuropixel - Neuropixels (plural)

      “area LGN” - LGN

      We corrected these misspellings.

      References

      Keller, A.J., Roth, M.M., Scanziani, M., 2020. Feedback generates a second receptive field in neurons of the visual cortex. Nature 582, 545–549. doi:10.1038/s41586-020-2319-4.

      Kirchberger, L., Mukherjee, S., Self, M.W., Roelfsema, P.R., 2023. Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science Advances 9, eadd2498. doi:10.1126/sciadv.add2498.

      Rossant, C., et al., 2021. phy: Interactive analysis of large-scale electrophysiological data. https://github.com/cortex-lab/phy.

      Schneider, M., Tzanou, A., Uran, C., Vinck, M., 2023. Cell-type-specific propagation of visual flicker. Cell Reports 42.

      Steinmetz, N.A., Aydin, C., Lebedeva, A., Okun, M., Pachitariu, M., Bauza, M., Beau, M., Bhagat, J., Böhm, C., Broux, M., Chen, S., Colonell, J., Gardner, R.J., Karsh, B., Kloosterman, F., Kostadinov, D., Mora-Lopez, C., O’Callaghan, J., Park, J., Putzeys, J., Sauerbrei, B., van Daal, R.J.J., Vollan, A.Z., Wang, S., Welkenhuysen, M., Ye, Z., Dudman, J.T., Dutta, B., Hantman, A.W., Harris, K.D., Lee, A.K., Moser, E.I., O’Keefe, J., Renart, A., Svoboda, K., Häusser, M., Haesler, S., Carandini, M., Harris, T.D., 2021. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588. doi:10.1126/science.abf4588.

    1. eLife Assessment

      This important study uses long-term behavioural observations to understand the factors that influence female-on-female aggression in gorilla social groups. The evidence supporting the claims is convincing, as it includes novel methods of assessing aggression and considers other potential factors. The work will be of interest to a broad range of biologists working on the social interactions of animals.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. Particularly, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which supposes a lower risk for the females since the pool of competitors is larger. 
The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

      Weaknesses:

      Although the paper has a novel approach by studying the effect of reproductive state and social environment on female-female aggression, the use of observational data without experimental manipulation limits the ability to establish causation. The authors suggest that the difference observed in female aggression direction between groups with different sex composition might be indicative of male presence buffering aggression, which seems speculative, as no direct evidence of male intervention or support was reported. Similarly, the use of reproductive state as a proxy for energetic need is an indirect measure and does not account for actual energy expenditure or caloric intake, which weakens the authors' claims that female energetic need induces risk-taking. Overall, this paper would benefit from stronger justification and empirical support to strengthen the conclusions of the study about the mechanisms driving female aggression in gorillas.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' aim in this study is to assess the factors that can shift competitive incentives against higher- or lower-ranking groupmates in two gorilla species.

      Strengths:

      This is a relevant topic, where important insights could be gained. The authors brought together a substantial dataset: a long-term behavioral dataset representing two gorilla species from five social groups.

      Weaknesses:

      The authors have not fully shown the data used in the model and explored the potential of the model. Therefore, I remain cautious about the current results and conclusions.

      Some specific suggestions that require attention are

      (1) The authors described how group size can affect aggression patterns in some species (line 54), using a whole paragraph, but did not include it as an explanation variable in their model, despite that they stated the overall group size can "conflate opposing effects of females and males" (line 85). I suggest underlining the effects of numbers of males or/and females here and de-emphasizing the effect of group size in the Introduction.

      (2) There should be more details given about how the authors calculated individual Elo-ratings (line 98). It seems that authors pooled all avoidance/displacement behaviors throughout the study period. But how often was the Elo-rating they included in the model calculated? By the day or by the month? I guess it was by the day, as they "estimate female reproductive state daily" (line 123). If so, it should be made clear in the text.

      In addition, all groups were long-term studied, and the group composition seems fluctuant based on the Table 1 in Reference 11. When an individual enters/leaves the group with a stable hierarchy, it takes time before the hierarchy turns stable again. If the avoidance/displacement behaviors used for the rank relationship were not common, it would take a few days or maybe longer. Also, were the aggressive behaviors more common during rank fluctuations? In other words, if avoidance/displacement behaviors and aggressive behaviors occur simultaneously during rank fluctuations, how did the authors deal with it and take it into consideration in the analysis?

      The authors emphasized several times in the text that gorillas "form highly stable hierarchical relationships". Also, in Reference 25, they found very high stabilities of each group's hierarchy. However, the number of females involved in that analysis was different from that used here. They need to provide more basic info on each group's dominance hierarchy and verify their statement. I strongly suggest that the authors display Elo-rating trajectories and necessary relevant statistics for each group throughout the study period as part of the supplementary materials.

      (3) The authors stated why they differentiated the different stages based on female reproductive status. They also referred to the differences in energetic needs between stages of pregnancy and lactation (lines 127-128). However, in the mixed model, they only compared the interaction score between the female cycling stage and other stages. The model was not well explained, and the results could be expanded. I suggest conducting more pairwise comparisons in the model and presenting the statistics in the text, if there are significant results. If all three pregnancy stages differed significantly from cycling and lactating stages but not from each other, they may be merged as one pregnancy stage. More in-depth analysis would help provide better answers to the research questions.

    4. Reviewer #3 (Public review):

      Smit and Robbins' manuscript investigates the dynamics of aggression among female groupmates across five gorilla groups. The authors utilize longitudinal data to examine how reproductive state, group size, presence of males, and resource availability influence patterns of aggression and overall dominance rankings as measured by Elo scores. The findings underscore the important role of group composition and reproductive status, particularly pregnancy, in shaping dominance relationships in wild gorillas. While the study addresses a compelling and understudied topic, I have several comments and suggestions that may enhance clarity and improve the reader's experience.

      (1) Clarification of longitudinal data - The manuscript states that 25 years of behavioral data were used, but this number appears unclear. Based on my calculations, the maximum duration of behavioral observation for any one group appears to be 18 years. Specifically:

      • ATA: 6 years

      • BIT: 8 years

      • KYA: 18 years

      • MUK: 6 years

      • ORU: 8 years

      I recommend that the authors clarify how the 25-year duration was derived.

      (2) Consideration of group size - The authors mention that group size was excluded from analyses to avoid conflating the opposing effects of female and male group members. While this is understandable, it may still be beneficial to explore group size effects in supplementary analyses. I suggest reporting statistics related to group size and potentially including a supplementary figure. Additionally, given that the study includes both mountain and western gorillas, it would be helpful to examine whether any interspecies differences are apparent.

      (3) Behavioral measures clarification - Lines 112-116 describe the types of aggressive behaviors observed. It would be helpful to clarify how these behaviors differ from those used to calculate Elo scores, or whether they overlap. A brief explanation would improve transparency regarding the methodology.

      (4) Aggression rates versus Elo scores - The manuscript uses aggression rates rather than dominance rank (as measured by Elo scores) as the main outcome variable, but there is no explanation on why. How would the results differ if aggression rates were replaced or supplemented with Elo scores? The current justification for prioritizing aggression rates over dominance rank needs to be more clearly supported.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. Particularly, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which supposes a lower risk for the females since the pool of competitors is larger. 
The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

      Thank you for the positive assessment of our work and the nice summary of the manuscript!

      Weaknesses:

      Although the paper has a novel approach by studying the effect of reproductive state and social environment on female-female aggression, the use of observational data without experimental manipulation limits the ability to establish causation. The authors suggest that the difference observed in female aggression direction between groups with different sex composition might be indicative of male presence buffering aggression, which seems speculative, as no direct evidence of male intervention or support was reported. Similarly, the use of reproductive state as a proxy for energetic need is an indirect measure and does not account for actual energy expenditure or caloric intake, which weakens the authors' claims that female energetic need induces risk-taking. Overall, this paper would benefit from stronger justification and empirical support to strengthen the conclusions of the study about the mechanisms driving female aggression in gorillas.

      We agree that experimental manipulation would allow us to extend our work. Unfortunately, this is not possible with wild, endangered gorillas.

      We have now added more references (Watts 1994; Watts 1997) and enriched our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes, and these scans are not linked to individual interactions, meaning that we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      We have now clearly stated that reproductive state is an indirect proxy for energetic needs. We agree with your point about energy intake and expenditure, but unfortunately, we do not have data on energy expenditure or caloric intake to allow us to delve into more fine-grained analyses.

      Overall, we have tried to enrich the justification and empirical support to strengthen our conclusions by clarifying the text and adding more examples and references.

      Reviewer #2 (Public review):

      Summary:

      The authors' aim in this study is to assess the factors that can shift competitive incentives against higher- or lower-ranking groupmates in two gorilla species.

      Strengths:

      This is a relevant topic, where important insights could be gained. The authors brought together a substantial dataset: a long-term behavioral dataset representing two gorilla species from five social groups.

      Weaknesses:

      The authors have not fully shown the data used in the model and explored the potential of the model. Therefore, I remain cautious about the current results and conclusions.

      Some specific suggestions that require attention are

      (1) The authors described how group size can affect aggression patterns in some species (line 54), using a whole paragraph, but did not include it as an explanation variable in their model, despite that they stated the overall group size can "conflate opposing effects of females and males" (line 85). I suggest underlining the effects of numbers of males or/and females here and de-emphasizing the effect of group size in the Introduction.

      We did not use group size as a main predictor, as has been commonly done in other species, because of potentially conflating opposing effects of males and females. To further stress this point, we have specifically added in the introduction: “group size, the overall number of individuals in the group, might not be a good predictor of aggression heuristics, as it can conflate the effects of different kinds of individuals on aggression (see Smit & Robbins 2024 for an example of opposing effects of the number of females and number of males on female gorilla aggression).”

      We also “ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, [and] its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      (2) There should be more details given about how the authors calculated individual Elo-ratings (line 98). It seems that authors pooled all avoidance/displacement behaviors throughout the study period. But how often was the Elo-rating they included in the model calculated? By the day or by the month? I guess it was by the day, as they "estimate female reproductive state daily" (line 123). If so, it should be made clear in the text.

      We rephrased accordingly: “We used all avoidance and displacement interactions throughout the study period and we used the function elo.seq from R package EloRating to infer daily individual female Elo-scores”. We also clarified that “This method takes into account the temporal sequence of interactions and updates an individual’s Elo-scores each day the individual interacted with another...”
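
      For readers unfamiliar with sequential Elo-rating, the idea of updating scores in temporal order and reading them out per day can be sketched as follows. This is a minimal illustration using the textbook logistic expectation, with a hypothetical starting value of 1000 and k = 100. It is not the implementation of elo.seq in the EloRating package, whose defaults and win-probability function may differ, and the function names are assumptions.

```python
def expected_win(rating_a, rating_b):
    """Logistic Elo expectation that A wins against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_sequence(interactions, start=1000.0, k=100.0):
    """Update ratings in temporal order and record a snapshot per day.

    interactions: iterable of (day, winner, loser), where day is an ISO
    date string and the winner is the individual that was avoided or
    that displaced the other. Returns {day: {individual: rating}}, with
    one snapshot for each day on which an interaction occurred.
    """
    ratings = {}
    daily = {}
    # sorted() is stable, so same-day interactions keep their input order.
    for day, winner, loser in sorted(interactions, key=lambda x: x[0]):
        r_w = ratings.setdefault(winner, start)
        r_l = ratings.setdefault(loser, start)
        p_w = expected_win(r_w, r_l)
        # An unexpected win moves both ratings more than an expected one.
        ratings[winner] = r_w + k * (1.0 - p_w)
        ratings[loser] = r_l - k * (1.0 - p_w)
        daily[day] = dict(ratings)
    return daily


example = elo_sequence([
    ("2020-01-01", "A", "B"),
    ("2020-01-02", "A", "B"),
])
```

      In this sketch, the second win of A over B changes the ratings less than the first, because the outcome was already expected from the existing rank difference.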

      In addition, all groups were long-term studied, and the group composition seems fluctuant based on the Table 1 in Reference 11. When an individual enters/leaves the group with a stable hierarchy, it takes time before the hierarchy turns stable again. If the avoidance/displacement behaviors used for the rank relationship were not common, it would take a few days or maybe longer. Also, were the aggressive behaviors more common during rank fluctuations? In other words, if avoidance/displacement behaviors and aggressive behaviors occur simultaneously during rank fluctuations, how did the authors deal with it and take it into consideration in the analysis?

      We have shown in Reference 25 (Smit & Robbins 2025), following Reference 11 (Smit & Robbins 2024), that females form highly stable hierarchies, and that dyadic dominance relationships are not influenced by the dispersal or death of third individuals. Notably, new immigrant females usually start at and remain low ranking, without large fluctuations in rank. Therefore, any periods of rank fluctuation have limited influence on the aggressive interactions in our study system.

      The authors emphasized several times in the text that gorillas "form highly stable hierarchical relationships". Also, in Reference 25, they found very high stabilities of each group's hierarchy. However, the number of females involved in that analysis was different from that used here. They need to provide more basic info on each group's dominance hierarchy and verify their statement. I strongly suggest that the authors display Elo-rating trajectories and necessary relevant statistics for each group throughout the study period as part of the supplementary materials.

      In fact, the females involved in the present analysis and the analysis of Smit & Robbins 2025 are the same. Our present analysis is based on the hierarchies of Smit & Robbins 2025. Note that female gorillas disperse and occasionally immigrate to another study group. This is why some females may appear in the hierarchies of more than one group, giving the impression that there are more females involved in the analysis of Smit & Robbins 2025 (e.g. by counting the lines in the Elo-rating plots). We now specifically state that “We present these interactions and hierarchies in detail in Smit & Robbins 2025”, to clarify that the hierarchies are the same.

      (3) The authors stated why they differentiated the different stages based on female reproductive status. They also referred to the differences in energetic needs between stages of pregnancy and lactation (lines 127-128). However, in the mixed model, they only compared the interaction score between the female cycling stage and other stages. The model was not well explained, and the results could be expanded. I suggest conducting more pairwise comparisons in the model and presenting the statistics in the text, if there are significant results. If all three pregnancy stages differed significantly from cycling and lactating stages but not from each other, they may be merged as one pregnancy stage. More in-depth analysis would help provide better answers to the research questions.

      Thank you for pointing this out. First, when we considered a single pregnancy stage, pregnant females indeed showed a significantly greater interaction score than females in other reproductive stages. We have now included that in the manuscript. However, we still find it relevant to test for the different stages of pregnancy, given the differences in energetic needs across these stages. We have now included the pairwise comparisons in a new table (Table 2).

      Reviewer #3 (Public review):

      Smit and Robbins' manuscript investigates the dynamics of aggression among female groupmates across five gorilla groups. The authors utilize longitudinal data to examine how reproductive state, group size, presence of males, and resource availability influence patterns of aggression and overall dominance rankings as measured by Elo scores. The findings underscore the important role of group composition and reproductive status, particularly pregnancy, in shaping dominance relationships in wild gorillas. While the study addresses a compelling and understudied topic, I have several comments and suggestions that may enhance clarity and improve the reader's experience.

      (1) Clarification of longitudinal data - The manuscript states that 25 years of behavioral data were used, but this number appears unclear. Based on my calculations, the maximum duration of behavioral observation for any one group appears to be 18 years. Specifically:

      • ATA: 6 years

      • BIT: 8 years

      • KYA: 18 years

      • MUK: 6 years

      • ORU: 8 years

      I recommend that the authors clarify how the 25-year duration was derived.

      Indeed none of the five study “groups” has been studied for 25 years in a row. However, MUK emerged from a fission of group KYA in early 2016. So, from the start of group KYA in October 1998 to the end of group MUK in December 2023, there are 25 years and 2 months. We have now rephrased to “...starting in 1998 in one of the mountain gorilla groups” in the introduction, and to “We use a long-term behavioural dataset on five wild groups of the two gorilla species, starting in 1998” in the abstract.

      (2) Consideration of group size - The authors mention that group size was excluded from analyses to avoid conflating the opposing effects of female and male group members. While this is understandable, it may still be beneficial to explore group size effects in supplementary analyses. I suggest reporting statistics related to group size and potentially including a supplementary figure. Additionally, given that the study includes both mountain and western gorillas, it would be helpful to examine whether any interspecies differences are apparent.

      We have now added the suggested extra test: “When we ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      Regarding species differences: In our analysis, we test for species (mountain vs western) and we find no significant differences between the two. This is stated in the results.

      (3) Behavioral measures clarification - Lines 112-116 describe the types of aggressive behaviors observed. It would be helpful to clarify how these behaviors differ from those used to calculate Elo scores, or whether they overlap. A brief explanation would improve transparency regarding the methodology.

      We have now added short explanations in brackets for behaviours that are not obvious. We also added a sentence in the text to clarify the difference with the behaviours used to calculate Elo scores: “These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”.

      (4) Aggression rates versus Elo scores - The manuscript uses aggression rates rather than dominance rank (as measured by Elo scores) as the main outcome variable, but there is no explanation on why. How would the results differ if aggression rates were replaced or supplemented with Elo scores? The current justification for prioritizing aggression rates over dominance rank needs to be more clearly supported.

      The sentence we added above (“These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”) and the first paragraph of the results hopefully clarify that ritualized agonistic interactions are generally directionally consistent and more reliably capture the highly stable dominance relationships of female gorillas. This approach has been used to calculate dominance rank in gorillas in all studies that have considered it, dating back to the 1970s (namely in studies by Harcourt and Watts). On the other hand, aggression can be context dependent (we now clearly note that at the beginning of the Methods paragraph on aggressive interactions). Therefore, we use Elo scores inferred from ritualized interactions as a base and a reliable proxy of power relationships; we then test whether the direction of aggression within these relationships is also driven by energetic needs or the social environment.
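
      The two-step logic described above (rank from ritualized interactions first, then ask which way aggression flows within that ranking) can be illustrated with a minimal sketch. It uses the standard Elo update rule; the starting score, the constant k, the individual labels, and the interaction records are all hypothetical, and this is not the authors' actual pipeline.

```python
def expected_win(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A prevails over B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(scores: dict, winner: str, loser: str,
               k: float = 100.0, start: float = 1000.0) -> None:
    """Update scores in place after one ritualized interaction.

    k and start are illustrative defaults, not values from the study.
    """
    r_w = scores.get(winner, start)
    r_l = scores.get(loser, start)
    delta = k * (1.0 - expected_win(r_w, r_l))
    scores[winner] = r_w + delta
    scores[loser] = r_l - delta


def aggression_direction(scores: dict, aggressor: str, recipient: str) -> str:
    """Classify one aggressive event relative to the Elo-based hierarchy."""
    if scores[aggressor] > scores[recipient]:
        return "down"
    if scores[aggressor] < scores[recipient]:
        return "up"
    return "tie"


# Hypothetical records: (individual that displaces, individual displaced).
ritualized = [("F1", "F2"), ("F1", "F3"), ("F2", "F3")]
scores = {}
for winner, loser in ritualized:
    update_elo(scores, winner, loser)

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
print(aggression_direction(scores, "F3", "F1"))
```

      Keeping the ranking step separate from the aggression step mirrors the distinction drawn in the response: the hierarchy is inferred only from avoidance and displacement, and aggression is then scored against it rather than feeding into it.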

    1. eLife Assessment

      This important work by Malita et al. describes a mechanism by which an intestinal infection causes an increase in daytime sleep through signaling from the gut to the blood-brain barrier. Their findings suggest that cytokines upd3 and upd2 produced by the intestine following infection act on glia of the blood-brain barrier to regulate sleep by modulating Allatostatin A signaling. The evidence is compelling and elegantly performed using the ample Drosophila genetic toolbox, making this work appealing for a broad group of neuroscience researchers interested in sleep and gut-brain interactions.

    2. Joint Public Review:

      Summary:

      Malita and colleagues investigated the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokine receptors), the authors show that cytokines, upd2 and upd3, secreted by entero-endocrine cells in response to infections increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting the Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation. The evidence supporting most of their claims is compelling. Nevertheless, the activation of the sleep-promoting pathway by infection should be accomplished through bacterial infection of the gut.

      Strengths:

      The work is, in general, supported by well-designed and well-performed experiments, especially those that show that the endocrine cells from the gut are the sources of the Upd cytokines, the effects of these cytokines on daytime sleep, and that the glial cells of the BBB are the target cell for the Upds action. In addition, the evidence associating the downregulation of Alst receptors in the BBB by Upd and Jak/Stat pathways is compelling.

      Weaknesses:

      (1) The model of gut inflammation that is used is based on the increase in reactive oxygen species (ROS) that is caused by adding 1% H2O2 to the food. The use of the model is supported rather weakly by two papers (refs. 26 and 27). The paper by Jiang et al. (26) shows that the infection by Pseudomonas entomophila induces cytokine responses Upd2 and 3, which are also induced by the Jnk pathway; there is no mention of ROS. Buchon et al. (27) is a review that refers to results that indicate that as part of the immune response to pathogens in the gut, there is production of ROS by the NADPH oxidase DUOX. Thus, there is no strong support for the use of this model.

      (2) There is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. It is known that ROS causes damage in the gut epithelium, which in turn induces the expression of the cytokines studied, which might be independent of infection and confound the results.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      The authors sought to elucidate the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokine receptors), the authors show that cytokines, upd2 and upd3, secreted by entero-endocrine cells in response to infections increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation.

      The authors suggest that upd3 is more critical than upd2, which is not sufficiently addressed or explained. In addition, the study uses the gut's response to reactive oxygen molecules as a proxy for infection, which is not sufficiently justified. Finally, further verification of some fundamental tools used in this paper would further solidify these findings making them more convincing.

      Strengths:

      (1) The work addresses an important topic and proposes an intriguing mechanism that involves several interconnected tissues. The authors place their research in the appropriate context and reference related work, such as literature about sickness-induced sleep, ROS, the effect of nutritional deprivation on sleep, sleep deprivation and sleep rebound, upregulated receptor expression as a compensatory mechanism in response to low levels of a ligand, and information about Alst A.

      (2) The work is, in general, supported by well-performed experiments that use a variety of different tools, including multiple RNAi lines, CRISPR, and mutants, to dissect both signal-sending and receiving sides of the signaling pathway.

      (3) The authors provide compelling evidence that shows that endocrine cells from the gut are the source of the upd cytokines that increase daytime sleep, that the glial cells of the BBB are the targets of these upds, and that upd action causes the downregulation of Alst receptors in the BBB via the Jak/Stat pathways.

      We are pleased that the reviewers recognized the strength and significance of our findings describing a gut-to-brain cytokine signaling mechanism involving the blood-brain barrier (BBB) and its role in regulating sleep, and we thank them for their comments.

      Weaknesses:

      (1) There is a limited characterization of cell types in the midgut which are classically associated with upd cytokine production.

      We thank the reviewer for raising this point. Although several midgut cell types (including the absorptive enterocytes) may indeed produce Unpaired (Upd) cytokines, our study specifically focused on enteroendocrine cells (EECs), which are well-characterized as secretory endocrine cells capable of exerting systemic effects. As detailed in our response to Results point #2 (please see below), we show that EEC-specific manipulation of Upd signaling is both necessary and sufficient to regulate sleep in response to intestinal oxidative stress. These findings support the role of EECs as a primary source of gut-derived cytokine signaling to the brain. To acknowledge the possible involvement of other sources, we have also added a statement to the Discussion in the revised manuscript noting that other, non-endocrine gut cell types may contribute to systemic Unpaired signaling that modulates sleep.

      (2) Some of the main tools used in this manuscript to manipulate the gut while not influencing the brain (e.g., Voilà and Voilà + R57C10-GAL80), are not directly shown to not affect gene expression in the brain. This is critical for a manuscript delving into intra-organ communication, as even limited expression in the brain may lead to wrong conclusions.

      We agree with the reviewer that this is an important point. To address it, we performed additional validation experiments to assess whether the voilà-GAL4 driver in combination with R57C10-GAL80 (EEC>) influences upd2 or upd3 expression in the brain. Our results show that manipulation using EEC> alters upd2 and upd3 expression in the gut (Fig. 1a,b), with new data showing that this does not affect their expression levels in neuronal tissues (Fig. S1a), supporting the specificity of our approach. These new data are now included in the revised manuscript and described in the Results section. This additional validation strengthens our conclusion that the observed sleep phenotypes result from gut-specific cytokine signaling, rather than from effects on Unpaired cytokines produced in the brain.

      (3) The model of gut inflammation used by the authors is based on the increase in reactive oxygen species (ROS) obtained by feeding flies food containing 1% H2O2. The use of this model is supported by the authors rather weakly in two papers (refs. 26 and 27): The paper by Jiang et al. (ref. 26) shows that the infection by Pseudomonas entomophila induces cytokine responses upd2 and 3, which are also induced by the Jnk pathway. In addition, no mention of ROS could be found in Buchon et al. (ref. 27); this is a review that refers to results showing that ROS are produced by the NADPH oxidase DUOX as part of the immune response to pathogens in the gut. Thus, there is no strong support for the use of this model.

      We thank the reviewer for raising this point. We agree that the references originally cited did not sufficiently justify the use of H<sub>2</sub>O<sub>2</sub> feeding as a model of gut inflammation. To address this, we have revised the Results section to clarify that we use H<sub>2</sub>O<sub>2</sub> feeding as a controlled method to elevate intestinal ROS levels, rather than as a general model of inflammation. This approach allows us to investigate the specific effects of ROS-induced cytokine signaling in the gut. We have also added additional citations to support the physiological relevance of this model. For instance, Tamamouna et al. (2021) demonstrated that H<sub>2</sub>O<sub>2</sub> feeding induces intestinal stem-cell proliferation – a response also observed during bacterial infection – and Jiang et al. (2009) showed that enteric infections increase upd2 and upd3 expression, which we similarly observe following H<sub>2</sub>O<sub>2</sub> feeding (Fig. 3a). These findings support the use of H<sub>2</sub>O<sub>2</sub> as a tool to mimic specific ROS-linked responses in the gut. We believe this targeted and tractable model is a strength of our study, enabling us to dissect how intestinal ROS modulates systemic physiology through cytokine signaling.

      Additionally, we have included a statement in the Discussion acknowledging that ROS generated during infection may activate signaling mechanisms distinct from those triggered by chemically induced oxidative stress, and that exploring these differences in future studies may yield important insights into gut–brain communication. These revisions provide a stronger justification for our model while more accurately conveying both its relevance and its limitations.

      (4) Likewise, there is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. Furthermore, it is known that ROS damages the gut epithelium, which in turn induces the expression of the cytokines studied. Thus, the effects observed may not reflect the response to infection. In addition, Majcin Dorcikova et al. (2023). Circadian clock disruption promotes the degeneration of dopaminergic neurons in male Drosophila. Nat Commun. 2023 14(1):5908. doi: 10.1038/s41467-023-41540-y report that the feeding of adult flies with H2O2 results in neurodegeneration if associated with circadian clock defects. Thus, it would be important to discuss or present controls that show that the feeding of H2O2 does not cause neuronal damage.

      We thank the reviewer for this thoughtful follow-up point. We would like to clarify that we do not claim that the effects observed in our study directly reflect the full response to enteric infection. As outlined in our revised response to comment 3, we have updated the manuscript to more precisely describe the H<sub>2</sub>O<sub>2</sub>-feeding paradigm as a model that induces local intestinal ROS responses comparable to, but not equivalent to, those observed during pathogenic challenges. This revised framing highlights both the potential similarities and differences between chemically induced oxidative stress and infection-induced responses. Indeed, in the revised Discussion, we now explicitly acknowledge that ROS generated during infection may engage distinct signaling mechanisms compared to exogenous H<sub>2</sub>O<sub>2</sub> and emphasize the value of future studies in delineating these pathways. We are currently pursuing this direction in an independent ongoing study investigating the effects of enteric infections. However, for the present work, we chose to focus on the effects of ROS-induced responses in isolation, as this provides a clean and well-controlled context to dissect the specific contribution of oxidative stress to cytokine signaling and sleep regulation.

      To further address the reviewer’s concern, we have also included new data (a TUNEL stain for apoptotic DNA fragmentation) in the revised manuscript showing that H<sub>2</sub>O<sub>2</sub> feeding does not damage neuronal tissues under our experimental conditions (Fig. S3f,g). This addresses the point raised regarding the potential neurotoxicity of H<sub>2</sub>O<sub>2</sub>, as described by Majcin Dorcikova et al. (2023), and supports the specificity of the sleep phenotypes observed in our study. We believe these revisions and clarifications strengthen the manuscript and make our interpretation more precise.

      (5) The novelty of the work is difficult to evaluate because of the numerous publications on sleep in Drosophila. Thus, it would be very helpful to read from the authors how this work is different and novel from other closely related works such as: Li et al. (2023) Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov. 23;9(1):49. doi: 10.1038/s41421-023-00541-3.

      Our work highlights a distinct role for gut-derived AstA in sleep regulation compared to findings by Lin et al. (Cell Discovery, 2023)[1], who showed that gut AstA mediates energy wasting during sleep deprivation. Their study focused on the metabolic consequences of sleep loss, proposing that sleep deprivation increases ROS in the gut, which then promotes the release of the glucagon-like hormone adipokinetic hormone (AKH) through gut AstA signaling, thereby triggering energy expenditure.

      In contrast, our study addresses the inverse question – how ROS in the gut influences sleep. In our model, intestinal ROS promotes sleep, raising the intriguing possibility – cleverly pointed out by the reviewers – that ROS generated during sleep deprivation might promote sleep by inducing Unpaired cytokine signaling in the gut. According to our findings, this suppresses wake-promoting AstA signaling in the BBB, providing a mechanism to promote sleep as a restorative response to gut-derived oxidative stress and potentially limiting further ROS accumulation. Importantly, our findings support a wake-promoting role for EEC-derived AstA, demonstrated by several lines of evidence. First, EEC-specific knockdown of AstA increases sleep. Second, activation of AstA<sup>+</sup> EECs using the heat-sensitive cation channel Transient Receptor Potential A1 (TrpA1) reduces sleep, and this effect is abolished by simultaneous knockdown of AstA, indicating that the sleep-suppressing effect is mediated by AstA and not by other peptides or secreted factors released by these cells. Third, downregulation of AstA receptor expression in BBB glial cells increases sleep, further supporting the existence of a functional gut AstA–glia arousal pathway. We have now included new data in the revised manuscript showing that AstA release from EECs is downregulated during intestinal oxidative stress (Fig. 7k,l,m). This suggests that this wake-promoting signal is suppressed both at its source (the gut endocrine cells), by unknown means, and at its target, the BBB, via Unpaired cytokine signaling that downregulates AstA receptor expression. This coordinated downregulation may serve to efficiently silence this arousal-promoting pathway and facilitate sleep during intestinal stress. These new data, along with an expanded discussion, provide further mechanistic insight into gut-derived AstA signaling and strengthen our proposed model.

      This contrasts with the interpretation by Lin et al., who observed increased AstA peptide levels in EECs after antioxidant treatment and interpreted this as peptide retention. However, peptide accumulation may result from either increased production or decreased release, and peptide levels alone are insufficient to distinguish between these possibilities. To resolve this, we examined AstA transcript levels, which can serve as a proxy for production. Following oxidative stress (24 h of 1% H<sub>2</sub>O<sub>2</sub> feeding and the following day), when animals show increased sleep (Fig. 7e), we observed a decrease in AstA transcript levels followed by an increase in peptide levels (Fig. 7k,l,m), suggesting that oxidative stress leads to reduced gut AstA production and release. Furthermore, we recently found that a class of EECs that produce the hormone Tachykinin (Tk) and are distinct from the AstA<sup>+</sup> EECs express the ROS-sensitive cation channel TrpA1 (Ahrentløv et al., 2025, Nature Metabolism[2]). In these Tk<sup>+</sup> EECs, TrpA1 mediates ROS-induced Tk hormone release. In contrast, single-cell RNA-seq data[3] do not support TrpA1 expression in AstA<sup>+</sup> EECs, consistent with our findings that ROS does not promote AstA release – an effect that would be expected if TrpA1 were functionally expressed in AstA<sup>+</sup> EECs. This contradicts the findings of Lin et al., who reported TrpA1 expression in AstA<sup>+</sup> EECs. We have now included relevant single-cell data in the revised manuscript (Fig. S6f) showing that TrpA1 is specifically expressed in Tk<sup>+</sup> EECs, but not in AstA<sup>+</sup> EECs, and we have expanded the discussion to address discrepancies in TrpA1 expression and AstA regulation.

      Taken together, our results reveal a dual-site regulatory mechanism in which Unpaired cytokines released from the gut act at the BBB to downregulate AstA receptor expression, while AstA release from EECs is simultaneously suppressed. We thank the reviewers for raising this important point. We have also included a discussion of the other point raised by the reviewers – the possibility that ROS generated during sleep deprivation may engage the same signaling pathways described here, providing a mechanistic link between sleep deprivation, intestinal stress, and sleep regulation.

      Recommendations for the authors:

      A- Material and Methods:

      (1) Feeding Assay: The cited publication (doi.org:10.1371/journal.pone.0006063) states: "For the amount of label in the fly to reflect feeding, measurements must therefore be confined to the time period before label egestion commences, about 40 minutes in Drosophila, a time period during which disturbance of the flies affects their feeding behavior. There is thus a requirement for a method of measuring feeding in undisturbed conditions." Was blue fecal matter already present on the tube when flies were homogenized at 1 hour? If so, the assay may reflect gut capacity rather than food passage (as a proxy for food intake). In addition, was the variability of food intake among flies in the same tube tested (to make sure that 1-2 flies are a good proxy for the whole population)?

      We agree that this is an important point for feeding experiments. We are aware of the methodological considerations highlighted in the cited study and have extensive experience using a range of feeding assays in Drosophila, including both short- and long-term consumption assays (e.g., dye-based and CAFE assays), as well as automated platforms such as FLIC and FlyPAD (Nature Communications, 2022; Nature Metabolism, 2022; and Nature Metabolism, 2025)[2,4,5].

      For the dye-based assay, we carefully selected a 1-hour feeding window based on prior optimization. Since animals were not starved prior to the assay, shorter time points (e.g., 30 minutes) typically result in insufficient ingestion for reliable quantification. A 1-hour period provides a robust readout while remaining within the timeframe before significant label excretion occurs under our experimental conditions. To support the robustness of our findings, we complemented the dye-based assay with data from FLIC, which enables automated, high-resolution monitoring of feeding behavior in undisturbed animals over extended periods. The FLIC results were consistent with the dye-based data, strengthening our confidence in the conclusions. To minimize variability and ensure consistency across experiments, all feeding assays were performed at the same circadian time – Zeitgeber Time 0 (ZT0), corresponding to 10:00 AM when lights are turned on in our incubators. This time point coincides with the animals' natural morning feeding peak, allowing for reproducible comparisons across conditions. Regarding variability among flies within tubes, each biological replicate in the dye assay consisted of 1–2 flies, and results were averaged across multiple replicates. We observed good consistency across samples, suggesting that these small groups reliably reflect group-level feeding behavior under our conditions.

      (2) Biological replicates: whereas the number of samples is clearly reported in each figure, the number of biological replicates is not indicated. Please include this information either in Material and methods or in the relevant figure legends. Please also include a description of what was considered a biological replicate.

      We have now clarified in the Materials and Methods section under Statistics that all replicates represent independent biological samples, as suggested by the reviewers.

      (3) Control Lines: please indicate which control lines were used instead of citing another publication. If preferred, this information could be supplied as a supplementary table.

      We now provide a clear description of the control lines used in the Materials and Methods section. Specifically, all GAL4 and GAL80 lines used in this study were backcrossed for several generations into a shared w<sup>1118</sup> background and then crossed to the same w<sup>1118</sup> strain used as the genetic background for the UAS-RNAi, CRISPR, or overexpression lines. This approach ensures, to a strong approximation, that the only difference between control and experimental animals is the presence or absence of the UAS transgene.

      (4) Statistical analyses: for some results (e.g., those shown in Figure 3d), it could be useful to test the interaction between genotype and treatment.

      We thank the reviewer for this helpful suggestion. In response, we have now performed two-way ANOVA analyses to assess genotype × treatment (diet) interaction effects for the relevant data, including those shown in Figure 3d as well as additional panels where animals were exposed to oxidative stress and sleep phenotypes were measured. We have added the corresponding interaction p-values in the updated figure legends for Figures 3d, 3k, 5a–c, 5f, 5h, 5i, 6c, 6e, and 7e. All of these tests revealed significant interaction effects, supporting the conclusion that the observed differences in sleep phenotypes are specifically dependent on the interaction between genetic manipulation (e.g., cytokine or receptor knockdown) and oxidative stress. These additions reinforce the interpretation that Unpaired cytokine signaling, glial JAK-STAT pathway activity, and AstA receptor regulation functionally interact with intestinal ROS exposure to modulate sleep. We thank the reviewer for suggesting this improvement.
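
      For readers less familiar with the genotype × treatment interaction term, the following minimal sketch shows what it measures in a balanced 2 × 2 design: the difference between the treatment effect in knockdown animals and the treatment effect in controls, scaled against within-cell variation. The function name and all numbers are hypothetical, not data from the manuscript. The analyses reported above were run in Prism, and converting the F statistic to a p-value requires the F distribution from a statistics library, which this stdlib-only example does not attempt.

```python
from statistics import mean


def interaction_test(ctrl_base, ctrl_treat, kd_base, kd_treat):
    """Interaction term of a balanced 2x2 two-way ANOVA.

    Returns (contrast, F statistic, error degrees of freedom), where the
    contrast is the treatment effect in knockdown minus that in control.
    """
    cells = [ctrl_base, ctrl_treat, kd_base, kd_treat]
    n = len(ctrl_base)
    if n < 2 or any(len(c) != n for c in cells):
        raise ValueError("balanced design with at least 2 values per cell required")
    m = [mean(c) for c in cells]
    # Difference of differences: (kd treated - kd base) - (ctrl treated - ctrl base)
    contrast = (m[3] - m[2]) - (m[1] - m[0])
    # Sum of squares for a +/-1 contrast over four cells of size n (1 df)
    ss_inter = n * contrast ** 2 / 4
    # Within-cell (error) sum of squares; must be non-zero for F to be defined
    ss_error = sum((x - mu) ** 2 for c, mu in zip(cells, m) for x in c)
    df_error = 4 * (n - 1)
    f_stat = ss_inter / (ss_error / df_error)
    return contrast, f_stat, df_error


# Hypothetical daytime sleep values (minutes), three animals per cell.
contrast, f_stat, df_error = interaction_test(
    ctrl_base=[600, 620, 610],
    ctrl_treat=[700, 720, 710],
    kd_base=[650, 660, 670],
    kd_treat=[655, 665, 675],
)
print(contrast, f_stat, df_error)
```

      In this made-up example the control animals gain 100 minutes of sleep on the treated diet while the knockdown animals gain only 5, so the contrast is strongly negative and the interaction F statistic is large; a main-effects-only test would not isolate that pattern.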

      (5) Reporting of p values. Some are reported as specific values whereas others are reported as less than a specific value. Please make this reporting consistent across different figures.

      All p-values reported in the manuscript are exact, except where they fall below 0.0001. In those instances, we report p < 0.0001 because the Prism software package (GraphPad, version 10), which was used for all statistical analyses, does not report more precise values. We believe this reporting approach reflects standard practice in the field.

      (6) Please include the color code used in each figure, either in the figure itself or in the legend.

      We have now clarified the color coding in all relevant figures. In particular, we acknowledge that the meaning of the half-colored circles used to indicate H<sub>2</sub>O<sub>2</sub> treatment was not previously explained. These have now been clearly labeled in each figure to indicate treatment conditions.

      (7) The scheme describing the experimental conditions and the associated chart is confusing. Please improve.

      We have improved the schematic by replacing “ROS” with “H<sub>2</sub>O<sub>2</sub>” to more clearly indicate the experimental condition used. Additionally, we have added the corresponding circle annotations so that they now also appear consistently above the relevant charts. This revised layout enhances clarity and helps readers more easily interpret the experimental conditions. We believe these changes address the reviewer’s concern and make the figure significantly more intuitive.

      (8) Please indicate which line was used for upd-Gal4 and the evidence that it faithfully reflects upd3 expression.

      We have now clarified in the Materials and Methods section that the upd3-GAL4 line used in our study is Bloomington stock #98420, which drives GAL4 expression under the control of approximately 2 kb of sequence upstream of the upd3 start codon. This line has previously been used as a transcriptional reporter for upd3 activity. The only use of this line was to illustrate reporter expression in the EECs. To support this aspect of Upd3 expression, we now include new data in the revised manuscript using fluorescent in situ hybridization (FISH) against upd3, which confirms the presence of upd3 transcripts in prospero-positive EECs of the adult midgut (Fig. S1b). Additionally, we show that upd3 transcript levels are significantly reduced in dissected midguts following EEC-specific knockdown using multiple independent RNAi lines driven by voilà-GAL4, both alone and in combination with R57C10-GAL80, consistent with endogenous expression in these cells (Fig. 1a,b).

      To further address the reviewer’s concern and provide additional support for the endogenous expression of upd3 in EECs, we performed targeted knockdown experiments focusing on molecularly defined EEC subpopulations. The adult Drosophila midgut contains two major EEC subtypes characterized by their expression of Allatostatin C (AstC) or Tachykinin (Tk), which together encompass the vast majority of EECs. To selectively manipulate these populations, we used AstC-GAL4 and Tk-GAL4 drivers – both knock-in lines in which GAL4 is inserted at the respective endogenous hormone loci. This design enables precise GAL4 expression in AstC- or Tk-expressing EECs based on their native transcriptional profile. To eliminate confounding neuronal expression, we combined these drivers with R57C10-GAL80, restricting GAL4 activity to the gut and generating AstC<sup>Gut</sup>> and Tk<sup>Gut</sup>> drivers. Using these tools, we knocked down upd2 and upd3 selectively in the AstC- or Tk-positive EECs. Knockdown of either cytokine in AstC-positive EECs significantly increased sleep under homeostatic conditions, recapitulating the phenotype observed with knockdown in all EECs (Fig. 1m-o). In contrast, knockdown of upd2 or upd3 in Tk-positive EECs had no effect on sleep (Fig. 1p-r). Furthermore, we show in the revised manuscript that selective knockdown of upd2 or upd3 in AstC-positive EECs abolishes the H<sub>2</sub>O<sub>2</sub>-induced increase in sleep (Fig. 3f–h). These findings demonstrate that Unpaired cytokine signaling from AstC-positive EECs is essential for mediating the sleep response to intestinal oxidative stress, highlighting this specific EEC subtype as a key source of cytokine-driven regulation in this context. These new results indicate that AstC-positive EECs are a primary source of the Unpaired cytokines that regulate sleep, while Tk-positive EECs do not appear to contribute to this function.

      Importantly, upd3 transcript levels were significantly reduced in dissected midguts following AstC<sup>Gut</sup> driven knockdown (Fig. S1r), further confirming that upd3 is endogenously expressed in AstC-positive EECs. Thus, we have bolstered our confidence that upd3 is indeed expressed in EECs, as illustrated by the reporter line, through several means.

      (9) Please indicate which GFP line was used with upd-Gal4 (CD8, NLS, un-tagged, etc). The Material and Methods section states that it was "UAS-mCD8::GFP (#5137);", however, the stain does not seem to match a cell membrane pattern but rather a nuclear or cytoplasmic pattern. This information would help the interpretation of Figure 1C.

      We confirm that the GFP reporter line used with upd3-GAL4 was obtained from Bloomington stock #98420. As noted by the Bloomington Drosophila Stock Center, “the identity of the UAS-GFP transgene is a guess,” and the subcellular localization of the GFP fusion is therefore uncertain. We agree with the reviewer that the signal observed in Figure 1c does not display clear membrane localization and instead appears diffuse, consistent with cytoplasmic or partially nuclear localization. In any case, what we find most salient is the reporter’s labeling of Prospero-positive EECs in the adult midgut, consistent with upd3 expression in these cells. This conclusion is further supported by multiple lines of evidence presented in the revised manuscript, as mentioned above in response to question #8: (1) fluorescent in situ hybridization (FISH) for upd3 confirms expression in EECs (Fig. S1b), (2) EEC-specific RNAi knockdown of upd3 reduces transcript levels in dissected midguts, and (3) publicly available single-cell RNA sequencing datasets[3] also indicate that upd3 is expressed at low levels in a subset of adult midgut EECs under normal conditions. We have also clarified in the revised Materials and Methods section that GFP localization is undefined in the upd3-GAL4 line, to guide interpretation of the reporter signal.

      B- Results

      (1) Figure 1: According to previous work (10.1016/j.celrep.2015.06.009, http://flygutseq.buchonlab.com/data?gene=upd3%0D%0A), in basal conditions upd3 is expressed as follows: ISC (35 RPKM), EB (98 RPKM), EC (57 RPKM), and EEC (8 RPKM). Accordingly, even complete KO in EECs should eliminate only a small fraction of upd3 from whole guts, even less considering the greater abundance of other cell types such as ECs compared to EECs. It would be useful to understand where this discrepancy comes from, in case it is affecting the conclusion of the manuscript. While this point per se does not affect the main conclusions of the manuscript, it makes the interpretation of the results more difficult.

      We acknowledge the previously reported low expression of upd3 in EECs. However, the FlyGut-seq site appears to be no longer available, so we could not directly compare other related genes. Nonetheless, our data – based on in situ hybridization, reporter expression, and multiple RNAi knockdowns – consistently support upd3 expression in EECs. These complementary approaches strengthen the conclusion that EECs are an important source of systemic upd3 under the conditions tested.

      (2) Figure 1: The upd2-3 mutants show sleep defects very similar to those of EEC>RNAi and >Cas9. It would thus be helpful to try to KO upd3 with other midgut drivers (An EC driver like Myo1A or 5966GS and a progenitor driver like Esg or 5961GS) to validate these results. Such experiments might identify precisely which cells are involved in the gut-brain signaling reported here.

      We appreciate the reviewer’s suggestion and agree that exploring other potential sources of Upd3 in the gut is an interesting direction. In this study, we have focused on EECs, which are the primary hormone-secreting cells in the intestine and thus the most likely candidates for mediating systemic effects such as gut-to-brain signaling. While it is possible that other gut cell types – such as enterocytes (e.g., Myo1A<sup>+</sup>) or intestinal progenitors (e.g., Esg<sup>+</sup>) – also contribute to Upd3 production, these cells are not typically endocrine in nature. Demonstrating their involvement in gut-to-brain communication would therefore require additional, extensive validation beyond the scope of the current study. Importantly, our data show that manipulating Upd3 specifically in EECs is both necessary and sufficient to modulate sleep in response to intestinal ROS, strongly supporting the conclusion that EEC-derived cytokine signaling underlies the observed phenotype. In contrast, manipulating cytokines in other gut cells could produce indirect effects – such as altered proliferation, epithelial integrity, or immune responses – that complicate the interpretation of behavioral outcomes like sleep. For these reasons, we chose to focus on EECs as the source of endocrine signals mediating gut-to-brain communication. However, to address this point raised by the reviewer, we have now included a statement in the Discussion acknowledging that other non-endocrine gut cell types may also contribute to the systemic Unpaired signaling that modulates sleep in response to intestinal oxidative stress.

      (3) Figure 3: "This effect mirrored the upregulation observed with EEC-specific overexpression of upd3, indicating that it reflects physiologically relevant production of upd3 by the gut in response to oxidative stress." Please add (Figure 3a) at the end of this sentence.

      We have now added “(Figure 3a)” at the end of the sentence to clearly reference the relevant data.

      (4) For Figure 3b, do you have data showing that the increased amount of sleep was due to the addition of H2O2 per se, rather than the procedure of adding it?

      We have added new data to address this point. To ensure that the observed sleep increase was specifically due to the presence of H<sub>2</sub>O<sub>2</sub> and not an effect of the food replacement procedure, we performed a control experiment in which animals were fed standard food prepared using the same protocol and replaced daily, but without H<sub>2</sub>O<sub>2</sub>. These animals did not exhibit increased sleep, confirming that the sleep effect is attributable to intestinal ROS rather than the supplementation procedure itself (Fig. S3a). Thanks for the suggestion.

      (5) In the text it is stated that "Since 1% H2O2 feeding induced robust responses both in upd3 expression and in sleep behavior, we asked whether gut-derived Unpaired signaling might be essential for the observed ROS-induced sleep modulation. Indeed, EEC-specific RNAi targeting upd2 or upd3 abolished the sleep response to 1% H2O2 feeding." While it is indeed true that there is no additional increase in sleep time due to EEC>upd3 RNAi, it is also true that EEC>upd3 RNAi flies, without any treatment, have already increased their sleep in the first place. It is then possible that rather than unpaired signaling being essential, an upper threshold for maximum sleep allowed by manipulation of these processes was reached. It would be useful to discuss this point.

      Several findings argue against a ceiling effect and instead support a requirement for Unpaired signaling in mediating ROS-induced sleep. Animals with EEC-specific upd2 or upd3 knockdown or null mutation not only fail to increase sleep following H<sub>2</sub>O<sub>2</sub> treatment but actually exhibit reduced sleep during oxidative stress (Fig. 3e, k, l; Fig. 5e, f), suggesting that Unpaired signaling is required to sustain sleep under these conditions. Similarly, animals with glial dome knockdown also show reduced sleep under oxidative stress, closely mirroring the phenotype of EEC-specific upd3 RNAi animals (Fig. 5a–c, g–i). These results support the conclusion that gut-to-glia Unpaired cytokine signaling is necessary for maintaining elevated sleep during oxidative stress. In the absence of this signaling, animals exhibit increased wakefulness. We identify AstA as one such wake-promoting signal that is suppressed during intestinal stress. We present new data showing that this pathway is downregulated not only via Unpaired-JAK/STAT signaling in glial cells but also through reduced AstA release from the gut in the revised manuscript. This model, in which Unpaired cytokines promote sleep during intestinal stress by suppressing arousal pathways, is discussed throughout the manuscript to address the reviewer’s point.

      (6) In Figure 3k, the dots highlighting the experiment show an empty profile, a full one, and a half one. Please define what the half dots represent.

      We have now clarified the color coding in all relevant figures. Specifically, we acknowledge that the meaning of the half-colored circles indicating H<sub>2</sub>O<sub>2</sub> treatment was not previously defined – it indicates washout or recovery time. In the revised version, these symbols are now clearly labeled in each figure to indicate the treatment condition, ensuring consistent and intuitive interpretation across all panels.

      (7) The authors used appropriate GAL4 and RNAi lines to knock down dome, a upd2/3 JAK-STAT-linked receptor, specifically in neurons and glia, respectively, in order to identify the CNS targets of upd2/3 cytokines produced by enteroendocrine cells (EECs). Pan-neuronal dome knockdown did not alter daytime sleep in adult females, yet pan-glial dome knockdown phenocopied effects of upd2/3 knockdown in EECs. They also observed that EEC-specific knockdown of upd2 and upd3 led to a decrease in JAK-STAT reporter activity in repo-positive glial cells. This supports the authors' conclusion that glial cells, not neurons, are the targets by which unpaired cytokines regulate sleep via JAK-STAT signaling. However, they do not show nighttime sleep data of pan-neuronal and pan-glial dome knockdowns. It would strengthen their conclusion if the nighttime sleep of pan-glial dome knockdown phenocopied the upd2/3 knockdowns as well, provided the pan-neuronal dome knockdown did not alter nighttime sleep.

      We have now added nighttime sleep data for both pan-glial and pan-neuronal domeless knockdowns in the revised manuscript (Fig. 2a). Glial knockdown increased nighttime sleep, similar to EEC-specific upd2/3 knockdown, while neuronal knockdown had no effect. These results further support the conclusion that glial cells are the relevant target of gut-derived Unpaired signaling.

      (8) The authors only used one method to induce oxidative stress (hydrogen peroxide feeding). It would strengthen their argument to test multiple methods of inducing oxidative stress, such as lipopolysaccharide (LPS) feeding. In addition, it would be useful to use a direct bacterial infection to confirm that in flies, the infection promotes sleep. Additionally, flies deficient in Dome in the BBB and infected should not be affected in their sleep by the infection. These experiments would provide direct support for the mechanism proposed. Finally, the authors should add a primary reference for using ROS as a model of bacterial infection and justify their choice better.

      We agree that directly comparing different models of intestinal stress, such as bacterial infection or LPS feeding, would provide valuable insight into how gut-derived signals influence sleep in response to infection. As noted in our detailed responses above, we now include an expanded rationale for our use of H<sub>2</sub>O<sub>2</sub> feeding as a controlled and well-established method for inducing intestinal ROS – one of the key physiological responses to enteric infection and inflammation. In the revised Discussion, we explicitly acknowledge that pathogenic infections – which trigger both intestinal ROS and additional immune pathways – may engage distinct or complementary mechanisms compared to chemically induced oxidative stress. We emphasize the importance of future studies aimed at dissecting these differences. In fact, we are actively pursuing this direction in ongoing work examining sleep responses to enteric infection. For the purposes of the present study, however, we chose to focus on a tractable and specific model of ROS-induced stress to define the contribution of Unpaired cytokine signaling to gut-brain communication and sleep regulation. This approach allowed us to isolate the effect of oxidative stress from other confounding immune stimuli and identify a glia-mediated signaling mechanism linking gut epithelial stress to changes in sleep behavior.

      (9) To confirm that animals lacking EEC Unpaired signaling are not more susceptible to ROS-induced damage, the authors assessed the survival of upd2 and upd3 knockdowns on 1% H2O2 and concluded they display no additional sensitivity to oxidative stress compared to controls. It may be useful to include other tests of sensitivity to oxidative stress, in addition to survival.

      We appreciate the reviewer’s suggestion. In our view, survival is a highly informative and stringent readout, as it reflects the overall physiological capacity of the animal to withstand oxidative stress. Importantly, our data show that animals lacking EEC-derived Unpaired signaling do not exhibit reduced survival following H<sub>2</sub>O<sub>2</sub> exposure, indicating that their oxidative stress resistance is not compromised. Furthermore, we previously confirmed that feeding behavior is unaffected in these animals, suggesting that their ability to ingest food (and thus the stressor) is not impaired. As a molecular complement to these assays in response to this point and others, we have also performed an assessment of neuronal apoptosis (a TUNEL assay, Fig. S3f,g). This assay did not identify an increase in cell death in the brains of animals fed peroxide-containing medium. Thus, gross neurological health, behavior, and overall survival appear to be resilient to the environmental treatment regime we apply here, suggesting that the outcomes we observe arise from signaling per se.

      (10) The authors confirmed that animals lacking EEC-derived upd3 displayed sleep suppression similar to controls in response to starvation. These results led the authors to conclude that there is a specific requirement for EEC-derived Unpaired signaling in responding to intestinal oxidative stress. However, they previously showed that EEC-specific knockdown of upd3 and upd2 led to increased daytime sleep under normal feeding conditions. Their interpretations of their data are inconsistent.

      We appreciate the reviewer’s comment. While animals lacking EEC-derived Unpaired signaling show increased baseline sleep under normal feeding conditions, they still exhibit a robust reduction in sleep when subjected to starvation – comparable to that of control animals (Fig. S3h–j). This demonstrates that they retain the capacity to appropriately modulate sleep in response to metabolic stress. Thus, the sleep-promoting phenotype under normal conditions does not reflect a generalized inability to adjust sleep behavior. Rather, it highlights a specific role for Unpaired signaling in mediating sleep responses to intestinal oxidative stress, not in broadly regulating all sleep-modulating stimuli.

      (11) The authors report a significant increase in JAK-STAT activity in surface glial cells at ZT0 in animals fed 1% H2O2-containing food for 20 hours. This response was abolished in animals with EEC-specific knockdown of upd2 or upd3. The authors confirmed there were no unintended neuronal effects on upd2 or upd3 expression in the heads. They also observed an upregulation of dome transcript levels in the heads of animals with EEC-specific knockdown of upd3 fed 1% H2O2-containing food for 15 hours, which they interpret to be a compensatory mechanism in response to low levels of the ligand. This assay is inconsistent with previous experiments in which animals were fed hydrogen peroxide for 20 hours.

      We thank the reviewer for identifying this discrepancy. The inconsistency arose from a labeling error in the manuscript. Both the JAK-STAT reporter assays in glial cells and the dome expression measurements were performed following 15 hours of H<sub>2</sub>O<sub>2</sub> feeding, not 20 hours as previously stated. We have now corrected this in the revised manuscript.

      (12) The authors show that animals with glia-specific dome knockdown did not have decreased survival on H2O2-containing food, and displayed normal rebound sleep in the morning following sleep deprivation. These results potentially undermine the significance of the paper. If the normal sleep response to oxidative stress is an important protective mechanism, why would oxidative stress not decrease survival in dome knockdown flies (that don't have the normal sleep response to oxidative stress)? This suggests that the proposed mechanism is not important for survival. The authors conclude that Dome-mediated JAK-STAT signaling in the glial cells specifically regulates ROS-induced sleep responses, which their results support.

      We agree that our survival data show that glial dome knockdown does not reduce survival under continuous oxidative stress. However, we believe this does not undermine the importance of the sleep response as an adaptive mechanism. In our survival assay, animals were continuously exposed to 1% H<sub>2</sub>O<sub>2</sub> without the opportunity to recover. In contrast, under natural conditions, oxidative stress is likely to be intermittent, and the ability to mount a sleep response may be particularly important for promoting recovery and maintaining homeostasis during or after transient stress episodes. Thus, while the JAK-STAT-mediated sleep response may not directly enhance survival under constant oxidative challenge, it likely plays a critical role in adaptive recovery under natural conditions.

      (13) Altogether, the authors conclude that enteric oxidative stress induces the release of Unpaired cytokines which activate the JAK-STAT pathway in subperineurial glia of the BBB, which leads to the glial downregulation of receptors for AstA, which is a wake-promoting factor also released by EECs. This mechanism is supported by their results, however, this research raises some intriguing questions, such as the role of upd2 versus upd3, the role of AstA-R1 versus AstA-R2, the importance of this mechanism in terms of survival, the sex-specific nature of this mechanism, and the role that nutritional availability plays in the dual functionality of Unpaired cytokine signaling in regards to sleep.

      We thank the reviewer for highlighting these important questions. Our data suggest that Upd2 and Upd3, while often considered partially redundant, both contribute to sleep regulation, with stronger effects observed for Upd3. This is consistent with prior studies indicating overlapping but non-identical roles for these cytokines. Similarly, although AstA-R1 and AstA-R2 can both be activated by AstA, knockdown of AstA-R2 consistently produces more robust sleep phenotypes, suggesting a predominant role in mediating this effect. The possibility of sex-specific regulation is indeed compelling. While our study focused on females, many gut hormones show sex-dependent activity, and we recognize this as an important avenue for future research. Finally, we have included new data in the revised manuscript showing that gut-derived AstA is downregulated under oxidative stress, further supporting our model in which Unpaired signaling suppresses arousal pathways during intestinal stress.

      (14) Data Availability: It is indicated that: "Reasonable data requests will be fulfilled by the lead author". However, eLife's guidelines for data sharing require that all data associated with an article be made freely and widely available.

      We thank the reviewer for pointing this out. We have revised the Data Availability section of the manuscript to clarify that all data will be made freely available from the lead contact without restriction, in accordance with eLife’s open data policy.

      References

      (1) Li, Y., Zhou, X., Cheng, C., Ding, G., Zhao, P., Tan, K., Chen, L., Perrimon, N., Veenstra, J.A., Zhang, L., and Song, W. (2023). Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov 9, 49. 10.1038/s41421-023-00541-3.

      (2) Ahrentlov, N., Kubrak, O., Lassen, M., Malita, A., Koyama, T., Frederiksen, A.S., Sigvardsen, C.M., John, A., Madsen, P., Halberg, K.A., et al. (2025). Protein-responsive gut hormone Tachykinin directs food choice and impacts lifespan. Nature Metabolism. 10.1038/s42255-025-01267-0.

      (3) Li, H., Janssens, J., De Waegeneer, M., Kolluru, S.S., Davie, K., Gardeux, V., Saelens, W., David, F.P.A., Brbic, M., Spanier, K., et al. (2022). Fly Cell Atlas: A single-nucleus transcriptomic atlas of the adult fruit fly. Science 375, eabk2432. 10.1126/science.abk2432.

      (4) Kubrak, O., Koyama, T., Ahrentlov, N., Jensen, L., Malita, A., Naseem, M.T., Lassen, M., Nagy, S., Texada, M.J., Halberg, K.V., and Rewitz, K. (2022). The gut hormone Allatostatin C/Somatostatin regulates food intake and metabolic homeostasis under nutrient stress. Nature Communications 13, 692. 10.1038/s41467-022-28268-x.

      (5) Malita, A., Kubrak, O., Koyama, T., Ahrentlov, N., Texada, M.J., Nagy, S., Halberg, K.V., and Rewitz, K. (2022). A gut-derived hormone suppresses sugar appetite and regulates food choice in Drosophila. Nature Metabolism 4, 1532-1550. 10.1038/s42255-022-00672-z.