  1. Oct 2025
    1. eLife Assessment

      In this important study, the authors set out to determine the molecular interactions between AQP2 from Trypanosoma brucei (TbAQP2) and the trypanocidal drugs pentamidine and melarsoprol, to understand how TbAQP2 mutations lead to drug resistance. Using cryo-EM, molecular dynamics simulations, and lysis assays, the authors present convincing evidence that mutations in TbAQP2 make permeation of trypanocidal drugs energetically less favourable, and that this impacts the ability of drugs to achieve a therapeutic dose. Overall, these data will be of interest to those working on aquaporins, on the development of trypanosomiasis drugs, and on drugs targeting aquaporins in general.

    2. Reviewer #1 (Public review):

      This study presents cryoEM-derived structures of the Trypanosome aquaporin AQP2, in complex with its natural ligand, glycerol, as well as two trypanocidal drugs, pentamidine and melarsoprol, which use AQP2 as an uptake route. The structures are high quality and the density for the drug molecules is convincing, showing a binding site in the centre of the AQP2 pore.

      The authors then continue to study this system using molecular dynamics simulations. Their simulations indicate that the drugs can pass through the pore and identify a weak binding site in the centre of the pore, which corresponds with that identified through cryoEM analysis. They also simulate the effect of drug resistance mutations; these simulations suggest that the mutations reduce the affinity for drugs and therefore might reduce the likelihood that the drugs enter the centre of the pore, and hence the likelihood that they progress through into the cell.

      While the cryoEM and MD studies are well conducted, it is a shame that the drug transport hypothesis was not tested experimentally. For example, did they do cryoEM with AQP2 with drug resistance mutations and see if they could see the drugs in these maps? They might not bind, but another possibility is that the binding site shifts, as seen in Chen et al? Do they have an assay for measuring drug binding? I think that some experimental validation of the drug binding hypothesis would strengthen this paper. The authors describe in their response why these experiments are challenging.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present 3.2-3.7 Å cryo-EM structures of Trypanosoma brucei aquaglyceroporin-2 (TbAQP2) bound to glycerol, pentamidine or melarsoprol and combine them with extensive all-atom MD simulations to explain drug recognition and resistance mutations. The work provides a persuasive structural rationale for (i) why positively selected pore substitutions enable diamidine uptake, and (ii) how clinical resistance mutations weaken the high-affinity energy minimum that drives permeation. These insights are valuable for chemotherapeutic re-engineering of diamidines and aquaglyceroporin-mediated drug delivery.

      My comments are on the MD part.

      Strengths:

      The study

      Integrates complementary cryo-EM, equilibrium and applied voltage MD simulations, and umbrella-sampling PMFs, yielding a coherent molecular-level picture of drug permeation.

      Offers direct structural rationalisation of long-standing resistance mutations in trypanosomes, addressing an important medical problem.

      Comments on revisions:

      Most of the weaknesses have been resolved during the revision process.

    4. Reviewer #3 (Public review):

      Summary:

      Recent studies have established that trypanocidal drugs, including pentamidine and melarsoprol, enter the trypanosomes via the glyceroaquaporin AQP2 (TbAQP2). Interestingly, drug resistance in trypanosomes is, at least in part, caused by recombination with the neighbouring gene, AQP3, which is unable to permeate pentamidine or melarsoprol. The effect of the drugs on cells expressing chimeric proteins is significantly reduced. In addition, controversy exists regarding whether TbAQP2 permeates the drugs like an ion channel, or whether it serves as a receptor that triggers downstream processes upon drug binding. In this study the authors set out to achieve these objectives: 1) to understand the molecular interactions between TbAQP2 and glycerol, pentamidine, and melarsoprol, and 2) to determine the mechanism by which mutations that arise from recombination with TbAQP3 result in reduced drug permeation.

      The cryo-EM structures provide details of glycerol and drug binding, and show that glycerol and the drugs occupy the same space within the pore. Finally, MD simulations and lysis assays are employed to determine how mutations in TbAQP2 result in reduced permeation of drugs by making entry and exit of the drug relatively more energy-expensive. Overall, the strength of evidence used to support the authors' claims is solid.

      Strengths:

      The cryo-EM portion of the study is strong, and while the overall resolution of the structures is in the 3.5Å range, the local resolution within the core of the protein and the drug binding sites is considerably higher (~2.5Å).

      I also appreciated the MD simulations on the TbAQP2 mutants and the mechanistic insights that resulted from this data.

      Weaknesses:

      (1) The authors do not provide any experimental validation of the drug binding sites in TbAQP2, owing to a lack of resources. However, the claims have been softened in the revised paper.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      This study presents cryoEM-derived structures of the Trypanosome aquaporin AQP2, in complex with its natural ligand, glycerol, as well as two trypanocidal drugs, pentamidine and melarsoprol, which use AQP2 as an uptake route. The structures are high quality, and the density for the drug molecules is convincing, showing a binding site in the centre of the AQP2 pore. 

      The authors then continue to study this system using molecular dynamics simulations. Their simulations indicate that the drugs can pass through the pore and identify a weak binding site in the centre of the pore, which corresponds with that identified through cryoEM analysis. They also simulate the effect of drug resistance mutations, which suggests that the mutations reduce the affinity for drugs and therefore might reduce the likelihood that the drugs enter into the centre of the pore, reducing the likelihood that they progress through into the cell. 

      While the cryoEM and MD studies are well conducted, it is a shame that the drug transport hypothesis was not tested experimentally. For example, did they do cryoEM with AQP2 with drug resistance mutations and see if they could see the drugs in these maps? They might not bind, but another possibility is that the binding site shifts, as seen in Chen et al. 

      TbAQP2 from the drug-resistant mutants does not transport either melarsoprol or pentamidine, and there was thus no evidence to suggest that the mutant TbAQP2 channels could bind either drug. Moreover, there is not a single mutation that is characteristic of drug resistance in TbAQP2: references 12–15 show a plethora of chimeric AQP2/3 constructs in addition to various point mutations in laboratory strains and field isolates. In reference 17 we describe a substantial number of SNPs that reduced pentamidine and melarsoprol efficacy to levels that would constitute clinical resistance under acceptable dosage regimens. It thus appears that many and diverse mutations are able to modify the protein sufficiently to induce resistance, likely in multiple different ways, including narrowing of the pore, changes to interacting amino acids, and altered access to the pore. We therefore did not attempt to determine the structures of the mutant channels: we did not think that in most cases we would see any density for the drugs in the channel, and even if we did for one individual mutant TbAQP2, we would be unable to define ‘the’ resistance mechanism from it. Our MD data suggest that the pentamidine binding affinity is in the range of 50-300 µM for the mutant TbAQP2s selected for that test (I110W and L258Y/L264R), i.e. >1000-fold weaker than for wild-type TbAQP2. Thus these structures will be exceedingly challenging to determine with pentamidine in the pore but, of course, until the experiment has been tried we will not know for sure.
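The affinity figures above can be placed on an energy scale. The following is a minimal sketch, assuming the standard relation ΔG° = RT ln(Kd/c°) with c° = 1 M and T = 298.15 K (both assumptions, not values from the manuscript), and taking 50 nM as an illustrative stand-in for the "double-digit nanomolar" wild-type affinity mentioned in the next response. The function name is hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1
T = 298.15          # assumed temperature, K
C_STD = 1.0         # standard-state concentration, M

def binding_free_energy(kd_molar: float) -> float:
    """Standard binding free energy (kJ/mol) from a dissociation constant (M)."""
    return R * T * math.log(kd_molar / C_STD)

kd_wt = 50e-9                                     # illustrative wild-type Kd (double-digit nM)
kd_mutants = {"50 uM": 50e-6, "300 uM": 300e-6}   # range quoted for the mutants

dg_wt = binding_free_energy(kd_wt)
for label, kd in kd_mutants.items():
    dg = binding_free_energy(kd)
    # Fold change in Kd and the corresponding loss of binding free energy
    print(f"Kd {label}: dG = {dg:.1f} kJ/mol, "
          f"fold weaker = {kd / kd_wt:.0f}, ddG = {dg - dg_wt:.1f} kJ/mol")
```

Under these assumptions, a 1000-fold weaker Kd corresponds to about 17 kJ/mol less favourable binding.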

      Do they have an assay for measuring drug binding? 

      We tried many years ago to develop a <sup>3</sup>H-pentamidine binding assay for purified wild-type TbAQP2, but we never got satisfactory results even though the binding should be in the double-digit nanomolar range. This may be for any number of technical reasons and could also be partly because flexible di-benzamidines bind non-specifically to proteins at µM concentrations, giving rise to high background. Binding to the mutants was not tested, given that they would bind pentamidine in the µM range. If we were to pursue this further, isothermal titration calorimetry (ITC) may be one way forward, as it can measure µM-affinity binding using unlabelled compounds, although it uses a lot of protein and background binding would need to be carefully assessed; see for example our work on measuring tetracycline binding to the tetracycline antiporter TetAB (https://doi.org/10.1016/j.bbamem.2015.06.026). Membrane proteins are also particularly tricky for this technique, as the chemical activity of the protein solution must be identical to that of the titrant solution containing the molecule that binds to the protein; this can be exceedingly problematic if any free detergent remains in the purified membrane protein. Another possibility may be fluorescence polarisation spectroscopy, although this would require fluorescently labelling the drugs, which would very likely affect their affinity for TbAQP2 and how they interact with the wild-type and mutant proteins; see the detailed SAR analysis in Alghamdi et al. 2020 (ref. 17). As you will appreciate, setting up an assay for measuring drug binding to mutants would take considerable time and effort and is beyond the scope of the current work.

      I think that some experimental validation of the drug binding hypothesis would strengthen this paper. Without this, I would recommend that the authors soften the statement of their hypothesis (i.e., lines 65-68), as this has not been experimentally validated.

      We agree with the referee that direct binding of drugs to the mutants would be very nice to have, but we have neither the time nor resources to do this. We have therefore softened the statement on lines 65-68 to read ‘Drug-resistant TbAQP2 mutants are still predicted to bind pentamidine, but the much weaker binding in the centre of the channel observed in the MD simulations would be insufficient to compensate for the high energy processes of ingress and egress, hence impairing transport at pharmacologically relevant concentrations.’ 
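To illustrate why much weaker central binding matters at low drug concentrations, a simple one-site occupancy model, θ = [L]/([L] + Kd), can be evaluated. This is a minimal sketch assuming simple equilibrium binding; it deliberately ignores the ingress and egress barriers discussed in the revised statement. The 1 µM drug concentration and the 50 nM wild-type Kd are hypothetical illustrative values, not measured pharmacological or experimental figures.

```python
def fractional_occupancy(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied for simple one-site equilibrium binding."""
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("ligand must be non-negative and Kd positive")
    return ligand_molar / (ligand_molar + kd_molar)

drug = 1e-6  # hypothetical free drug concentration, 1 uM
cases = {
    "WT (illustrative 50 nM)": 50e-9,
    "mutant (50 uM)": 50e-6,     # lower end of the MD-derived range
    "mutant (300 uM)": 300e-6,   # upper end of the MD-derived range
}
for label, kd in cases.items():
    print(f"{label}: occupancy = {fractional_occupancy(drug, kd):.3f}")
```

Under these assumptions, occupancy of the central site falls from about 95% for the illustrative wild-type Kd to about 2% or less across the mutant range.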

      Reviewer #2 (Public review): 

      Summary: 

      The authors present 3.2-3.7 Å cryo-EM structures of Trypanosoma brucei aquaglyceroporin-2 (TbAQP2) bound to glycerol, pentamidine, or melarsoprol and combine them with extensive all-atom MD simulations to explain drug recognition and resistance mutations. The work provides a persuasive structural rationale for (i) why positively selected pore substitutions enable diamidine uptake, and (ii) how clinical resistance mutations weaken the high-affinity energy minimum that drives permeation. These insights are valuable for chemotherapeutic re-engineering of diamidines and aquaglyceroporin-mediated drug delivery. 

      My comments are on the MD part. 

      Strengths: 

      The study 

      (1) Integrates complementary cryo-EM, equilibrium, applied voltage MD simulations, and umbrella-sampling PMFs, yielding a coherent molecular-level picture of drug permeation. 

      (2) Offers direct structural rationalisation of long-standing resistance mutations in trypanosomes, addressing an important medical problem. 

      Weaknesses: 

      Unphysiological membrane potential. A field of 0.1 V nm<sup>-1</sup> (~1 V across the bilayer) was applied to accelerate translocation. From the traces (Figure 1c), it can be seen that the translocation occurred really quickly through the channel, suggesting that the field might have introduced some large changes in the protein. The authors state that they checked visually for this, but some additional analysis, especially of the residues next to the drug, would be welcome. 

      This is a good point from the referee, and we thank them for raising it. It is common to use membrane potentials in simulations that are higher than the physiological value, although these are typically lower than the one used here. We used the higher value to speed up sampling, and even so it took 1,400 ns for transport in the physiologically correct direction, and then in only one of three repeats. Hence this choice of voltage was probably necessary to see the effect. The exceedingly slow rate of pentamidine permeation seen in the MD simulation was consistent with the experimental observations, as discussed in Alghamdi et al. (2020) [ref. 17], where we estimated that TbAQP2-mediated pentamidine uptake in T. brucei bloodstream forms proceeds at just 9.5×10<sup>5</sup> molecules/cell/h; the number of functional TbAQP2 units in the plasma membrane is not known, but their location is limited to the small flagellar pocket (Quintana et al. PLoS Negl Trop Dis 14, e0008458 (2020)). 
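To see why permeation events are so rare on simulation timescales, the quoted cellular uptake rate can be converted into events per pore. This is a minimal sketch. Because the number of functional TbAQP2 pores per cell is not known (as stated above), the pore counts below are purely hypothetical illustrative values, and the function name is invented for illustration.

```python
SECONDS_PER_HOUR = 3600.0
uptake_per_cell_per_hour = 9.5e5  # molecules/cell/h, the estimate quoted from Alghamdi et al. (2020)

uptake_per_cell_per_second = uptake_per_cell_per_hour / SECONDS_PER_HOUR

def per_channel_rate(n_channels: int) -> float:
    """Mean permeation events per pore per second, for a hypothetical pore count."""
    return uptake_per_cell_per_second / n_channels

print(f"{uptake_per_cell_per_second:.0f} molecules/cell/s")
for n in (100, 1_000, 10_000):  # hypothetical copy numbers; the real value is unknown
    rate = per_channel_rate(n)
    print(f"N = {n}: {rate:.3g} events/pore/s, one event every {1 / rate:.3g} s")
```

Even with only 100 pores, the mean interval between events per pore (roughly 0.4 s) is many orders of magnitude longer than the 1,400 ns simulated, which is consistent, under these assumptions, with the slow permeation described above.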

      The referee is correct that it is important to make sure that the applied voltage is not causing issues for the protein, especially for residues in contact with the drug. We have carried out RMSF analysis to test this. Comparing the simulations with the voltage applied against simulations of monomeric TbAQP2 + PNTM with no voltage reveals little difference in the dynamics of the drug-contacting residues. 

      We have added these new data as Supplementary Fig. 12b with a new legend (lines 1134-1138):

      ‘b, RMSF calculations were run on monomeric TbAQP2 with either no membrane voltage or a 0.1 V nm<sup>-1</sup> voltage applied (in the physiological direction). Shown are residues in contact with the pentamidine molecule, coloured by RMSF value. RMSF values are shown for residues Leu122, Phe226, Ile241, and Leu264. The data suggest the voltage has little impact on the flexibility or stability of the pore lining residues.’
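RMSF measures the time-averaged fluctuation of each atom about its mean position, RMSF_i = sqrt(⟨|r_i(t) − ⟨r_i⟩|²⟩). The following is a minimal NumPy sketch. It assumes the trajectory has already been superposed on a reference structure, which is the usual first step in MD analysis. The toy coordinates are hypothetical and are not data from the simulations.

```python
import numpy as np

def rmsf(coords: np.ndarray) -> np.ndarray:
    """Per-atom RMSF from a pre-aligned trajectory.

    coords: array of shape (n_frames, n_atoms, 3), already superposed on a reference.
    Returns an array of shape (n_atoms,) in the same length units as coords.
    """
    mean_pos = coords.mean(axis=0)                     # time-averaged position of each atom
    disp_sq = ((coords - mean_pos) ** 2).sum(axis=2)   # squared displacement per frame and atom
    return np.sqrt(disp_sq.mean(axis=0))

# Hypothetical toy data: 4 frames, 2 atoms (e.g. C-alpha atoms of two pore-lining residues)
traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.1, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[-0.1, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
print(rmsf(traj))
```

Applied separately to the runs with and without the field, the per-residue values can then be compared residue by residue, for example for the pore-lining residues listed in the legend.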

      We have also added the following text to the manuscript (lines 524-530):

      ‘Membrane potential simulations were run using the computational electrophysiology protocol. An electric field of 0.1 V/nm was applied in the z-axis dimension only, to create a membrane potential of about 1 V (see Fig. S10a). Note that this is higher than the physiological value of 87.1 ± 2.1 mV at pH 7.3 in bloodstream T. brucei, and was chosen to improve the sampling efficiency of the simulations. The protein and lipid molecules were visually confirmed to be unaffected by this voltage, which we quantify using RMSF analysis on pentamidine-contacting residues (Fig. S12b).’ 
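The relation between the applied field and the resulting potential can be checked with simple arithmetic. For a uniform field applied along z in a periodic box, the potential difference across the box is approximately ΔV = E·L_z. The following is a minimal sketch. It assumes a box height of 10 nm, a hypothetical value chosen only because it reproduces the quoted ~1 V; the actual box dimension is not given here.

```python
def membrane_potential_volts(field_v_per_nm: float, box_z_nm: float) -> float:
    """Approximate potential difference (V) from a uniform field applied along z."""
    return field_v_per_nm * box_z_nm

field = 0.1             # V/nm, as stated in the Methods text
box_z = 10.0            # nm, hypothetical box height
physiological = 0.0871  # V, the 87.1 mV potential quoted for bloodstream T. brucei

v_sim = membrane_potential_volts(field, box_z)
print(f"simulated potential ~ {v_sim:.2f} V, "
      f"{v_sim / physiological:.1f}x the physiological value")
```

With this hypothetical box height, the simulated potential is roughly 11-12 times the physiological value.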

      Based on applied voltage simulations, the authors argue that the membrane potential would help get the drug into the cell, and that a high value of the potential was applied merely to speed up the simulation. At the same time, the barrier for translocation from PMF calculations is ~40 kJ/mol for WT. Is the physiological membrane voltage enough to overcome this barrier in a realistic time? In this context, I do not see how much value the applied voltage simulations have, as one can estimate the work needed to translocate the substrate on PMF profiles alone. The authors might want to tone down their conclusions about the role of membrane voltage in the drug translocation.

      We agree that the PMF barriers are considerable; however, we highlight that other studies have seen similar landscapes, e.g. PMID 38734677, which saw a barrier of ca. 10-15 kcal/mol (ca. 40-60 kJ/mol) for PNTM traversing the channel. This was reduced by ca. 4 kcal/mol when a 0.4 V nm<sup>-1</sup> membrane potential was applied, so we expect a similar effect to be seen here. 
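The energy scales in this exchange can be compared directly. The following is a minimal sketch. It assumes 1 kcal = 4.184 kJ and bounds the electrical work on pentamidine from above by W = zFΔV with z = +2, as if the drug crossed the entire potential drop. In practice, only the fraction of the drop spanned before the barrier top lowers the barrier, so the true reduction would be smaller. The 87.1 mV value is the physiological potential quoted in the Methods text, and the function names are illustrative.

```python
KCAL_TO_KJ = 4.184
FARADAY = 96485.33212  # C/mol

def kcal_to_kj(x_kcal: float) -> float:
    """Convert kcal/mol to kJ/mol."""
    return x_kcal * KCAL_TO_KJ

def max_electrical_work_kj(charge: int, potential_v: float) -> float:
    """Upper bound (kJ/mol) on work done by the field on an ion of the given charge
    crossing the full potential drop: W = z * F * dV."""
    return charge * FARADAY * potential_v / 1000.0

# Barrier range and reported reduction from the cited study, in kJ/mol
print([round(kcal_to_kj(x), 1) for x in (10, 15, 4)])
w_phys = max_electrical_work_kj(2, 0.0871)  # +2 pentamidine at 87.1 mV
print(f"upper bound at 87.1 mV: {w_phys:.1f} kJ/mol")
```

Under these assumptions, the 10-15 kcal/mol range corresponds to roughly 42-63 kJ/mol, and the physiological potential could lower it by at most about 17 kJ/mol.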

      We have updated the Results to more clearly highlight this point and added the following text (lines 274-275):

      ‘We note that previous studies using these approaches saw energy barriers of a similar size, and that these are reduced in the presence of a membrane voltage[17,31].’ 

      Pentamidine charge state and protonation. The ligand was modeled as +2, yet pKa values might change with the micro-environment. Some justification of this choice would be welcome. 

      Pentamidine contains two amidine groups, each of which is expected to have a pKa above 10 in solution (PMID: 20368397), so the molecule will carry a +2 charge. Using the +2 charge is also in line with previous MD studies (PMID: 32762841). We have added the following text to the Methods (lines 506-509):

      ‘The pentamidine molecule used existing parameters available in the CHARMM36 database under the name PNTM with a charge state of +2 to reflect the predicted pKas of >10 for these groups [73] and in line with previous MD studies[17].’
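The reasoning behind the +2 assignment follows from the Henderson-Hasselbalch relation. The following is a minimal sketch. It assumes independent sites with pKa = 10 (the lower bound quoted above) and pH 7.3 (the pH given alongside the resting potential in the Methods text). It does not capture the microenvironment effects the referee raises, which is what constant-pH methods would address.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a basic group in its protonated (cationic) form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

def mean_charge(ph: float, pkas: list[float]) -> float:
    """Average net charge from independent basic sites (ignores site-site coupling)."""
    return sum(fraction_protonated(ph, pka) for pka in pkas)

pkas = [10.0, 10.0]  # lower bound of the ">10" quoted for each amidine group
ph = 7.3             # assumed pH
print(f"mean charge = {mean_charge(ph, pkas):.3f}")
print(f"fraction fully +2 = {fraction_protonated(ph, 10.0) ** 2:.4f}")
```

At pH 7.3 and pKa 10, each group is about 99.8% protonated, so a fixed +2 charge is a reasonable approximation in bulk solution.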

      We note that accounting for the impact of the microenvironment is an excellent point – future studies might employ constant pH calculations to address this.

      The authors state that this RMSD is small for the substrate and show plots in Figure S7a, with the bottom plot being presumably done for the substrate (the legends are misleading, though), levelling off at ~0.15 nm RMSD. However, in Figure S7a, we see one trace (light blue) deviating from the initial position by more than 0.2 nm; that would surely result in an RMSD larger than 0.15 nm, but this does not seem to be reflected in the RMSD plots. 

      The bottom plot of Fig. S9a (previously Fig. S7a) is indeed the RMSD of the drug (in relation to the protein). We have clarified the legend with the following text (lines 1037-1038): ‘… or for the pentamidine molecule itself, i.e. in relation to the Cα of the channel (bottom).’ 
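A drug RMSD computed "in relation to the Cα of the channel" means that each frame is superposed on the protein backbone. The ligand deviation is then measured without refitting the ligand itself. The following is a minimal NumPy sketch of that procedure, using the Kabsch algorithm for the superposition. The coordinates are hypothetical toy data, and this is not necessarily the exact tool or atom selection used in the study.

```python
import numpy as np

def kabsch_transform(mobile: np.ndarray, reference: np.ndarray):
    """Rotation and translation that best superpose mobile (N, 3) onto reference (N, 3)."""
    mob_c = mobile.mean(axis=0)
    ref_c = reference.mean(axis=0)
    h = (mobile - mob_c).T @ (reference - ref_c)   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # guard against an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = ref_c - mob_c @ rot.T
    return rot, trans

def ligand_rmsd_after_protein_fit(ca_frame, ca_ref, lig_frame, lig_ref):
    """Fit the frame on protein C-alpha atoms, then compute ligand RMSD without refitting."""
    rot, trans = kabsch_transform(ca_frame, ca_ref)
    lig_fitted = lig_frame @ rot.T + trans
    return np.sqrt(((lig_fitted - lig_ref) ** 2).sum(axis=1).mean())

# Hypothetical toy system: rotate and translate a whole frame, and shift the ligand by 0.2 nm
rng = np.random.default_rng(0)
ca_ref = rng.normal(size=(50, 3))          # hypothetical C-alpha coordinates (nm)
lig_ref = rng.normal(size=(10, 3)) * 0.3   # hypothetical ligand coordinates (nm)
theta = np.deg2rad(30.0)
rot0 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta), np.cos(theta), 0.0],
                 [0.0, 0.0, 1.0]])
shift = np.array([0.2, 0.0, 0.0])          # ligand displaced relative to the protein
ca_frame = ca_ref @ rot0.T + 1.0
lig_frame = (lig_ref + shift) @ rot0.T + 1.0
print(f"{ligand_rmsd_after_protein_fit(ca_frame, ca_ref, lig_frame, lig_ref):.3f} nm")
```

Because the ligand is not refitted, both its internal motion and any displacement relative to the protein contribute to this RMSD. In the toy example, a rigid 0.2 nm shift of the ligand gives an RMSD of 0.2 nm.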

      With regard to the second comment, we assume the referee is referring to the light blue trace from Fig S9c. These data are actually for the monomeric channel rather than the tetramer. We apologise for not making this clearer in the legend. We have added the word ‘monomeric’ (line 1041).

      Reviewer #3 (Public review): 

      Summary: 

      Recent studies have established that trypanocidal drugs, including pentamidine and melarsoprol, enter the trypanosomes via the glyceroaquaporin AQP2 (TbAQP2). Interestingly, drug resistance in trypanosomes is, at least in part, caused by recombination with the neighbouring gene, AQP3, which is unable to permeate pentamidine or melarsoprol. The effect of the drugs on cells expressing chimeric proteins is significantly reduced. In addition, controversy exists regarding whether TbAQP2 permeates drugs like an ion channel, or whether it serves as a receptor that triggers downstream processes upon drug binding. In this study the authors set out to achieve three objectives: 

      (1) to determine if TbAQP2 acts as a channel or a receptor,

      We should clarify here that this was not an objective of the current manuscript as the transport activity has already been extensively characterised in the literature, as described in the introduction.

      (2) to understand the molecular interactions between TbAQP2 and glycerol, pentamidine, and melarsoprol, and 

      (3) to determine the mechanism by which mutations that arise from recombination with TbAQP3 result in reduced drug permeation. 

      Indeed, all three objectives are achieved in this paper. Using MD simulations and cryo-EM, the authors determine that TbAQP2 likely permeates drugs like an ion channel. The cryo-EM structures provide details of glycerol and drug binding, and show that glycerol and the drugs occupy the same space within the pore. Finally, MD simulations and lysis assays are employed to determine how mutations in TbAQP2 result in reduced permeation of drugs by making entry and exit of the drug relatively more energy-expensive. Overall, the strength of evidence used to support the authors' claims is solid. 

      Strengths: 

      The cryo-EM portion of the study is strong, and while the overall resolution of the structures is in the 3.5Å range, the local resolution within the core of the protein and the drug binding sites is considerably higher (~2.5Å). 

      I also appreciated the MD simulations on the TbAQP2 mutants and the mechanistic insights that resulted from this data. 

      Weaknesses: 

      (1) The authors do not provide any empirical validation of the drug binding sites in TbAQP2. While the discussion mentions that the binding site should not be thought of as a classical fixed site, the MD simulations show that there's an energetically preferred slot (i.e., high occupancy interactions) within the pore for the drugs. For example, mutagenesis and a lysis assay could provide us with some idea of the contribution/importance of the various residues identified in the structures to drug permeation. This data would also likely be very valuable in learning about selectivity for drugs in different AQP proteins.

      On a philosophical level, we disagree with the requirement for ‘validation’ of a structure by mutagenesis. It is unclear what such mutagenesis would tell us beyond what was already shown experimentally through <sup>3</sup>H-pentamidine transport, drug sensitivity and lysis assays, i.e. that a given mutation will impact permeation to a certain extent. But on the structural level, what does mutagenesis tell us? If a bulky aromatic residue that makes many van der Waals interactions with the substrate is changed to an alanine residue and transport is reduced, what does this mean? It would confirm that the phenylalanine residue is very likely indeed making van der Waals contacts to the substrate, but we knew that already from the WT structure. And if it doesn’t have any effect? Well, it could mean that the van der Waals interactions with that particular residue are not that important, or it could be that the substrate has changed its position slightly in the channel and the new pose has a similar interaction energy to that observed in the wild-type channel. Regardless of the result, any data from mutagenesis would be open to interpretation and therefore would not impact on the conclusions drawn in this manuscript. We might not learn anything new unless all residues interacting with the substrate were mutated, the structure of each mutant determined and MD simulations performed for all, which is beyond the scope of this work. Even then, the value for understanding clinical drug resistance would be limited, as this phenomenon has been linked to various chimeric rearrangements with adjacent TbAQP3 (references 12–15), each with a structure distinct from TbAQP2 with a single SNP. We also note that the recent paper by Chen et al. did not include any mutagenesis of the drug binding sites in TbAQP2, presumably for similar reasons to those discussed above.

      (2) Given the importance of AQP3 in the shaping of AQP2-mediated drug resistance, I think a figure showing a comparison between the two protein structures/AlphaFold structures would be beneficial and appropriate.

      We agree that the comparison is of considerable interest and would contribute further to our understanding of the unique permeation capacities of TbAQP2. As such, we followed the reviewer’s suggestion and made an AlphaFold model of TbAQP3 and compared it to our structures of TbAQP2. The RMSD is 0.6 Å to the pentamidine-bound TbAQP2, suggesting that the fold of TbAQP3 has been predicted well, although the side chain rotamers cannot be assessed for their accuracy. Previous work has defined the selectivity filter of TbAQP3 to be formed by W102, R256, Y250. The superposition of the TbAQP3 model and the TbAQP2 pentamidine-bound structure shows that one of the amine groups is level with R256 and that there is a clash with Y250 and the backbone carbonyl of Y250, which deviates in position from the backbone of TbAQP2 in this region. There is also a clash with Ile252. 

      Although these observations are indeed interesting, on their own they are highly preliminary and extensive further work would be necessary to draw any convincing conclusions regarding these residues in preventing uptake of pentamidine and melarsoprol. The TbAQP3 AlphaFold model would need to be verified by MD simulations and then we would want to look at how pentamidine would interact with the channel under different experimental conditions like we have done with TbAQP2. We would then want to mutate to Ala each of the residues singly and in combination and assess them in uptake assays to verify data from the MD simulations. This is a whole new study and, given the uncertainties surrounding the observations of just superimposing TbAQP2 structure and the TbAQP3 model, we feel that, regrettably, this is just too speculative to add to our manuscript. 

      (3) A few additional figures showing cryo-EM density, from both full maps and half maps, would help validate the data. 

      Two new Supplementary Figures have been made, one showing the densities for each of the secondary structure elements (the new Figure S5) and one for the half maps showing the ligands (the new Figure S6). All the remaining supplementary figures have been renamed accordingly.

      (4) Finally, this paper might benefit from including more comparisons with and analysis of data published in Chen et al (doi.org/10.1038/s41467-024-48445-4), which focus on similar objectives. Looking at all the data in aggregate might reveal insights that are not obvious from either paper on their own. For example, melarsoprol binds differently in structures reported in the two respective papers, and this may tell us something about the energy of drug-protein interactions within the pore. 

      We already made the comparisons that we felt were most pertinent and included a figure (Fig. 5) to show the difference in orientation of melarsoprol in the two structures. We do not feel that any additional comparison is sufficiently interesting to be included. As we point out, the structures are virtually identical (RMSD 0.6 Å), and we therefore have no further mechanistic insights to add beyond the thorough discussion in the Chen et al. paper.

      Reviewer #1 (Recommendations for the authors): 

      (1) Line 65 - I don't think that the authors have tested binding experimentally, and so rather than 'still bind', I think that 'are still predicted to bind' is more appropriate. 

      Changed as suggested

      (2) Line 69 - remove 'and' 

      Changed as suggested

      (3) Line 111 - clarify that it is the protein chain which is 'identical'. Ligands not. 

      Changed to read ‘The cryo-EM structures of TbAQP2 (excluding the drugs/substrates) were virtually identical…’

      (4) Line 186 - make the heading of this section more descriptive of the conclusion than the technique? 

      We have changed the heading to read: ‘Molecular dynamics simulations show impaired pentamidine transport in mutants’

      Reviewer #2 (Recommendations for the authors): 

      (1) Methods - a rate of 1 nm per ns is mentioned for pulling simulations, is that right? 

      Yes, for the generation of the initial frames for the umbrella sampling, a pull rate of 1 nm/ns was used in either the upward or downward z direction.
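With a pull rate of 1 nm/ns, a pull spanning a few nanometres takes only a few nanoseconds, and umbrella windows are then seeded from frames along that path. The following is a minimal sketch of the frame-selection step. It assumes an idealised constant-velocity trajectory, a hypothetical 0.1 nm window spacing and a hypothetical 4 nm span. None of these parameters except the pull rate come from the manuscript, and the function name is illustrative.

```python
import numpy as np

def select_window_frames(z_nm, window_centres_nm):
    """Return, for each window centre, the index of the pull-trajectory frame
    whose reaction-coordinate value z is closest to that centre."""
    z = np.asarray(z_nm)
    return [int(np.argmin(np.abs(z - c))) for c in window_centres_nm]

pull_rate = 1.0                        # nm/ns, the pull rate stated in the response
times = np.arange(0.0, 4.001, 0.01)    # hypothetical 4 ns pull, frames saved every 10 ps
z = -2.0 + pull_rate * times           # idealised constant-velocity path from z = -2 nm
centres = np.arange(-2.0, 2.001, 0.1)  # hypothetical 0.1 nm window spacing
frames = select_window_frames(z, centres)
print(len(frames), frames[:3], times[frames[:3]])
```

In a real pull simulation, the reaction coordinate lags behind the moving reference and fluctuates, so the selected frames would not be evenly spaced in time as they are in this idealised case.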

      (2) Figure S9 and S10 have their captions swapped. 

      The captions have been swapped to their proper positions.

      (3) Methods state "40 ns per window" yet also that "the first 50 ns of each window was discarded as equilibration". 

      Well spotted - this line should have read “the first 5 ns of each window was discarded as equilibration”. This has been corrected (line 541).
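The corrected protocol (40 ns per window, with the first 5 ns discarded) leaves 35 ns of production data per window. The following is a minimal sketch of that trimming step. It assumes a hypothetical 10 ps sampling interval and placeholder sample values, and the function name is illustrative.

```python
import numpy as np

def discard_equilibration(times_ns, values, equil_ns=5.0):
    """Drop samples recorded before equil_ns and return the production part of a window."""
    times = np.asarray(times_ns)
    mask = times >= equil_ns
    return times[mask], np.asarray(values)[mask]

window_ns = 40.0
dt_ns = 0.01                                     # hypothetical 10 ps sampling interval
times = np.arange(0.0, window_ns + dt_ns / 2, dt_ns)
values = np.zeros_like(times)                    # placeholder reaction-coordinate samples
t_prod, _ = discard_equilibration(times, values, equil_ns=5.0)
print(f"production time per window: {t_prod[-1] - t_prod[0]:.1f} ns")
```

Setting `equil_ns=50.0` on a 40 ns window returns an empty array, which shows why the original "50 ns" figure could not have been correct.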

      Reviewer #3 (Recommendations for the authors): 

      (1) Abstract, line 68-70: incomplete sentence.

      The sentence has been re-written: ‘The structures of drug-bound TbAQP2 represent a novel paradigm for drug-transporter interactions and are a new mechanism for targeting drugs in pathogens and human cells.’

      (2) Line 312-313: The paper you mention here came out in May 2024 - a year ago. I appreciate that they reported similar structural data, but for the benefit of the readers and the field, I would recommend a more thorough account of the points by which the two pieces of work differ. Is there some knowledge that can be gleaned by looking at all the data in the two papers together? For example, you report a glycerol-bound structure while the other group provides an apo one. Are there any mechanistic insights that can be gained from a comparison?

      We already made the comparisons that we felt were most pertinent and included a figure (Fig. 5) to show the difference in orientation of melarsoprol in the two structures. We do not feel that any additional comparison is sufficiently interesting to be included. As we point out, the structures are virtually identical (RMSD 0.6 Å), and we therefore have no further mechanistic insights to add beyond the thorough discussion in the Chen et al. paper.

      (3) Similarly, you can highlight the findings from your MD simulations on the TbAQP2 drug resistance mutants, which are unique to your study. How can this data help with solving the drug resistance problem?

      New drugs will need to be developed that can be transported by the mutant chimeric AQP2s, and the models from the MD simulations will provide a starting point for molecular docking studies. Further work will then be required in transport assays to optimise transport rather than merely binding. However, the fact that drug resistance can also arise through deletion of the AQP2 gene highlights the need for developing new drugs that target other proteins.

      (4) A glaring question that one has as a reader is why you have not attempted to solve the structures of the drug resistance mutants, either in complex with the two compounds or in their apo/glycerol-bound form? To be clear, I am not requesting this data, but it might be a good idea to bring this up in the discussion.

      TbAQP2 containing the drug-resistance mutations does not transport either melarsoprol or pentamidine (Munday et al., 2014; Alghamdi et al., 2020); there was thus no evidence to suggest that the mutant TbAQP2 channels could bind either drug. We therefore did not attempt to determine the structures of the mutant channels because we did not think that we would see any density for the drugs in the channel. Our MD data suggest that the pentamidine binding affinity is in the range of 50-300 µM for the mutant TbAQP2, supporting the view that getting these structures would be highly challenging, but of course until the experiment is tried we will not know for sure.

      We also do not think we would learn anything new from determining the structures of the drug-free, transport-negative mutants of TbAQP2. The MD simulations have given novel insights into why the drugs are not transported, and we would rather invest effort in this direction, looking at other mutants, than in determining new structures.

      (5) Line 152-156: Is there a molecular explanation for why the TbAQP2 has 2 glycerol molecules captured in the selectivity filter while the PfAQP2 and the human AQP7 and AQP10 have 3?

      The presence of glycerol molecules represents local energy minima for binding, which will depend on the local disposition of appropriate hydrogen bonding atoms and hydrophobic regions, in conjunction with the narrowness of the channel to effectively bind glycerol from all sides. It is noticeable that the extracellular region of the channel is wider in TbAQP2 than in AQP7 and AQP10, so this may be one reason why additional ordered glycerol molecules are absent, and only two are observed. Note also that the other structures were determined by X-ray crystallography, and the environment of the crystal lattice may have significantly decreased the rate of diffusion of glycerol, increasing the likelihood of observing their electron densities.

      (6) I would also think about including the 8JY7 (TbAQP2 apo) structure in your analysis.

      We included 8JY7 in our original analyses, but the results were identical to those for 8JY6 and 8JY8 in terms of protein structure. Because 8JY7 contains no modelled substrates (the interesting part for our manuscript), we have not included the comparison.

      (7) I also think, given the importance of AQP3 in this context, it would be really useful to have a comparison with the AQP3 AlphaFold structure in order to examine why it does not permeate drugs.

      We made an AlphaFold model of TbAQP3 and compared it to our structures of TbAQP2. The RMSD to the pentamidine-bound TbAQP2 is 0.6 Å, suggesting that the fold of TbAQP3 has been predicted well, although the accuracy of the side-chain rotamers cannot be assessed. Previous work has defined the selectivity filter of TbAQP3 as being formed by W102, R256 and Y250. Superposition of the TbAQP3 model on the pentamidine-bound TbAQP2 structure shows that one of the amine groups of pentamidine is level with R256, and that there is a clash with Y250 and with the backbone carbonyl of Y250, which deviates in position from the TbAQP2 backbone in this region. There is also a clash with Ile252.

      Although these observations are interesting, on their own they are extremely preliminary, and extensive further work would be necessary to draw any convincing conclusions about the role of these residues in preventing uptake of pentamidine and melarsoprol. The TbAQP3 AlphaFold model would first need to be verified by MD simulations, and we would then want to examine how pentamidine interacts with the channel under different conditions, as we have done for TbAQP2. We would then want to mutate each of these residues to Ala, singly and in combination, and assess the mutants in uptake assays to verify the MD data. This is a whole new study and, given the uncertainties inherent in simply superimposing the TbAQP2 structure and the TbAQP3 model, we feel it is too speculative to add to our manuscript.

      (8) To validate the densities representing glycerol and the compounds, you should show half-map densities for these.

      A new figure, Fig. S6, has been made to show the half-map densities for the glycerol and drugs.

      (9) I would also like to see the density coverage of the individual helices/structural elements. 

      A new figure, Fig. S5, has been made to show the densities for the structural elements.

      (10) While the LigPlot figure is nice, I think showing the data (including the cryo-EM density) is necessary validation.

      The LigPlot figure is a diagram (an interpretation of data) and does not need the densities as these have already been shown in Fig. 1c (the data).

      (11) I would recommend including a figure that illustrates the points described in lines 123-134.

      All of the points raised in this section are already shown in Fig. 2a, which was referred to twice in this section. We have added another reference to Fig.2a on lines 134-135 for completeness.

      (12) Line 202: I would suggest using "membrane potential/voltage" to avoid confusion with mitochondrial membrane potential. 

      We have changed this to ‘plasma membrane potential’ to differentiate it from mitochondrial membrane potential.

      (13) Figure 4: Label C.O.M. in the panels so that the figure corresponds to the legend. 

      We have altered the figure and added an explanation in the figure legend (lines 716-717):

      ‘Cyan mesh shows the density of the molecule across the MD simulation, and the asterisk shows the position of the centre of mass (COM).’

      (14) Figure S2: Panels d and e appear too similar, and it is difficult to see the stick representation of the compound. I would recommend either using different colours or showing a close-up of the site.

      We have clarified the figure by including two close-up views of the hot-spot region, one with melarsoprol overlaid and one with pentamidine overlaid.

      (15) Figure S2: Typo in legend: 8YJ7 should be 8JY7.

      Changed as suggested.

      (16) Figure S3 and Figure S4: Please clarify which parts of the process were performed in cryoSPARC and which in Relion. 

      Figure S3 gives an overview of the processing and has been simplified to give the overall picture of the procedures. All of the details were included in the Methods section as other programmes are used, not just cryoSPARC and Relion. Given the complexities of the processing, we have referred the readers to the Methods section rather than giving confusing information in Fig. S3.

      We have updated the figure legend to Fig. S4 as requested.

      (17) Figure S9 and Figure S10: The legends are swapped in these two figures.

      The captions have been swapped to their proper positions.

      (18) For ease of orientation and viewing, I would recommend showing a vertical HOLE plot aligned with an image of the AQP2 pore. 

      The HOLE plot has been redrawn as suggested (Fig. S2).

    1. eLife Assessment

      This study by Roseby and colleagues shows that region-specific mechanosensation - especially anterior-dorsal inputs - controls larval self-righting, and links this to Hox gene function in sensory neurons. The work is important for understanding how body plan cues shape sensorimotor behaviour, and the experimental toolkit will be of use to others. The strength of evidence is solid with respect to the assays developed and the involvement of the anterior region; it is incomplete with respect to dorso-ventral involvement in that region and the role of Hox genes in the process. These findings will be of broad interest to researchers studying neural circuits, developmental genetics, and the evolution of behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      Roseby and colleagues report on a body region-specific sensory control of the fly larval righting response, a body contortion performed by fly larvae to correct their posture when they find themselves in an inverted (dorsal side down) position. This is an important topic because of the general need for animals to move about in the correct orientation and the clever methodologies used in this paper to uncover the sensory triggers for the behavior. Several innovative methodologies are developed, including a body region-specific optogenetic approach along different axial positions of the larva, region-specific manipulation of surface contacts with the substrate, and a 'water unlocking' technique to initiate righting behaviors, a strength of the manuscript. The authors found that multidendritic neurons, particularly the daIV neurons, are necessary for righting behavior. The contribution of daIV neurons had been shown by the authors in a prior paper (Klann et al, 2021), but that study had used constitutive neuronal silencing. Here, the authors used acute inactivation to confirm this finding. Additionally, the authors describe an important role for anterior sensory neurons and a need for dorsal substrate contact. Conversely, ventral sensory elements inhibit the righting behavior, presumably to ensure that the ventral-side-down position dominates. They move on to test the genetic basis for righting behavior and, consistent with the regional specificity they observe, implicate sensory neuron expression of Hox genes Antennapedia and Abdominal-b in self-righting.

      Strengths:

      Strengths of this paper include the important question addressed and the elegant and innovative combination of methods, which led to clear insights into the sensory biology of self-righting, and that will be useful for others in the field. This is a substantial contribution to understanding how animals correct their body position. The manuscript is very clearly written and couched in interesting biology.

      Limitations:

      (1) The interpretation of functional experiments is complicated by the proposed excitatory and inhibitory roles of dorsal and ventral sensory neuron activity, respectively. So, while silencing of an excitatory (dorsal) element might slow righting, silencing of inputs that inhibit righting could speed the behavior. Silencing them together, as is done here, could nullify or mask important D-V-specific roles. Selective manipulation of cells along the D-V axis could help address this caveat.

      (2) Prior studies from the authors implicated daIV neurons in the righting response. One of the main advances of the current manuscript is the clever demonstration of region-specific roles of sensory input. However, this is only confirmed with a general md driver, 109(2)80, and not with the subset-specific Gal4, so it is not clear whether daIV sensory neurons also act in a regionally specific manner along the A-P axis.

      (3) The manuscript is narrowly focused on sensory neurons that initiate righting, which limits the advance given the known roles for daIV neurons in righting. With the suite of innovative new tools, there is a missed opportunity to gain a more general understanding of how sensory neurons contribute to the righting response, including promoting and inhibiting righting in different regions of the larva, as well as aspects of proprioceptive sensing that could be necessary for righting and account for some of the observed effects of 109(2)80.

      (4) Although the authors observe an influence of Hox genes in righting, the possible mechanisms are not pursued, resulting in an unsatisfying conclusion that these genes are somehow involved in a certain region-specific behavior by their region-specific expression. Are the cells properly maintained upon knockdown? Are axon or dendrite morphologies of the cells disrupted upon knockdown?

      (5) There could be many reasons for delays in righting behavior in the various manipulations, including ineffective sensory 'triggering', incoherent muscle contraction patterns, initiation of inappropriate behaviors that interfere with righting sequencing, and deficits in sensing body position. The authors show that delays in righting upon silencing of 109(2)80 are caused by a switch to head casting behavior. Is this also the case for silencing of daIV neurons, Hox RNAi experiments, and silencing of CO neurons? Does daIII silencing reduce head casting to lead to faster righting responses?

      (6) 109(2)80 is expressed in a number of central neurons, so at least some of the righting phenotype with this line could be due to silenced neurons in the CNS. This should at least be acknowledged in the manuscript and controlled for, if possible, with other Gal4 lines.

      Other points

      (7) Interpretation of the roles of Hox gene expression and function in the righting response should consider previous data on Hox expression and function in multidendritic neurons reported by Parrish et al., Genes & Development, 2007.

      (8) The daIII silencing phenotype could conceivably be explained if these neurons act as the ventral inhibitors. Do the authors have evidence for or against such roles?

    3. Reviewer #2 (Public review):

      Summary

      This work explores the relationship between body structure and behavior by studying self-righting in Drosophila larvae, a conserved behavior that restores proper orientation when turned upside-down. The authors first introduce a novel "water unlocking" approach to induce self-righting behavior in a controlled manner. Then, they develop a method for region-specific inhibition of sensory neurons, revealing that anterior, but not posterior, sensory neurons are essential for proper self-righting. Deep-learning-based behavioral analysis shows that anterior inhibition prolongs self-righting by shifting head movement patterns, indicating a behavioral switch rather than a mere delay. Additional genetic and molecular experiments demonstrate that specific Hox genes are necessary in sensory neurons, underscoring how developmental patterning genes shape region-specific sensory mechanisms that enable adaptive motor behaviors.

      Strengths

      The work of Roseby et al. does what it says on the tin. The experimental design is elegant, introducing innovative methods that will likely benefit the fly behavior community, and the results are robustly supported, without overstatement.

      Weaknesses:

      The manuscript is clearly written, flows smoothly, and features well-designed experiments. Nevertheless, there are areas that could be improved. Below is a list of suggestions and questions that, if addressed, would strengthen this work:

      (1) Figure 1A illustrates the sequence of self-righting behavior in a first instar larva, while the experiments in the same figure are performed on third instar larvae. It would be helpful to clarify whether the sequence of self-righting movements differs between larval stages. Later on in the manuscript, experiments are conducted on first instar larvae without explanation for the choice of stage. Providing the rationale for using different larval stages would improve clarity.

      (2) What was the genotype of the larvae used for the initial behavioral characterization (Figure 1)? It is assumed they were wild type or w1118, but this should be stated explicitly. This also raises the question of whether different wild-type strains exhibit this behavior consistently or if there is variability among them. Has this been tested?

      (3) Could the observed slight leftward bias in movement angles of the tail (Figure 1I and S1) be related to the experimental setup, for example, the way water is added during the unlocking procedure? It would be helpful to include some speculation on whether the authors believe this preference to be endogenous or potentially a technical artifact.

      (4) The genotype of the larvae used for Figure 2 experiments is missing.

      (5) The experiment shown in Figure 2E-G reports the proportion of larvae exhibiting self-righting behavior. Is the self-righting speed comparable to that measured using the setup in Figure 1?

      (6) Line 496 states: "However, the effect size was smaller than that for the entire multidendritic population, suggesting neurons other than the daIVs are important for self-righting". Although I agree that this is the more parsimonious hypothesis, an alternative interpretation of the observed phenomenon could be that the effect is not due to the involvement of other neuronal populations, but rather to stronger Gal4 expression in daIVs with the general driver compared to the specific one. Have the authors (or someone else) measured or compared the relative strengths of these two drivers?

      (7) Is there a way to quantify or semi-quantify the expression of the Hox genes shown in Figure 6A? Also, was this experiment performed more than once (are there any technical replicates?), or was the amount of RNA material insufficient to allow replication?

      (8) Since RNAi constructs can sometimes produce off-target effects, it is generally advisable to use more than one RNAi line per gene, targeting different regions. Given that Hox genes have been extensively studied, the RNAi lines used in Figure 6B are likely already characterized. If this is the case, it would strengthen the data to mention it explicitly and provide references documenting the specificity and knockdown efficiency of the Hox gene RNAi lines employed. For example, does Antp RNAi expression in the 109(2)80 domain decrease Antp protein levels in anterior multidendritic neurons in immunofluorescence assays?

      (9) In addition to increasing self-righting time, does Antp downregulation also affect head casting behavior or head movement speed? A more detailed behavioral characterization of this genetic manipulation could help clarify how closely it relates to the behavioral phenotypes described in the previous experiments.

      (10) Does down-regulation of Antp in the daIV domain also increase self-righting time?

    4. Author response:

      We are very pleased to hear the overall positive views and constructive criticisms of eLife Editors and Reviewers on our work. In particular, we appreciate their global assessment that the work is important for understanding how body plan cues shape sensorimotor behavioural patterns, that the strength of evidence is solid, and their views that our experimental toolkit will be useful to others. We also very much appreciate eLife’s assessment that our findings will be of broad interest to researchers studying neural circuits, developmental genetics, and the evolution of behaviour.

      Regarding Reviewer 1, we thank them for their positive comments on the value of our study, highlighting that our paper addresses an important question using an elegant and innovative combination of methods, which leads to clear insights into the sensory biology of self-righting that they consider will be useful for others in the field. We are also very pleased to hear that they consider that our study makes a substantial contribution to understanding how animals correct their body position and that the manuscript is very clearly written and couched in interesting biology. In a revised version of the manuscript, we will consider some of the interesting points raised by Rev1, including the possibility of conducting new experiments using neuronal subset-specific Gal4s, to establish whether daIV sensory neurons also act in a regionally specific manner along the A-P axis.

      Turning to the comments by Rev2, we are grateful to them for considering that our experimental design is elegant, and that it introduces innovative methods that will likely benefit the fly behavior community, and the results are robustly supported. In connection to other comments, in a revised manuscript we will consider addressing the question of whether normal levels of expression of the Hox gene Antennapedia within the daIV domain are essential for self-righting. We will also seek to add technical replicates to our Hox expression molecular analysis, amend typos and incorporate several of the constructive corrections mentioned.

    1. eLife Assessment

      This important study uses single-neuron Patch-seq RNA sequencing to investigate the process by which RNA editing can produce protein diversity and regulate function in various cellular contexts. The computational analyses of the data collected are convincing, and from an analytical standpoint, this paper is a notable advance in seeking to provide a biological context for massive amounts of data in the field. The study would be of interest to biologists looking at the effects of RNA editing in the diversification of cellular behaviour.

    2. Reviewer #1 (Public review):

      RNA editing is a widespread process that produces protein diversity and can regulate how genes function in various cellular contexts. Despite the importance of the process, we still lack a thorough knowledge of the profile of RNA editing targets in known cells. Crane and colleagues take advantage of a recently acquired scRNAseq database for Drosophila type Ib and Is larval motoneurons and identify the RNA editing landscape that differs in those cells. They find both canonical (A --> I) and non-canonical sites and characterize the targets and their frequencies, and determine some of the "rules" that influence RNA editing. They compare their database with existing databases to determine the reliance on the best-known deaminase enzyme, ADAR, determine the activity-dependence of editing profiles, and identify editing sites that are specific to larval Drosophila, differing from adults. The authors also identify non-canonical editing sites, especially in the newly appreciated regulator of synaptic plasticity, Arc1.

      The paper represents a strong analysis of recently made RNAseq databases from their lab and takes a notable approach to integrate this with other databases that have been recently produced from other sources. One of the places where this manuscript succeeds is in a thorough approach to analyzing the considerable amount of data that is out there regarding RNAseq in these differing motoneurons, but also in comparing larvae to adults. This is a strong advance. It also enables the authors to begin to determine rules for RNA editing. From an analytical standpoint, this paper is a notable advance in seeking to provide a biological context for massive amounts of data in the field. Further, it addresses some biological aspects in comparing WT and adar mutants to assess one potential deaminase, addresses activity-dependence, and begins to reveal profiles of canonical and non-canonical editing.

    3. Reviewer #2 (Public review):

      Summary:

      The study uses single-neuron Patch-seq RNA sequencing in two subgroups of Drosophila larval motoneurons (Is and Ib) and identifies 316 high-confidence canonical mRNA edit sites, which primarily (55%) occur in the coding regions of the mRNAs (CDS). Most of the canonical mRNA edits in the CDS regions include neuronal and synaptic proteins such as Complexin, Cac, Para, Shab, Sh, Slo, EndoA, Syx1A, Rim, RBP, Vap33, and Lap, which are involved in neuronal excitability and synaptic transmission. Of the 316 identified canonical edit sites, 60 lead to missense RNAs in a range of proteins (nAChRalpha5, nAChRalpha6, nAChRbeta1, ATPalpha, Cacophony, Para, Bsk, Beag, RNase Z) that are likely to have an impact on the larval motoneurons' development and function. Only 27 sites show editing levels higher than 90%, and a similar editing profile is observed between the Is and Ib motoneurons when looking at the number of edit sites and the fraction of reads edited per cell, with only 26 RNA editing sites showing a significant difference in editing level. The variability of edited and unedited mRNAs suggests stochastic editing. The two subsets of motoneurons show many noncanonical editing sites, which, however, are not enriched for neuron-specific genes, and therefore cause more silent changes than canonical editing sites. Comparison of the mRNA editing sites and editing rates of the single-neuron Patch-seq RNA sequencing dataset with three other RNAseq datasets, one from larval motoneurons of the same stage and two from adult head nuclei, shows positive correlations in the editing frequencies of CDS edits between the Patch-seq larval Ib + Is MNs and all three other datasets, with stronger correlations for previously annotated edits and weaker correlations for unannotated edits. Several of the identified editing targets are only present in the single-neuron Patch-seq RNA sequencing dataset, suggesting cell-type-specific or developmental-stage-specific editing. Editing appears to be resistant to changes in neuronal activity, as only a few sites show evidence of being activity-regulated.

      Strengths:

      The study employs GAL4 driver lines available in the Drosophila model to identify two subtypes of motoneurons with distinct biophysical and morphological features. In combination with single-neuron Patch-seq RNA sequencing, it provides a unique opportunity to identify RNA editing sites and rates specific to particular motoneuron subtypes. The RNAseq data are robustly analysed, and high-confidence mRNA edit sites of both canonical and noncanonical RNA editing are identified.

      The mRNA editing sites identified from the single neuron Patch-seq RNA sequencing data are compared to editing sites identified across other RNAseq datasets collected from animals at similar or different developmental stages, allowing for the identification of editing sites that are common to all or specific to a single dataset.

      Weaknesses:

      Although the analysed motoneurons come from two distinct subtypes, it is unclear from how many Drosophila larvae the motoneurons were collected and from which specific regions along the ventral nerve cord (VNC). Therefore, the study does not consider possible differences in editing rate between samples from different larvae, which could be in different activity states, or from neurons located in different regions of the VNC, which would receive inputs from slightly different neuronal networks.

      The RNA samples include RNAs located both in the nucleus and the cytoplasm, introducing a potential compartmental mismatch between the RNA and the enzymes mediating the editing, which could influence editing rate. Similarly, the age of the RNAs undergoing editing is unknown, which may influence the measured editing rates.

    4. Reviewer #3 (Public review):

      Summary:

      The study consists of extensive computational analyses of their previously released Patch-seq data on single MN1-Ib and MNISN-Is neurons. The authors demonstrate the diversity of A>I editing events at single-cell resolution in two different neuronal cell types, identifying numerous A>I editing events that vary in their proportion, including those that cause missense mutations in conserved amino acids. They also consider "noncanonical" edits, such as C>T and G>A, and integrate publicly available data to support these analyses.

      In general, the study contains a valuable resource to assess RNA editing in single neurons and opens several questions regarding the diversity and functional implications of RNA editing at single-cell resolution. The conclusions from the study are generally supported by their data; however, the study is currently based on computational predictions and would therefore benefit from experimentation to support their hypotheses and demonstrate the effects of the editing events identified on neuronal function and phenotype.

      Strengths:

      The study uses samples that are technically difficult to prepare to assess cell-type-specific RNA editing events in a natural model. The study also uses public data from different developmental stages that demonstrate the importance of considering cell type and developmental stage-specific RNA regulation. These critical factors, particularly that of developmental timing, are often overlooked in mechanistic studies.

      Extensive computational analysis, using public pipelines, suitable filtering criteria, and accessible custom code, identifies a number of RNA editing events that have the potential to impact conserved amino acids and have subsequent effects on protein function. These observations are supported through the integration of several public data sets to investigate the occurrence of the edits in other data sets, with many identified across multiple data sets. This approach allowed the identification of a number of novel A>I edits, some of which appear to be specific to this study, suggesting cell/developmental specificity, whilst others are present in the public data sets but went unannotated.

      The study also considers the role of Adar in the generation of A>I edits, as would be expected, by assessing the effect of Adar expression on editing rates using public data from adar mutant tissue to demonstrate that the edits conserved between experiments are mainly Adar-sensitive. This would be stronger if the authors also performed Patch-seq experiments in adar mutants to increase confidence in the identified edit sites.

      Weaknesses:

      Whilst the study makes interesting observations using advanced computational approaches, it does not demonstrate the functional implications of the observed editing events. The functional impact of the edits is inferred either from the nature of the change to the coding sequence and the amino acid conservation, or through integration of other data sets. Although these could indeed imply function, further work would be required to confirm it, for example using their AlphaFold models to predict any changes in structure. This limitation is acknowledged by the authors, but the overall strength of the interpretation of the analysis could be softened to reflect it.

      The study uses public data from more diverse cellular populations to confirm the role of Adar in introducing the A>I edits. Whilst this is convincing, the ideal comparison to support the mechanism behind the identified edits would be to perform patch-seq experiments on 1b or 1s neurons from adar mutants. However, although this should be considered when interpreting the data, these experiments would be a large amount of work and beyond the scope of the paper.

      By focusing on the potential impact of editing events that cause missense mutations in the CDS, the study may overlook the importance of edits in noncoding regions, which may impact miRNA or RNA-binding protein target sites. Further, the statement that noncanonical edits and those that induce silent mutations are likely to be less impactful is very broad and should be reconsidered. This is particularly the case when suggesting that silent mutations may not impact the biology. Given the importance of codon usage in translational fidelity, it is possible that silent mutations induced by either A>I or noncanonical editing in the CDS impact translation efficiency. Indeed, this could have a greater impact on protein production and transcript levels than a single amino acid change alone.

    5. Author response:

      Reviewer #1:

      Indicated the paper provided a strong analysis of RNAseq databases to provide a biological context and resource for the massive amounts of data in the field on RNA editing. The reviewer noted that future studies will be important to define the functional consequences of the individual edits and why the RNA editing rules we identified exist. We address these comments below.

      (1) The reviewer wondered about the role of noncanonical editing in neuronal protein expression.

      Indeed, the role of noncanonical editing has been poorly studied compared to the more common A-to-I ADAR-dependent editing. Most noncanonical coding edits we found actually caused silent changes at the amino acid level, suggesting evolutionary selection against this mechanism as a pathway for generating protein diversity. As such, we suspect that most of these edits do not alter neuronal function in significant ways. Two potential exceptions were noncanonical edits that altered conserved residues in the synaptic proteins Arc1 and Frequenin 1. The C-to-T coding edit in the activity-regulated Arc1 mRNA, which encodes a retroviral-like Gag protein involved in synaptic plasticity, resulted in a P124L amino acid change (see Author response image 1, panel A, below). About 50% of total Arc1 mRNA was edited at this site in both Ib and Is neurons, suggesting a potentially important role if the P124L change alters Arc1 structure or function. Given that Arc1 assembles into higher-order viral-like capsids, this change could alter capsid formation or structure. Indeed, P124 lies in the hinge region separating the N- and C-terminal capsid assembly regions (panel B), and we hypothesize that this change will alter the ability of Arc1 capsids to assemble properly. We plan to test this experimentally by rescuing Arc1 null mutants with edited versus unedited transgenes to see how the previously reported synaptic phenotypes are modified. We also plan to examine whether the change alters Arc1 capsid assembly, in a collaboration using cryo-EM.

      Author response image 1.

      A. AlphaFold predictions of Drosophila Arc1 and Frq1 with edit site noted. B. Structure of the Drosophila Arc1 capsid. Monomeric Arc1 conformation within the capsid is shown on the right with the location of the edit site indicated.

      The other noncanonical edit (G-to-A) that stood out was in Frequenin 1 (Frq1), a Ca²⁺-binding protein containing multiple EF hands that regulates synaptic transmission; this edit resulted in a G2E amino acid substitution (location within Frq1 shown in panel A above). This glycine residue is conserved in all Frq homologs and is the site of N-myristoylation, a co-translational lipid modification of the glycine after removal of the initiator methionine by an aminopeptidase. Myristoylation tethers Frq proteins to the plasma membrane, with a Ca²⁺-myristoyl switch allowing some family members to cycle on and off membranes when the lipid domain is sequestered in the absence of Ca²⁺. Although the G2E edit is found at lower levels (20% in Ib MNs and 18% in Is MNs), it could create a pool of soluble Frq1 that alters its signaling. We plan to assay the functional significance of this noncanonical edit as well. Compared to edits that alter amino acid sequence, determining how noncanonical editing of UTRs might regulate mRNA dynamics is a harder question at this stage and will require more experimental follow-up.

      (2) The reviewer noted the last section of the results might be better split into multiple parts as it reads as a long combination of two thoughts.

      We agree with the reviewer that the last section is important, but it was somewhat disconnected from the main story, and it was difficult for us to know exactly where to put it. All the data up to that point in the paper were collected from our own Patch-seq analysis of individual larval motoneurons. We wanted to compare these results to other large RNAseq datasets obtained from pooled neuronal populations and felt it was best to include this comparison at the end of the Results section, as it no longer related to the rules of RNA editing within single neurons. We used these datasets to confirm many of our edits, as well as to find evidence for some developmental and neuronal cell type-specific edits. We also took advantage of RNAseq from neuronal datasets with altered activity to explore how activity might alter the editing machinery. We felt it was better to include those data in this final section, given that they were not collected with our original Patch-seq approach.

      Reviewer #2:

      The reviewer noted that the study provided a unique opportunity to identify RNA editing sites and rates specific to individual motoneuron subtypes, highlighting that the RNAseq data were robustly analyzed and that high-confidence hits were identified and compared to other RNAseq datasets. The reviewer provided some suggestions for future experiments and requested a few clarifications.

      (1) The reviewer asked about Figure 1F and the average editing rate per site described later in the paper.

      Indeed, Figure 1F shows the average editing rate for each individual gene across all the Ib and Is cells, so we primarily use it to highlight the variability we find in overall editing rate, from around 20% for some sites to 100% for others. The actual editing rate for each site in individual neurons is shown in Figure 4D, which plots the rate for every edit site and the overall summed rate for each particular neuron.

      (2) The reviewer also noted that it was unclear where in the VNC the individual motoneurons were located and how that might affect editing.

      The precise larval segment of every individual neuron sampled by Patch-seq was recorded, and those data are accessible in the original Jetti et al. 2023 paper if the reader wants to explore any potential anterior-to-posterior differences in RNA editing. Due to the technical difficulty of the Patch-seq approach, we pooled all the Ib and Is neurons from each segment together to gain more statistical power to identify edit sites. We don’t believe segmental identity would be a major regulator of RNA editing, but we cannot rule it out.

      (3) The reviewer also wondered if including RNAs located both in the nucleus and cytoplasm would influence editing rate.

      Given our Patch-seq approach requires us to extract both the cytoplasm and nucleus, we would be sampling both nuclear and cytoplasmic mRNAs. However, as shown in Figure 8 – figure supplement 3 D-F, the vast majority of our edits are found in both polyA mRNA samples and nascent nuclear mRNA samples from other datasets, indicating the editing is occurring co-transcriptionally and within the nucleus. As such, we don't think the inclusion of cytoplasmic mRNA is altering our measured editing rates for most sites. This may not be true for all non-canonical edits, as we did see some differences there, indicating some non-canonical editing may be happening in the cytoplasm as well.

      Reviewer #3:

      The reviewer indicated that the work provides a valuable resource for assessing RNA editing in single neurons and suggested the value of future experiments to demonstrate the effects of editing events on neuronal function. This will be a major effort for us going forward, as we have already begun to test the role of editing in mRNAs encoding several presynaptic proteins that regulate synaptic transmission. The reviewer also had several other comments, as discussed below.

      (1) The reviewer noted that silent mutations could alter codon usage that would result in translational stalling and altered protein production.

      This is an excellent point, as silent mutations in the coding region could have a more significant impact if they generate non-preferred rare codons. This is not something we have analyzed, but it is certainly worth considering in future experiments. Our initial efforts are focused on testing the edits that cause predicted changes in presynaptic proteins, based on the amino acid change and its location in important functional domains, but it is worth considering the silent edits as well as we think about the larger picture of how RNA editing is likely to impact not only protein function but also protein levels.

      (2) The reviewer noted future studies could be done using tools like Alphafold to test if the amino acid changes are predicted to alter the structure of proteins with coding edits.

      This is an interesting approach, though we don’t have much expertise in protein modeling at that level. We could consider adding this to future studies in collaboration with other modeling labs.

      (3) The reviewer wondered if the negative correlation between edits and transcript abundance could indicate edits might be destabilizing the transcripts.

      This is an interesting idea, but it would need to be tested experimentally. For the few edits we have already generated to begin functional testing, including our published work on editing in the C-terminus of Complexin, we haven’t seen a change in mRNA levels caused by these edits. However, it would not be surprising to find some edits that reduce transcript levels. A set of 5’UTR edits we have generated in Syx1A seems to reduce protein production and may be acting in such a manner.

      (4) The reviewer wondered if the proportion of edits we report in many of the figures is normalized to the length of the transcript, as longer transcripts might have more edits by chance.

      The figures referenced by the reviewer (1, 2 and 7) show the number of high-confidence editing sites that fall into the 5’ UTR, 3’ UTR, or CDS categories. Our intention here was to highlight that the majority of the high-confidence edits that made it through the stringent filtering process were in the coding region. This would still be true if we normalized to the length of the given gene region. However, it would be interesting to know whether these proportions match those expected if edits occurred at a random rate per unit length of each gene region across the Drosophila genome, although we have not done this analysis.
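      One way to frame the length-normalized comparison mentioned above is a null model in which every nucleotide is equally likely to be edited, so a region's expected share of edits equals its share of total length. The sketch below assumes that uniform-per-nucleotide null; the function name `expected_counts` and all lengths and counts are hypothetical placeholders for illustration, not data from this study.

```python
def expected_counts(total_edits, region_lengths):
    """Expected edits per region if editing is uniform per nucleotide (assumed null)."""
    total_length = sum(region_lengths.values())
    return {region: total_edits * length / total_length
            for region, length in region_lengths.items()}

# Hypothetical summed region lengths (nt) and observed edit counts; NOT study data.
lengths = {"5'UTR": 200, "CDS": 1500, "3'UTR": 300}
observed = {"5'UTR": 10, "CDS": 160, "3'UTR": 30}

expected = expected_counts(sum(observed.values()), lengths)

# Pearson chi-square statistic comparing observed counts with the length-proportional expectation.
chi2 = sum((observed[r] - expected[r]) ** 2 / expected[r] for r in lengths)
print(expected, round(chi2, 3))
```

      The statistic could then be compared with a chi-square distribution with (number of regions minus 1) degrees of freedom; a real analysis would also need to account for differences in read coverage and filtering stringency between regions.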

      (5) The reviewer noted that future studies could expand on the work to examine miRNA or other known RBP binding sites that might be altered by the edits.

      This is another avenue we could pursue in the future. We did do this analysis for a few of the important genes encoding presynaptic proteins (these are the most interesting to us given the lab’s interest in the synaptic vesicle fusion machinery), but did not find anything obvious for this smaller subset of targets.

      (6) The reviewer suggested sequence context for Adar could also be investigated for the hits we identified.

      We haven’t pursued this avenue yet, but it would be of interest in the future. In a similar vein, it would be informative to identify intron-exon base pairing that could generate the dsRNA substrate on which ADAR acts.

      (7) The reviewer noted the disconnect between Adar mRNA levels and overall editing levels reported in Figure 4A/B.

      Indeed, the lack of correlation between overall editing levels and Adar mRNA abundance has been noted previously in many studies. For the type of single-cell Patch-seq approach we took to generate our RNAseq libraries, measurements of less abundant transcripts obtained from a single neuron can be very noisy. As such, the few neurons with no detectable Adar mRNA likely reflect this single-neuron sampling noise. In answer to the reviewer’s question, these figure panels show only A-to-I edits, so they are specific to ADAR.

      (8) The reviewer notes the scale in Figure 5D can make it hard to visualize the actual impact of the changes.

      The intention of Figure 5D was to address the question of whether sites with large Ib/Is editing differences were simply due to higher Ib or Is mRNA expression levels. If this were the case, then we would expect highly edited sites to have large Ib/Is TPM differences. Instead, as the figure shows, the vast majority of highly edited sites were in mRNAs that were NOT significantly different between Ib and Is (red dots in graph) and are therefore clustered together near “0 Difference in TPMs”. TPMs and editing levels for all edit sites can be found in Table 1, and a visualization of these data for selected sites is shown in Figure 5E.

    1. eLife Assessment

      This study provides useful insights into the ways in which germinal center B cell metabolism, particularly lipid metabolism, affects cellular responses. The authors use sophisticated mouse models to convincingly demonstrate that ether lipids are relevant for B cell homeostasis and efficient humoral responses. The authors then conducted in vivo as well as in vitro experiments, thereby strengthening their conclusions.

    2. Reviewer #1 (Public review):

      In this manuscript, Hoon Cho et al. present a novel investigation into the role of PexRAP, an intermediary in ether lipid biosynthesis, in B cell function, particularly during the Germinal Center (GC) reaction. The authors profile lipid composition in activated B cells both in vitro and in vivo, revealing the significance of PexRAP. Using a combination of animal models and imaging mass spectrometry, they demonstrate that PexRAP is specifically required in B cells. They further establish that its activity is critical upon antigen encounter, shaping B cell survival during the GC reaction.

      Mechanistically, they show that ether lipid synthesis is necessary to modulate reactive oxygen species (ROS) levels and prevent membrane peroxidation.

      Highlights of the Manuscript:

      The authors perform exhaustive imaging mass spectrometry (IMS) analyses of B cells, including GC B cells, to explore ether lipid metabolism during the humoral response. This approach is particularly noteworthy given the challenge of limited cell availability in GC reactions, which often hampers metabolomic studies. IMS proves to be a valuable tool in overcoming this limitation, allowing detailed exploration of GC metabolism.

      The data presented is highly relevant, especially in light of recent studies suggesting a pivotal role for lipid metabolism in GC B cells. While these studies primarily focus on mitochondrial function, this manuscript uniquely investigates peroxisomes, which are linked to mitochondria and contribute to fatty acid oxidation (FAO). By extending the study of lipid metabolism beyond mitochondria to include peroxisomes, the authors add a critical dimension to our understanding of B cell biology.

      Additionally, the metabolic plasticity of B cells poses challenges for studying metabolism, as genetic deletions from the beginning of B cell development often result in compensatory adaptations. To address this, the authors employ an acute loss-of-function approach using two conditional, cell-type-specific gene inactivation mouse models: one targeting B cells after the establishment of a pre-immune B cell population (Dhrs7b^f/f, huCD20-CreERT2) and the other during the GC reaction (Dhrs7b^f/f; S1pr2-CreERT2). This strategy is elegant and well-suited to studying the role of metabolism in B cell activation.

      Overall, this manuscript is a significant contribution to the field, providing robust evidence for the fundamental role of lipid metabolism during the GC reaction and unveiling a novel function for peroxisomes in B cells.

      Comments on revisions:

      There are still some discrepancies in gating strategies. In Fig. 7B legend (lines 1082-1083), they show representative flow plots of GL7+ CD95+ GC B cells among viable B cells, so it is not clear if they are IgDneg, as the rest of the GC B cells aforementioned in the text.

      Western blot confirmation: We understand the limitations the authors enumerate. Perhaps an RT-qPCR analysis of the Dhrs7b gene in sorted GC B cells from the S1PR2-CreERT2 model could be feasible, as it requires a smaller number of cells. In any case, we agree with the authors that the results obtained using the huCD20-CreERT2 model are consistent with those from the S1PR2-CreERT2 model, which adds credibility to the findings and supports the conclusion that GC B cells in the S1PR2-CreERT2 model are indeed deficient in PexRAP.

      Lines 222-226: We believe the correct figure is 4B, whereas the text refers to 4C.

      Supplementary Figure 1 (line 1147): The figure title suggests that the data on T-cell numbers are from mice in a steady state. However, the legend indicates that the mice were immunized, which means the data are not from steady-state conditions.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Cho et al. investigate the role of ether lipid biosynthesis in B cell biology, particularly focusing on GC B cells, by inducible deletion of PexRAP, an enzyme responsible for the synthesis of ether lipids.

      Strengths:

      Overall, the data are well-presented, the paper is well-written and provides valuable mechanistic insights into the importance of PexRAP enzyme in GC B cell proliferation.

      Weaknesses:

      More detailed mechanisms of the impaired GC B cell proliferation caused by PexRAP deficiency remain to be investigated. In minor parts, there are issues with the interpretation of the data that might confuse readers.

      Comments on revisions:

      The authors improved the manuscript appropriately according to my comments.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      In this manuscript, Hoon Cho et al. present a novel investigation into the role of PexRAP, an intermediary in ether lipid biosynthesis, in B cell function, particularly during the Germinal Center (GC) reaction. The authors profile lipid composition in activated B cells both in vitro and in vivo, revealing the significance of PexRAP. Using a combination of animal models and imaging mass spectrometry, they demonstrate that PexRAP is specifically required in B cells. They further establish that its activity is critical upon antigen encounter, shaping B cell survival during the GC reaction. Mechanistically, they show that ether lipid synthesis is necessary to modulate reactive oxygen species (ROS) levels and prevent membrane peroxidation.

      Highlights of the Manuscript:

      The authors perform exhaustive imaging mass spectrometry (IMS) analyses of B cells, including GC B cells, to explore ether lipid metabolism during the humoral response. This approach is particularly noteworthy given the challenge of limited cell availability in GC reactions, which often hampers metabolomic studies. IMS proves to be a valuable tool in overcoming this limitation, allowing detailed exploration of GC metabolism.

      The data presented is highly relevant, especially in light of recent studies suggesting a pivotal role for lipid metabolism in GC B cells. While these studies primarily focus on mitochondrial function, this manuscript uniquely investigates peroxisomes, which are linked to mitochondria and contribute to fatty acid oxidation (FAO). By extending the study of lipid metabolism beyond mitochondria to include peroxisomes, the authors add a critical dimension to our understanding of B cell biology.

      Additionally, the metabolic plasticity of B cells poses challenges for studying metabolism, as genetic deletions from the beginning of B cell development often result in compensatory adaptations. To address this, the authors employ an acute loss-of-function approach using two conditional, cell-type-specific gene inactivation mouse models: one targeting B cells after the establishment of a pre-immune B cell population (Dhrs7b^f/f, huCD20-CreERT2) and the other during the GC reaction (Dhrs7b^f/f; S1pr2-CreERT2). This strategy is elegant and well-suited to studying the role of metabolism in B cell activation.

      Overall, this manuscript is a significant contribution to the field, providing robust evidence for the fundamental role of lipid metabolism during the GC reaction and unveiling a novel function for peroxisomes in B cells. 

      Comments on revisions:

      There are still some discrepancies in gating strategies. In Fig. 7B legend (lines 1082-1083), they show representative flow plots of GL7+ CD95+ GC B cells among viable B cells, so it is not clear if they are IgDneg, as the rest of the GC B cells aforementioned in the text.

      We apologize for missing this item in need of correction in the revision and sincerely thank the reviewer for the stamina and care in picking this up. The data shown in Fig. 7B represented cells (events) in the IgD<sup>neg</sup> Dump<sup>neg</sup> viable lymphoid gate. We will correct this omission in the final revision that becomes the version of record.

      Western blot confirmation: We understand the limitations the authors enumerate. Perhaps an RT-qPCR analysis of the Dhrs7b gene in sorted GC B cells from the S1PR2-CreERT2 model could be feasible, as it requires a smaller number of cells. In any case, we agree with the authors that the results obtained using the huCD20-CreERT2 model are consistent with those from the S1PR2-CreERT2 model, which adds credibility to the findings and supports the conclusion that GC B cells in the S1PR2-CreERT2 model are indeed deficient in PexRAP.

      We will make efforts to go back through the manuscript and highlight this limitation to readers, i.e., that we were unable to get genetic evidence to assess what degree of "counter-selection" applied to GC B cells in our experiments.

      We agree with the referee that, to optimally support the Imaging Mass Spectrometry (IMS) data showing perturbations of various ether lipids within GCs after depletion of PexRAP, it would have been best if we could have had a qRT-PCR assay that allowed quantitation of the Dhrs7b-encoded mRNA in flow-purified GC B cells, or of the extent to which the genomic DNA of these cells was in the deleted rather than 'floxed' configuration.

      While the short half-life of ether lipid species leads us to infer that the enzymatic function remains reduced or absent, it is definitely unsatisfying that the money for experiments ran out in June and the lab members had to move to new jobs.

      Lines 222-226: We believe the correct figure is 4B, whereas the text refers to 4C.

      As for the first item, we apologize and will correct this error.

      Supplementary Figure 1 (line 1147): The figure title suggests that the data on T-cell numbers are from mice in a steady state. However, the legend indicates that the mice were immunized, which means the data are not from steady-state conditions. 

      We will change the wording both on line 1147 and 1152.

      Reviewer #2 (Public review):

      Summary:

      In this study, Cho et al. investigate the role of ether lipid biosynthesis in B cell biology, particularly focusing on GC B cells, by inducible deletion of PexRAP, an enzyme responsible for the synthesis of ether lipids.

      Strengths:

      Overall, the data are well-presented, the paper is well-written and provides valuable mechanistic insights into the importance of PexRAP enzyme in GC B cell proliferation.

      Weaknesses:

      More detailed mechanisms of the impaired GC B cell proliferation caused by PexRAP deficiency remain to be investigated. In minor parts, there are issues with the interpretation of the data that might confuse readers.

      Comments on revisions:

      The authors improved the manuscript appropriately according to my comments.

      To re-summarize, we very much appreciate the diligence of the referees and Editors in re-reviewing this work at each cycle and helping via constructive peer review, along with their favorable comments and overall assessments. The final points will be addressed with minor edits, since there is no longer any money for further work and the lab members have moved on.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In this manuscript, Sung Hoon Cho et al. present a novel investigation into the role of PexRAP, an intermediary in ether lipid biosynthesis, in B cell function, particularly during the Germinal Center (GC) reaction. The authors profile lipid composition in activated B cells both in vitro and in vivo, revealing the significance of PexRAP. Using a combination of animal models and imaging mass spectrometry, they demonstrate that PexRAP is specifically required in B cells. They further establish that its activity is critical upon antigen encounter, shaping B cell survival during the GC reaction.

      Mechanistically, they show that ether lipid synthesis is necessary to modulate reactive oxygen species (ROS) levels and prevent membrane peroxidation.

      Highlights of the Manuscript:

      The authors perform exhaustive imaging mass spectrometry (IMS) analyses of B cells, including GC B cells, to explore ether lipid metabolism during the humoral response. This approach is particularly noteworthy given the challenge of limited cell availability in GC reactions, which often hampers metabolomic studies. IMS proves to be a valuable tool in overcoming this limitation, allowing detailed exploration of GC metabolism.

      The data presented is highly relevant, especially in light of recent studies suggesting a pivotal role for lipid metabolism in GC B cells. While these studies primarily focus on mitochondrial function, this manuscript uniquely investigates peroxisomes, which are linked to mitochondria and contribute to fatty acid oxidation (FAO). By extending the study of lipid metabolism beyond mitochondria to include peroxisomes, the authors add a critical dimension to our understanding of B cell biology.

      Additionally, the metabolic plasticity of B cells poses challenges for studying metabolism, as genetic deletions from the beginning of B cell development often result in compensatory adaptations. To address this, the authors employ an acute loss-of-function approach using two conditional, cell-type-specific gene inactivation mouse models: one targeting B cells after the establishment of a pre-immune B cell population (Dhrs7b^f/f, huCD20-CreERT2) and the other during the GC reaction (Dhrs7b^f/f; S1pr2-CreERT2). This strategy is elegant and well-suited to studying the role of metabolism in B cell activation.

      Overall, this manuscript is a significant contribution to the field, providing robust evidence for the fundamental role of lipid metabolism during the GC reaction and unveiling a novel function for peroxisomes in B cells.

      We appreciate these positive reactions and response, and agree with the overview and summary of the paper's approaches and strengths.

      However, several major points need to be addressed:

      Major Comments:

      Figures 1 and 2

      The authors conclude, based on the results from these two figures, that PexRAP promotes the homeostatic maintenance and proliferation of B cells. In this section, the authors first use a tamoxifen-inducible full Dhrs7b knockout (KO) and afterwards Dhrs7bΔ/Δ-B model to specifically characterize the role of this molecule in B cells. They characterize the B and T cell compartments using flow cytometry (FACS) and examine the establishment of the GC reaction using FACS and immunofluorescence. They conclude that B cell numbers are reduced, and the GC reaction is defective upon stimulation, showing a reduction in the total percentage of GC cells, particularly in the light zone (LZ).

      The analysis of the steady-state B cell compartment should also be improved. This includes a more detailed characterization of MZ and B1 populations, given the role of lipid metabolism and lipid peroxidation in these subtypes.

      Suggestions for Improvement:

      B Cell compartment characterization: A deeper characterization of the B cell compartment in non-immunized mice is needed, including analysis of Marginal Zone (MZ) maturation and a more detailed examination of the B1 compartment. This is especially important given the role of specific lipid metabolism in these cell types. The phenotyping of the B cell compartment should also include an analysis of immunoglobulin levels on the membrane, considering the impact of lipids on membrane composition.

      Although the manuscript is focused on post-ontogenic B cell regulation in Ab responses, we believe we will be able to polish a revised manuscript through the addition of analyses suggested by this point in the review: measurement of surface IgM on, and phenotyping of, various B cell subsets, including MZB and B1 B cells, to extend the data in Supplemental Fig 1H and I. Depending on the level of support, new immunization experiments to score Tfh cells and analyze a few of their functional molecules as part of a B cell paper may be feasible.

      Addendum / update of Sept 2025: We added new data with more on MZB and B1 B cells, surface IgM, and on Tfh populations. 

      GC Response Analysis Upon Immunization: The GC response characterization should include additional data on the T cell compartment, specifically the presence and function of Tfh cells. In Fig. 1H, the distribution of the LZ appears strikingly different. However, the authors have not addressed this in the text. A more thorough characterization of centroblasts and centrocytes using CXCR4 and CD86 markers is needed.

      The gating strategy used to characterize GC cells (GL7+CD95+ in IgD− cells) is suboptimal. A more robust analysis of GC cells should be performed in total B220+CD138− cells.

      We first want to apologize for the mislabeling of LZ and DZ in Fig 1H. The greenish-yellow region (GL7<sup>+</sup> CD35<sup>neg</sup>) indicates the DZ, and the cyan region (GL7<sup>+</sup> CD35<sup>+</sup>) indicates the LZ.

      Addendum / update of Sept 2025: We corrected the mistake and added new experimental data using the CD138 marker to exclude preplasmablasts.

      As a technical note, we experienced high background noise with GL7 staining uniquely in PexRAP-deficient (Dhrs7b<sup>f/f</sup>; Rosa26-CreER<sup>T2</sup>) mice (i.e., not in WT control mice). The high background noise of GL7 staining was not observed with B cell-specific KO of PexRAP (Dhrs7b<sup>f/f</sup>; huCD20-CreER<sup>T2</sup>). Two formal possibilities could account for this staining issue: either the expression of the GL7 epitope is repressed by PexRAP, or the proper positioning of GL7<sup>+</sup> cells in the germinal center region is defective in PexRAP-deficient mice (e.g., due to an effect on positioning cues from cell types other than B cells). In a revised manuscript, we will fix the labeling error and further discuss the GL7 issue, while taking care not to imply that we conclude there is a positioning problem or derepression of GL7 (an activation antigen on T cells as well as B cells).

      While the gating strategy for an overall population of GC B cells is fairly standard even in the current literature, the question about using CD138 staining to exclude early plasmablasts (i.e., analyze B220<sup>+</sup> CD138<sup>neg</sup> vs B220<sup>+</sup> CD138<sup>+</sup>) is interesting. In addition, some papers use GL7<sup>+</sup> CD38<sup>neg</sup> for GC B cells instead of GL7<sup>+</sup> Fas (CD95)<sup>+</sup>, and we thank the reviewer for suggesting the analysis of centroblasts and centrocytes. For the revision, we will try to secure resources to revisit the immunizations and analyze them for these other facets of GC B cells (including CXCR4/CD86) and for their GL7<sup>+</sup> CD38<sup>neg</sup>, B220<sup>+</sup> CD138<sup>-</sup>, and B220<sup>+</sup> CD138<sup>+</sup> cell populations.

      We agree that comparison of the Rosa26-CreERT2 results to those with B cell-specific loss-of-function raises a tantalizing possibility that Tfh cells also are influenced by PexRAP. Although the manuscript is focused on post-ontogenic B cell regulation in Ab responses, we hope that a new immunization experiment that scores Tfh cells and analyzes a few of their functional molecules can be added to this B cell paper, depending on our ability to wheedle enough support / fiscal resources.

      Addendum / update of Sept 2025: Within the tight time until lab closure, and limited $$, we were able to do experiments that further reinforced the GC B cell data - including stains for DZ vs LZ sub-subsetting - and analyzed Tfh cells. We were not able to explore changes in functional antigenic markers on the GC B or Tfh cells. 

      The authors claim that Dhrs7b supports the homeostatic maintenance of quiescent B cells in vivo and promotes effective proliferation. This conclusion is primarily based on experiments where CTV-labeled PexRAP-deficient B cells were adoptively transferred into μMT mice (Fig. 2D-F). However, we recommend reviewing the flow plots of CTV in Fig. 2E, as they appear out of scale. More importantly, the low recovery of PexRAP-deficient B cells post-adoptive transfer weakens the robustness of the results and is insufficient to conclusively support the role of PexRAP in B cell proliferation in vivo.

      In the revision, we will edit the text and try to adjust the digitized cytometry data to allow more dynamic range on the right side of the upper panels in Fig. 2E, and otherwise to improve the presentation of the in vivo CTV result. However, we feel impelled to push back respectfully on some of the concern raised here. First, it seems to gloss over the presentation of multiple facets of evidence. The conclusion about maintenance derives primarily from Fig. 2C, which shows a rapid, statistically significant decrease in B cell numbers (extending the finding of Fig. 1D, a more substantial decrease after a somewhat longer period). As noted in the text, the rate of de novo B cell production does not suffice to explain the magnitude of the decrease.

      In terms of proliferation, we will improve the presentation of the Methods, but the bottom line is that the recovery efficiency is not bad (compared to prior published work), inasmuch as transferred B cells do not uniformly home to the spleen. In a setting where BAFF is in ample supply in vivo, we transferred equal numbers of cells that were equally labeled with CTV and counted B cells. Although the CTV result might be affected by the lower recovery of PexRAP-deficient B cells, the frequencies of the CTV<sup>low</sup> divided population generally did not change very much. However, it is precisely because of the pitfalls of in vivo analyses that we included complementary data on survival and proliferation in vitro. Proliferation was attenuated in PexRAP-deficient B cells in vitro; this evidence supports the conclusion that proliferation of PexRAP knockout B cells is reduced. It is likely that PexRAP-deficient B cells also have a viability defect in vivo, as we observed reduced B cell numbers in PexRAP-deficient mice. As the reviewer noticed, the presence of a defect in cycling does, in the transfer experiments, limit the ability to interpret a lower yield of B cells after adoptive transfer into µMT recipient mice as evidence pertaining to death rates. We will edit the text of the revision with these points in mind.

      In vitro stimulation experiments: These experiments need improvement. The authors have used anti-CD40 and BAFF for B cell stimulation; however, it would be beneficial to also include anti-IgM in the stimulation cocktail. In Fig. 2G, the CTV plots do not show clear defects in proliferation, yet the authors quantify the percentage of cells with more than three divisions. These plots should clearly display the gating strategy. Additionally, details about histogram normalization and potential defects in cell numbers are missing. A more in-depth analysis of apoptosis is also required to determine whether the observed defects are due to impaired proliferation or reduced survival.

      As suggested by the reviewer, testing additional forms of B cell activation can help explore the generality (or lack thereof) of the findings. We plan to test anti-IgM stimulation together with anti-CD40 + BAFF, as well as anti-IgM + TLR7/8, and add the data to a revised and final manuscript.

      Addendum / update of Sept 2025: The revision includes results of new experiments in which anti-IgM was included in the stimulation cocktail, as well as further data on apoptosis that help distinguish impaired cycling / divisions from reduced survival.

      With regard to Fig. 2G (and 2H), in the revised manuscript we will refine the presentation (adding a demonstration of the gating, and explaining the histogram normalization in FlowJo).

      It is an interesting issue in bioscience, but in our presentation 'representative data' really are quite representative, which reminds a senior author of a comment Tak Mak made about a reduction (of proliferation, if memory serves) to 0.7x control. [His point, in a comment to referees at a symposium, related that to a 30% salary reduction :) A mathematical alternative is to point out that across four rounds of division for WT cells, a reduction to 0.7x efficiency at each cycle means about 1/4 as many progeny.]
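      The bracketed arithmetic can be written out as a short worked equation. This is a sketch under one reading (an assumption, not a model stated in the manuscript): "0.7x efficiency" scales the per-cycle expansion factor, so each round multiplies mutant cells by 0.7 × 2 = 1.4 instead of 2.

```latex
\frac{N_{\mathrm{mutant}}}{N_{\mathrm{WT}}}
  = \frac{(0.7 \times 2)^{4}}{2^{4}}
  = 0.7^{4}
  \approx 0.24
  \approx \tfrac{1}{4}
```

      Under a different reading, in which only 70% of cells divide in each round (a per-round factor of 1.7 rather than 2), the same four rounds would give (1.7/2)^4 ≈ 0.52. The "about 1/4" figure therefore depends on the multiplicative interpretation.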

      We will try to edit the revision (Methods, Legends, Results, Discussion) to better address the points in the last two sentences of the comment, and to improve the details that could assist in replication or comparisons (e.g., if someone develops a PexRAP inhibitor as a potential therapeutic).

      For the present, please note that the cell numbers at the end of the cultures are currently shown in Fig. 2, panel I. Analogous culture results are shown in Fig. 8, panels I and J, albeit with harvesting at day 5 instead of day 4. So, a difference of ≥3x needs to be explained. As noted above, a division efficiency reduced to 0.7x normal might account for such a decrease, but in practice the data of Fig. 2I show that the number of PexRAP-deficient B cells at day 4 is similar to the number plated before activation, and yet there has been a reasonable amount of division. In other words, the number of mutant B cells in culture stays roughly constant: cycling is active but decreased, and it is insufficient to increase numbers ("proliferation" in the true sense) because programmed death is increased. In line with this evidence, Fig. 8G-H document higher death rates [i.e., frequencies of cleaved caspase-3<sup>+</sup> and Annexin V<sup>+</sup> cells] of PexRAP-deficient B cells compared to controls. Thus, the in vitro data lead to the conclusion that both decreased division rates and increased death operate after this form of stimulation.

      An inference is that the same holds in vivo: note that recoveries differed by ~3x (Fig. 2D), and the decrease in divisions (the presentation of which will be improved) was meaningful but of lesser magnitude (Fig. 2E, F).

      Reviewer #2 (Public review):

      Summary:

      In this study, Cho et al. investigate the role of ether lipid biosynthesis in B cell biology, particularly focusing on GC B cells, by inducible deletion of PexRAP, an enzyme responsible for the synthesis of ether lipids.

      Strengths:

      Overall, the data are well-presented, the paper is well-written, and it provides valuable mechanistic insights into the importance of the PexRAP enzyme in GC B cell proliferation.

      We appreciate this positive response and agree with the overview and summary of the paper's approaches and strengths. 

      Weaknesses:

      More detailed mechanisms of the impaired GC B cell proliferation caused by PexRAP deficiency remain to be investigated. In minor respects, there are issues with the interpretation of the data that might confuse readers.

      Issues about the contributions of cell cycling and divisions on the one hand, and susceptibility to death on the other, were discussed above, amplifying on the current manuscript text. The aggregate data support a model in which both processes are affected in mature B cells in general, and mechanistically the evidence and work focus on increased ROS and modes of death. Although the data in Fig. 7 do provide evidence that GC B cells themselves are affected, we agree that resource limitations have militated against developing further evidence about cycling specifically in GC B cells. We hope to obtain sufficient data from a specific analysis of proliferation in vivo (e.g., Ki67 or BrdU), as well as of ROS and death ex vivo, when harvesting new samples from immunized mice to analyze GC B cells for CXCR4/CD86, CD38, and CD138, as indicated by Reviewer 1. As suggested by Reviewer 2, we will further discuss the possible mechanism(s) by which proliferation of PexRAP-deficient B cells is impaired. We will also edit the text of a revision wherever needed to enhance the clarity of data interpretation; at a minimum, we will make very clear that caution is warranted in assuming that GC B cells exhibit the same mechanisms as B cells stimulated in vitro in culture.

      Addendum / update of Sept 2025: We were able to obtain results of intravital BrdU incorporation into GC B cells to measure cell cycling rates. The revised manuscript includes these results as well as other new data on apoptosis / survival, while deleting the data about CD138 populations whose interpretation was reasonably questioned by the referees.  

      Reviewer #1 (Recommendations for the authors):

      We believe the evidence presented to support the role of PexRAP in protecting B cells from cell death and promoting B cell proliferation is not sufficiently robust and requires further validation in vivo. While the study demonstrates an increase in ether lipid content within the GC compartment, it also highlights a reduction in mature B cells in PexRAP-deficient mice under steady-state conditions. However, the IMS results (Fig. 3A) indicate that there are no significant differences in ether lipid content in the naïve B cell population. This discrepancy raises an intriguing point for discussion: why is PexRAP critical for B cell survival under steady-state conditions?

      We thank the referee for all their care and input, and we agree that further intravital analyses could strengthen the work by providing more direct evidence of impairment of GC B cells in vivo. To revise and improve this manuscript before creation of a contribution of record, we performed new experiments to the limit of available funds and have both (i) added these new data and (ii) sharpened the presentation to correct what we believe to be one inaccurate point raised in the review. 

      (A) Specifically, we immunized mice with a B cell-specific depletion of PexRAP (Dhrs7b<sup>Δ/Δ-B</sup> mice) and measured a variety of readouts of GC B cell physiology in vivo: proliferation by intravital incorporation of BrdU, ROS in the viable GC B cell gate, and cell death by annexin V staining directly ex vivo. Consistent with the data from in vitro-activated B cells, these analyses showed increased ROS (new - Fig. 7D) and higher frequencies of Annexin V<sup>+</sup> 7AAD<sup>+</sup> cells among GC B cells (GL7<sup>+</sup> CD38<sup>-</sup> B cell gate) of immunized Dhrs7b<sup>Δ/Δ-B</sup> mice compared with WT controls (huCD20-CreERT2<sup>+/-</sup>, Dhrs7b<sup>+/+</sup>) (new - Fig. 7E). Collectively, these results indicate that PexRAP aids (directly or indirectly) in controlling ROS in GC B cells and reduces B cell death, likely contributing to the substantially decreased overall GC B cell population. These new data are added to the revised manuscript in Figure 7.

      Moreover, in each of two independent experiments (each comprising 3 vs 3 immunized mice), BrdU<sup>+</sup> events among GL7<sup>+</sup> CD38<sup>-</sup> (GC B cell)-gated cells were reduced in the B cell-specific PexRAP knockouts compared with WT controls (new, Fig. 7F and Supplemental Fig 6E). This result on cell cycle rates in vivo is presented with caution in the revised manuscript text because the absolute labeling fractions were somewhat different in Expt 1 vs Expt 2. This situation affords a useful opportunity to comment on the culture of "P values" and statistical methods. It is intriguing to consider how many successful drugs are based on research published back when the standard was to interpret a result of this sort more definitively despite a merged "P value" that was not a full 2 SD different from the mean. In the optimistic spirit of the eLife model, it can be for the attentive reader to decide from the data (new, Fig. 7F and Supplemental Fig 6E) whether to interpret the BrdU results more strongly than what we state in the revised text.

      (B) On the issue of whether or not the loss of PexRAP led to perturbations of the lipidome of B cells prior to activation, we have edited the manuscript to make this point clearer.

      We point out to readers that in the resting, pre-activation state, abnormalities were detected in naive B cells, not just in activated and GC B cells. In brief, the IMS and LC-MS/MS analyses detected statistically significant differences in some, but not all, of the ether phospholipid species in PexRAP-deficient cells (some of which were shown in Supplemental Figure 2 of the original version).

      With this appropriate and helpful concern having been raised, we realize that this important point merited inclusion in the main figures. We point specifically to a set of phosphatidylcholine ions shown in Fig. 3 (revised - panels A, B, D) of the revised manuscript (PC O-36:5; PC O-38:5; PC O-40:6 and -40:7).

      For this ancillary record (since a full discourse on the limitations of each analysis would go beyond the manuscript), we note issues such as the presence of many non-B cells in each pixel of the IMS analyses (so that some or many "true positives" will fail to achieve a "significant difference") and, for the naive B cells, differential rates of synthesis, turnover, and conversion (e.g., addition of another 2-carbon unit or saturation / desaturation of one side-chain). To the extent that the concern reflects some surprise, and perhaps skepticism, about what seem to be relatively limited differences (many species appear unaffected, etc.), we share the sentiment. But the basic observations are that there are differences, and that there is a reasonable connection between the altered lipid profile and the evidence of effects on survival or proliferation (i.e., integration of survival and cell cycling / division).

      Additionally, it would be valuable to evaluate the humoral response in a T-independent setting. This would clarify whether the role of PexRAP is restricted to GC B cells or extends to activated B cells in general. 

      We agree that this additional set of experiments would be nice and would extend work incrementally by testing the generality of the findings about Ab responses. The practical problem is that money and time ran out while testing important items that strengthen the evidence about GC B cells. 

      Finally, the manuscript would benefit from a thorough revision to improve its readability and clarity. Including more detailed descriptions of technical aspects, such as the specific stimuli and time points used in analyses, would greatly enhance the flow and comprehension of the study. Furthermore, the authors should review figure labeling to ensure consistency throughout the manuscript, and carefully cite the relevant references. For instance, the S1PR2-CreERT2 mouse was established by Okada and Kurosaki (Shinnakasu et al., Nat. Immunol., 2016).

      We appreciate this feedback and comment, inasmuch as both clarity and scholarship matter greatly to us for a final item of record. For the revision, we have done our best to edit the text for improved clarity, reduction of discrepancies (helpfully noted in the Minor Comments), and more detailed descriptions of procedures. We also edited the figure labeling to improve consistency. While we note that the appropriate citation of Shinnakasu et al. (2016) was ref. #69 of the original and remains as a citation, we have rechecked other referencing and tried to use the most relevant references.

      Minor Comments: The labeling of plots in Fig. 2 should be standardized. For example, in Fig. 2C, D, and G, the same mouse strain is used, yet the Cre+ mouse is labeled differently in each plot. 

      We agree and have tried to tighten up these features in the panels noted as well as more generally (e.g., Fig. 4, 5, 6, 7, 9; consistency of huCD20-CreERT2 / hCD20CreERT2).

      According to the text, the results shown in Fig. 1G and H correspond to a full KO (Dhrs7b^f/f; Rosa26-CreERT2 mice). However, Fig. 1H indicates that the bottom image corresponds to Dhrs7b^f/f, huCD20-CreERT2 mice (Dhrs7bΔ/Δ -B).

      We have corrected Fig. 1H to be labeled as Dhrs7b<sup>Δ/Δ</sup> (with the data on Dhrs7b<sup>Δ/Δ-B</sup> presented in Supplemental Figure 4A, which is correctly labeled). Thank you for picking up this error that crept in while using copy/paste in preparation of figure panels and failing to edit out the "-B"!  

      Similarly, the gating strategy for GC cells in the text mentions IgD− cells, while the figure legend refers to total viable B cells. These discrepancies need clarification.

      We believe we located and have corrected this issue in the revised manuscript.   

      Figures 3 and 4. The authors claim that B cell expression of PexRAP is required to achieve normal concentrations of ether phospholipids.

      Suggestions for Improvement: 

      Lipid Metabolism Analysis: The analysis in Fig. 3 is generally convincing but could be strengthened by including an additional stimulation condition such as anti-IgM plus anti-CD40. In Fig. 4C, the authors display results from the full KO model. It would be helpful to include quantitative graphs summarizing the parameters displayed in the images.

      We have performed new experiments (anti-IgM + anti-CD40) and added the data to the revised manuscript (new - Supplemental Fig. 2H and Supplemental Fig 6, D & F). Conclusions based on the effects are not changed from the original. 

      As a semantic comment and point of scientific process, any interpretation ("claim") can, by definition, only be taken to apply to the conditions of the experiment. Nonetheless, it is inescapable that at least for some ether P-lipids of naive, resting B cells, and for substantially more in B cells activated under the conditions that we outline, B cell expression of PexRAP is required.

      With regard to the constructive suggestion about a new series of lipidomic analyses, we agree that for activated B cells it would be nice and would increase insight into the spectrum of conditions under which PexRAP-deficient B cells have altered content of ether phospholipids. However, in light of the costs of metabolomic analyses, the lack of funds to support further experiments, and the accuracy of the point as stated, we prioritized the experiments that could fit within the severely limited budget.

      [One can add that our results provide a premise for later work to analyze a time course after activation, and to perform isotopomer (SIRM) analyses with <sup>13</sup>C-labeled acetate or glucose, so as to understand activation-induced increases in the overall …] To revise the manuscript, we did, however, extrapolate from the point about adding BCR cross-linking to anti-CD40 as a variant form of B cell activation for measurements of ROS, population growth, and rates of division (CTV partitioning). The results of these analyses, which align with and thereby strengthen the conclusions about these functional features from experiments with anti-CD40 but no anti-IgM, are added to Supplemental Fig 2H and Supplemental Fig 6D, F.

      Figures 5, 6, and 7

      The authors claim that Dhrs7b in B cells shapes antibody affinity and quantity. They use two mouse models for this analysis: huCD20-CreERT2 and Dhrs7b f/f; S1pr2-CreERT2 mice. 

      Suggestions for Improvement:

      Adaptive immune response characterization: A more comprehensive characterization of the adaptive immune response is needed, ideally using the Dhrs7b f/f; S1pr2-CreERT2 model. This should include: Analysis of the GC response in B220+CD138− cells. Class switch recombination analysis. A detailed characterization of centroblasts, centrocytes, and Tfh populations. Characterization of effector cells (plasma cells and memory cells).

      Within the limits of time and money, we have performed new experiments prompted by this constructive set of suggestions. 

      Specifically, we analyzed the suggested read-outs in the huCD20-CreERT2, Dhrs7b<sup>f/f</sup> model after immunization, recognizing that it trades a better signal-to-noise ratio against the fact that effects are due to a mix of the impact on B cells during clonal expansion before GC recruitment and activities within the GC. In brief, the results showed that

      (a) the GC B cell population, defined as CD138<sup>neg</sup> GL7<sup>+</sup> CD38<sup>lo/neg</sup> IgD<sup>neg</sup> B cells, was about half as large for PexRAP-deficient B cells, net of any early or pre-plasmablasts (CD138<sup>+</sup> events) (new - Fig 5G);

      (b) the frequencies of pre- / early plasmablasts (CD138<sup>+</sup> GL7<sup>+</sup> CD38<sup>neg</sup>) events (see new - Fig. 6H, I; also, new Supplemental Fig 5D) were so low as to make it unlikely that our data with the S1pr2-CreERT2 model (in Fig 7B, C) would be affected meaningfully by analysis of the CD138 levels;

      (c) there was a modest decrease in centrocytes (LZ) but not centroblasts (DZ) (new - Fig 5H, I), consistent with the immunohistochemical data of Supplemental Fig. 5A-C.

      Because of time limitations (the "shelf life" of funds and the lab) and insufficient stock of the S1pr2-CreERT2, Dhrs7b<sup>f/f</sup> mice as well as those that would be needed as adoptive transfer recipients because of S1PR2 expression in (GC-)Tfh, the experiments were performed instead with the huCD20-CreERT2, Dhrs7b<sup>f/f</sup> model. We would also note that using this Cre transgene better harmonizes the centrocyte/centroblast and Tfh data with the existing data on these points in Supplemental Fig. 4. 

      (d) Of note, the analyses of Tfh and GC-Tfh phenotype cells using the huCD20-CreERT2 B cell type-specific inducible Cre system to inactivate Dhrs7b (new - Supplemental Fig 1G-I, along with new - Supplemental Fig 5E) provide evidence of an abnormality that must stem from a function or functions of PexRAP in B cells, most likely GC B cells. Specifically, it is known that the GC-Tfh population proliferates and is supported by the GC B cells, and the results of B cell-specific deletion show substantial reductions in Tfh cells (both with the GC-Tfh gating and with the wider gate in plots of CXCR5/PD-1 fluorescence of CD4 T cells).

      Timepoint Consistency: The NP response (Fig. 5) is analyzed four weeks post-immunization, whereas SRBC (Supp. Fig. 4) and Fig. 7 are analyzed one week or nine days post-immunization. The NP system analysis should be repeated at shorter timepoints to match the peak GC reaction.

      This comment may stem from a misunderstanding. As diagrammed in Fig. 5A, the experiments involving the NP system were in fact measured at 7 d after a secondary (booster) immunization. That timing is approximately the peak period and harmonizes with the 7 d used for harvesting SRBC-immunized mice. So in fact the data with each system were obtained at a similar time point. Of course the NP experiments involved a second immunization so that many plasma cell and Ab responses derived from memory B cells generated by the primary immunization. However, the field at present is dominated by the view that the vast majority of the GC B cells after this second immunization (which historically we perform with alum adjuvant) are recruited from the naive rather than the memory B cell pool. For the revised manuscript, we have taken care that the Methods, Legend, and Figure provide the information to readers, and expanded the statement of a rationale. 

      It may seem a technicality, but under NIH regulations we are legally obligated to try to minimize mouse usage. It also behooves researchers to use funds wisely. In line with those imperatives, we used systems that would simultaneously allow analyses of GC B cells, identification of affinity maturation (which is minimal in our hands at a 7 d time point after primary NP-carrier immunization), and a switched repertoire (also minimal), and where with each immunogen the GC were scored at 7-9 d after immunization (9 d refers to the S1pr2-CreERT2 experiments). Apart from the end of funding, we feel that what little might be learned from performing a series of experiments that involve harvests 7 d after a primary immunization with NP-ovalbumin cannot well be justified.

      In vitro plasma cell differentiation: Quantification is missing for plasma cell differentiation in vitro (Supp. Fig. 4). The stimulus used should also be specified in the figure legend. Given the use of anti-CD40, differentiation towards IgG1 plasma cells could provide additional insights.

      As suggested by the reviewer, we have added the results of quantifying the in vitro plasma cell differentiation in Supplemental Fig 6B. We also edited the Methods and Supplemental Figure Legend to give detailed information on the in vitro stimulation.

      Proliferation and apoptosis analysis: The observed defects in the humoral response should be correlated with proliferation and apoptosis analyses, including Ki67 and Caspase markers.

      As suggested in the review, we have performed new experiments and analyzed the frequencies of cell death by annexin V staining, and elected to use intravital uptake of BrdU as a more direct measurement of the S phase / cell cycling component of net proliferation. The new results are now displayed in Figure 5 and Supplemental Fig. 5.

      Western blot confirmation: While the authors have demonstrated the absence of PexRAP protein in the huCD20-CreERT2 model, this has not been shown in GC B cells from the Dhrs7b f/f; S1pr2-CreERT2 model. This confirmation is necessary to validate the efficiency of Dhrs7b deletion.

      We were unable to do this for technical reasons expanded on below. For the revision, we have added a bit of text to alert readers more explicitly to the potential impact of counter-selection on interpretation of the findings with GC B cells. Before entering the GC, B cells have undergone many divisions, so if there were major pre-GC counterselection, in all likelihood the GC B cells would be PexRAP-sufficient. To recap from the original manuscript and the new data we have added, IMS shows altered lipid profiles in the GC B cells, and the literature indicates that the lipids are short-lived, requiring de novo resynthesis. The BrdU, ROS, and annexin V data show that GC B cells are abnormal. Accordingly, abnormal GC B cells represent the parsimonious, straightforward interpretation of the new results on GC-Tfh cell prevalence.

      While we take these findings together to suggest that counterselection (i.e., a Western result showing normal levels of PexRAP in the GC B cells) seems unlikely, it is formally possible and would mean that the in situ defects of GC B cells arose due to environmental influences of the PexRAP-deficient B cells during the developmental history of the WT B cells observed in the GC. 

      Having noted all that, we understand that concerns about counter-selection are an issue if a reader accepts the data showing that mutant (PexRAP-deficient) B cells tend to proliferate less and die more readily. Indeed, one can speculate that were we also to perform competition experiments in which the Ighb, Cd45.2 B cells (WT or Dhrs7b Δ/Δ) are mixed with equal numbers of Igha, Cd45.1 competitors, the differences would become much greater. With this in mind, Western blotting of flow-purified GC B cells might give a sense of how much counter-selection has occurred.

      That said, the Westerns need at least 2.5 x 10<sup>6</sup> B cells (those in the manuscript used five million, 5 x 10<sup>6</sup>) and would need replication. Taken together with the observation that ~200,000 GC B cells (on average) were measured in each B cell-specific knockout mouse after immunization (Fig. 1, Fig 5), and taking into account yields from sorting, each Western would require some 20-25 tamoxifen-injected ___-CreERT2, Dhrs7b f/f mice, and about half again that number as controls. The expiry of funds prohibited the time and costs of generating that many mice (>70) and flow-purified GC B cells.
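      The estimate of 20-25 knockout mice per Western can be reconstructed roughly as follows. The sort recovery fraction y is an illustrative assumption (about 0.5 to 0.6), not a value reported in the manuscript; the other two numbers are those given above.

```latex
n_{\mathrm{mice}}
  = \frac{2.5 \times 10^{6}\ \text{cells per Western}}
         {y \times 2 \times 10^{5}\ \text{GC B cells per mouse}}
  = \frac{12.5}{y}
  \approx 21 \ \text{to}\ 25 \qquad (y = 0.6 \ \text{to}\ 0.5)
```

      Even with perfect recovery (y = 1), about 13 knockout mice would be needed per Western, before controls and replication are counted.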

      Figure 8

      The authors claim that Dhrs7b contributes to the modulation of ROS, impacting B cell proliferation.

      Suggestions for Improvement:

      GC ROS Analysis: The in vitro ROS analysis should be complemented by characterizing ROS and lipid peroxidation in the GC response using the Dhrs7b f/f; S1pr2-CreERT2 model. Flow cytometry staining with H2DCFDA, MitoSOX, Caspase-3, and Annexin V would allow assessment of ROS levels and cell death in GC B cells. 

      While subject to some of the same practical limits noted above, we have performed new experiments in line with this helpful input of the reviewer, and added the new data to the revised manuscript. Specifically, in addition to the BrdU and phenotyping analyses after immunization of huCD20-CreER<sup>T2</sup>, Dhrs7b<sup>f/f</sup> mice, DCFDA (ROS), MitoSOX, and annexin V signals were measured for GC B cells. Although the MitoSOX signals did not differ significantly for PexRAP-deficient GC B cells, the ROS and annexin V signals were substantially increased. We added the new data to Figure 5 and Supplemental Figure 5. Together with the decreased in vivo BrdU incorporation in GC B cells from Dhrs7b<sup>Δ/Δ-B</sup> mice, these results are consistent with and support our hypothesis that PexRAP regulates B cell population growth and GC physiology in part by regulating ROS detoxification, survival, and proliferation of B cells.

      Quantification is missing in Fig. 8E, and Fig. 8F should use clearer symbols for better readability. 

      We added quantification for Fig 8E in Supplemental Fig 6E, and edited the symbols in Fig 8F for better readability.

      Figure 9

      The authors claim that Dhrs7b in B cells affects oxidative metabolism and ER mass. The results in this section are well-performed and convincing.

      Suggestion for Improvement:

      Based on the results, the discussion should elaborate on the potential role of lipids in antigen presentation, considering their impact on mitochondria and ER function.

      We very much appreciate the praise of the tantalizing findings about oxidative metabolism and ER mass, and accept the encouragement to add (prudently) a note in the Discussion on the points mentioned by the Reviewer, particularly now that (with their encouragement) we have evidence that B cell-specific loss of PexRAP (with huCD20-CreERT2 deletion prior to immunization) resulted in decreased (GC-)Tfh and somewhat lower GC B cell proliferation.

      Reviewer #2 (Recommendations for the authors):

      The authors should investigate whether PexRAP-deficient GC B cells exhibit increased mitochondrial ROS and cell death ex vivo, as observed in in vitro cultured B cells.

      We very much appreciate the work of the referee and their input. We addressed this helpful recommendation, in essence aligned with points from Reviewer 1, via new experiments (until the money ran out) and addition of data to the manuscript. To recap briefly, we found increased ROS in GC B cells along with higher fractions of annexin V-positive cells; intriguingly, increased mtROS (MitoSOX signal) was not detected, which differs modestly from the results in activated B cells in vitro. Without straying too far beyond the foundation supported by the data, we note that this point may align with papers that provide evidence of differences between pre-GC and GC B cells (for instance, with lack of Tfam or LDHA in B cells).

      It remains unclear whether the impaired proliferation of PexRAP-deficient B cells is primarily due to increased cell death. Although NAC treatment partially rescued the phenotype of reduced PexRAP-deficient B cell number, it did not restore them to control levels. Analysis of the proliferation capacity of PexRAP-deficient B cells following NAC treatment could provide more insight into the cause of impaired proliferation.

      To add to the data permitting an assessment of this issue, we performed new experiments in which B cells were activated (BCR and CD40 cross-linking) and cultured, and both the change in population and the CTV partitioning were measured in the presence or absence of NAC. The results, added to the revision as Supplemental Fig 6FH, show that although NAC improved cell numbers for PexRAP-deficient cells relative to controls, this compound did not increase divisions at all. We infer that the more powerful effect of this lipid synthesis enzyme is to promote survival rather than division capacity.

      Primary antibody responses were assessed at only one time point (day 20). It would be valuable to examine the kinetics of antibody response at multiple time points (0, 1w, 2w, 3w, for example) to better understand the temporal impact of PexRAP on antibody production.

      We thank the reviewer for this suggestion. While kinetic measurement of Ag-specific antibody levels across multiple time points might provide an additional mechanistic clue into the impact of PexRAP on antibody production, the end of sponsored funding and imminent lab closure precluded performing such experiments.

      The CD138+ cell population includes both GC-experienced and GC-independent plasma cells (Fig. 7). Enumeration of plasmablasts, which likely consist of both PexRAP-deleted and undeleted cells (Fig. 7D and E), may mislead readers into concluding that PexRAP is dispensable for plasmablast generation. I would suggest removing these data and instead examining the number of plasmablasts in the experimental setting of Fig. 4A (huCD20-CreERT2-mediated deletion) to address whether PexRAP deficiency affects plasmablast generation.

      We have eliminated the figure panels in question, since it is accurate that in the absence of a time-stamping or marking approach we have a limited ability to distinguish plasma cells that arose prior to inactivation of the Dhrs7b gene in B cells. In addition, we performed new experiments that were used to analyze the "early plasmablast" phenotype and added those data to the revision (Supplemental Fig 5D).

    1. eLife Assessment

      The authors quantified intentions and knowledge gaps in scientists' use of sex as a biological variable in their work, and used a workshop intervention to show that while willingness was high, pressure points centered on statistical knowledge and perceived additional monetary costs to research. These important findings demonstrate the difficulty in changing understanding: while interventions can improve knowledge and decrease perceived barriers, the impact was small. The evidence for the findings is solid.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use the theory of planned behavior to assess intentions to use sex as a biological variable (SABV), as well as attitude (value), subjective norm (social pressure), and behavioral control (ability to conduct the behavior), among scientists at a pharmacological conference. They also used an intervention (a workshop) to determine its value in changing perceptions and misconceptions, and attempted to characterize the knowledge gaps.

      Strengths:

      The use of SABV is limited in terms of researchers using sex in the analysis as a variable of interest in the models (rather than as a variable to control for). To increase the number of researchers examining their data with sex in the analyses, it is vital that we understand the pressure points researchers consider in their work. The authors identify likely culprits in their analyses. The authors also test an intervention (workshop) to address the main biases or impediments to researchers' use of sex in their analyses.

    3. Reviewer #2 (Public review):

      Summary:

      The investigators tested a workshop intervention to improve knowledge and decrease misconceptions about sex inclusive research.

      Strengths:

      The investigators included control groups and replicated the study in a second population of scientists. The results appear to be well substantiated. Figures are easy to understand.

      Weaknesses: None noted

      Comments on revised version:

      The authors have responded appropriately to all of my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript aims to determine cultural biases and misconceptions in sex-inclusive research and evaluate the efficacy of interventions to improve knowledge and shift perceptions to decrease perceived barriers for including both sexes in basic research.

      Overall, this study demonstrates that despite the intention to include both sexes and a general belief in the importance of doing so, relatively few people routinely include both sexes. Further, the perceptions of barriers to doing so are high, including misconceptions surrounding sample size, disaggregation, and variability of females. There was also a substantial number of individuals without the statistical knowledge to appropriately analyze data in studies inclusive of sex. Interventions increased knowledge and decreased perception of barriers.

      Strengths:

      (1) This manuscript provides evidence for the efficacy of interventions for changing attitudes and perceptions of research.

      (2) This manuscript also provides a training manual for expanding this intervention to broader groups of researchers.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:

      The authors use the theory of planned behavior to assess intentions to use sex as a biological variable (SABV), as well as attitude (value), subjective norm (social pressure), and behavioral control (ability to conduct the behavior), among scientists at a pharmacological conference. They also used an intervention (a workshop) to determine its value in changing perceptions and misconceptions, and attempted to characterize the knowledge gaps.

      Strengths:

      The use of SABV is limited in terms of researchers using sex in the analysis as a variable of interest in the models (rather than as a variable to control for). To increase the number of researchers examining their data with sex in the analyses, it is vital that we understand the pressure points researchers consider in their work. The authors identify likely culprits in their analyses. The authors also test an intervention (workshop) to address the main biases or impediments to researchers' use of sex in their analyses.

      Weaknesses:

      There are a number of assumptions the authors make that could be revisited: 

      (1) That all studies should contain across-sex analyses or investigations. It is important to acknowledge that part of the impetus for SABV is to gain more scientific knowledge on females. This will require within-sex analyses and dedicated research to uncover how unique characteristics of females can influence physiology and health outcomes, which will only be achieved with female-only studies. An overemphasis on investigating sex influences limits work on women's health, for example, as within-sex analyses are equally important.

      The Sex and Gender Equity in Research (SAGER) guidelines (1) state that “Where the subjects of research comprise organisms capable of differentiation by sex, the research should be designed and conducted in a way that can reveal sex-related differences in the results, even if these were not initially expected.” This establishes a default position of inclusion wherever sex can be determined, with analyses assessing sex-related variability in the response. This position underpins many of the funding bodies' new policies on inclusion.

      However, we need to place this in the context of what drives inclusion. The most common reason for including male and female samples is in studies exploring the effect of a treatment, where the goal of inclusion is to assess the generalisability of the treatment effect (exploratory sex inclusion) (2). The second scenario is where sex is included because it is one of the variables of interest, which arises when there is a hypothesized sex difference of interest (confirmatory sex inclusion).

      We would argue that the SABV concept was introduced to address the systematic bias of studying only one sex when assessing a treatment effect, in order to improve the generalisability of the research. Therefore, its direct aim is not to gain more scientific knowledge on females. However, this strategy will highlight cases where the effect differs markedly between male and female subjects, which may generate sex-specific hypotheses.

      Where research has a hypothesis that is specific to one sex (e.g. it is related to oestrogen levels), it would be appropriate to study only the sex of interest, in this case females. The recently published Sex Inclusive Research Framework gives some guidance here and allows an exemption for such a scenario, classifying such proposals as “Single sex study justified” (3).

      We have added an additional paragraph to the introduction to clarify the objectives behind inclusion and how this assists the research process. 

      (2) It should be acknowledged that although variability within each sex does not differ on a number of characteristics (as indicated by meta-analyses in rats and mice), this was not assessed for all variables, and behavioral variables were not included. In addition, across-sex variability may well differ, which in turn could result in statistically significant sex effects. Moreover, some measures do show sex differences in variability; for example, human males show more variability in grey matter volume than females. PMID: 33044802.

      The manuscript highlighted the common argument used to justify excluding females: that females are inherently more variable, presented as an absolute truth. We agree there may be situations where the variance is higher in one sex than the other, depending on the biology. We have extended the discussion to reflect this, and we have also linked to the Sex Inclusive Research Framework (3), which highlights that in these situations researchers can use this argument provided it is supported by data for the biology of interest.

      (3) The authors need to acknowledge that it can be important to increase the sample size when examining more than one sex. If the sample size is too low for biological research, it will not be possible to determine whether or not a difference exists. Using statistical modelling, researchers have found that, depending on the effect size, the sample size does need to increase. It is important to bear this in mind, as exploratory analyses with small sample sizes will be extremely limiting and may also discourage further study in this area (or, as seen in the literature, an exploratory first study uses males and females with a limited sample size, finds no "significance", and cites this as a reason to use only males in further studies).

      The reviewer raises a common problem: researchers have frequently argued that if they find no sex differences in a pilot study, they can proceed to study only one sex. The SAGER guidelines (1), and now funder guidelines (4, 5), challenge that position. Instead, the expectation is inclusion as the default in all experiments (exploratory inclusion strategy) so that generalisable results can be obtained. When the results differ markedly between the male and female samples, this can then be detected. This perspective shift (2) requires a change in mindset and an understanding that the driver behind inclusion is generalisability, not the exploration of sex differences. We have added a paragraph to the introduction exploring the drivers behind inclusion.

      We agree with the reviewer that if the researcher is interested in sex differences in an effect (confirmatory inclusion strategy, also known as sex as a primary variable), then the N will need to be higher. However, in this situation, one must of course have male and female samples in the same experiment to allow simultaneous assessment of the dependency on sex.

      Reviewer #2 (Public review): 

      Summary:

      The investigators tested a workshop intervention to improve knowledge and decrease misconceptions about sex inclusive research. There were important findings that demonstrate the difficulty in changing opinions and knowledge about the importance of studying both males and females. While interventions can improve knowledge and decrease perceived barriers, the impact was small. 

      Strengths:

      The investigators included control groups and replicated the study in a second population of scientists. The results appear to be well substantiated. These are valuable findings that have practical implications for fields where sex is included as a biological variable to improve rigor and reproducibility. 

      Thank you for your assessment and for highlighting these strengths. We appreciate your recognition of the value and practical implications of this work.

      Weaknesses:

      I found the figures difficult to understand and would have appreciated more explanation of what is depicted, as well as greater space between the bars representing different categories. 

      We have improved the figures and figure legends to improve clarity. 

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to determine cultural biases and misconceptions in sex-inclusive research and evaluate the efficacy of interventions to improve knowledge and shift perceptions to decrease perceived barriers for including both sexes in basic research.

      Overall, this study demonstrates that despite the intention to include both sexes and a general belief in the importance of doing so, relatively few people routinely include both sexes. Further, the perceptions of barriers to doing so are high, including misconceptions surrounding sample size, disaggregation, and variability of females. There was also a substantial number of individuals without the statistical knowledge to appropriately analyze data in studies inclusive of sex. Interventions increased knowledge and decreased perception of barriers. 

      Strengths:

      (1) This manuscript provides evidence for the efficacy of interventions for changing attitudes and perceptions of research.

      (2) This manuscript also provides a training manual for expanding this intervention to broader groups of researchers.

      Thank you for highlighting these strengths. We appreciate your recognition that the intervention was effective in changing attitudes and perceptions. We deliberately chose to share the material to provide resources that allow wider engagement.

      Weaknesses:

      The major weakness here is that the post-workshop assessment is a single time point, soon after the intervention. As this paper shows, intention for these individuals is already high, so does decreasing perception of barriers and increasing knowledge change behavior, and increase the number of studies that include both sexes? Similarly, does the intervention start to shift cultural factors? Do these contribute to a change in behavior? 

      Measuring change in behaviour following an intervention is challenging, and hence we implemented an intention score as a proxy for behaviour. We appreciate the benefit of a long-term analysis, but it was beyond the scope of this study and would need a larger dataset to allow for attrition. We agree that the strategy implemented has weaknesses, and we have extended the limitations section of the discussion to include these.

      Reviewer #1 (Recommendations for the authors):  

      I would ask them to consider alternative explanations and ask for free-form responses, and to revise with the caveats written above: sample size does need to be increased depending on effect size, and within-sex studies are also important. Not all studies should focus on sex influences.

      The additional paragraph in the introduction, which clarifies the objective of inclusion and the resulting impact on experimental design, should address these recommendations.

      We have also added the free-form responses as an additional supplementary file.  

      Reviewer #2 (Recommendations for the authors):  

      This is an important set of studies. My only recommendation is to improve the data presentation so that it is clear what is depicted and how the analyses were conducted. I know this is in the methods, but reminding the reader would be helpful.

      We have revisited the figures and included more information in the legends to explain the analysis and improve clarity.   

      Reviewer #3 (Recommendations for the authors):  

      There are parts in the introduction which read as contradictory and as such are confusing - for example, in the 3rd paragraph it states that little progress on sex inclusive research has been made, and in the following sentences it states that the proportion of published studies across sex has improved. The references in these two statements are from the same time range, so has this improved? Or not?  

      The introduction does include a summation statement on the position: “Whilst a positive step forward, this proportion still represents a minority of studies, and notably this inclusion was not associated with an increase in the proportion of studies that included data analysed by sex.” We have reworded the text to ensure it is internally consistent with this summary statement and this should increase clarity.

      In discussing the results, it is sometimes confusing what the percentages mean. For example, "the researchers reported only conducting sex inclusive research in <=55% of their studies over the past 5 years (55% in study 1 general population and 35% study 2 pre-assessment)." Does that mean 55% of people are conducting sex inclusive research, or does this mean only half of their studies? These two options have very different implications.

      We agree that the sentence is confusing and it has been reworded.  

      Addressing long-term assessments in attitude and action (ie, performing sex inclusive research) is a crucial addition, with data if possible, but at least substantive discussion.  

      We have added this to the limitations section of the discussion.

      One minor but confusing point is the analogy comparing sex-inclusive studies with going to the gym. The point is well taken: knowledge is not enough for behavior change. However, the argument here is that increasing sex-inclusive research requires cultural change, whereas going to the gym requires motivation. This seems like an oranges-to-lemons comparison (same family, different outcome when you bite into it).

      At the core, both scenarios involve the challenge of changing established habits and cultural norms so that action follows knowledge (the right thing to do). The exercise scenario is a primary example provided by the original authors to describe how aspects of the theory of planned behaviour (perceived behavioural control, attitude, and social norms) may influence behavioural change. Understanding which of these aspects may drive or influence change is why we used this framework to understand our study population. We disagree that this is an oranges-to-lemons comparison.

      References

      (1) Heidari S, Babor TF, De Castro P, Tort S, Curno M. Sex and Gender Equity in Research: rationale for the SAGER guidelines and recommended use. Res Integr Peer Rev. 2016;1:2.

      (2) Karp NA. Navigating the paradigm shift of sex inclusive preclinical research and lessons learnt. Commun Biol. 2025;8(1):681.

      (3) Karp NA, Berdoy M, Gray K, Hunt L, Jennings M, Kerton A, et al. The Sex Inclusive Research Framework to address sex bias in preclinical research proposals. Nat Commun. 2025;16(1):3763.

      (4) MRC. Sex in experimental design: guidance on new requirements. UK Research and Innovation; 2022. https://www.ukri.org/councils/mrc/guidance-for-applicants/policies-and-guidance-for-researchers/sex-in-experimental-design/

      (5) Clayton JA, Collins FS. Policy: NIH to balance sex in cell and animal studies. Nature. 2014;509(7500):282-3.

    1. eLife Assessment

      This valuable study reports a critical role of the axonemal protein ANKRD5 in sperm motility and male fertility. Convincing data were presented to support the main conclusion. This work will be of interest to biomedical researchers who study ciliogenesis, sperm biology, and male fertility.

    2. Reviewer #1 (Public review):

      Summary:

      Asthenospermia, characterized by reduced sperm motility, is one of the major causes of male infertility. The "9 + 2" arranged MTs and over 200 associated proteins constitute the axoneme, the molecular machine for flagellar and ciliary motility. Understanding the physiological functions of axonemal proteins, particularly their links to male infertility, could help uncover the genetic causes of asthenospermia and improve its clinical diagnosis and management. In this study, the authors generated Ankrd5 null mice and found that ANKRD5-/- males exhibited reduced sperm motility and infertility. Using FLAG-tagged ANKRD5 mice, mass spectrometry, and immunoprecipitation (IP) analyses, they confirmed that ANKRD5 is localized within the N-DRC, a critical protein complex for normal flagellar motility. However, transmission electron microscopy (TEM) and cryo-electron tomography (cryo-ET) of sperm from Ankrd5 null mice did not reveal significant structural abnormalities.

      Strengths:

      The phenotypes observed in ANKRD5-/- mice, including reduced sperm motility and male infertility, are convincing. The authors demonstrated that ANKRD5 is an N-DRC protein that interacts with TCTE1 and DRC4. Most of the experiments are well-designed and executed.

      Comments on revised version:

      My concerns have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript investigates the role of ANKRD5 (ANKEF1) as a component of the N-DRC complex in sperm motility and male fertility. Using Ankrd5 knockout mice, the study demonstrates that ANKRD5 is essential for sperm motility and identifies its interaction with N-DRC components through IP-mass spectrometry and cryo-ET. The results provide insights into ANKRD5's function, highlighting its potential involvement in axoneme stability and sperm energy metabolism.

      Strengths:

      The authors employ a wide range of techniques, including gene knockout models, proteomics, cryo-ET, and immunoprecipitation, to explore ANKRD5's role in sperm biology.

      Comments on revised version:

      The authors have already addressed the issues I am concerned about.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Asthenospermia, characterized by reduced sperm motility, is one of the major causes of male infertility. The "9 + 2" arranged MTs and over 200 associated proteins constitute the axoneme, the molecular machine for flagellar and ciliary motility. Understanding the physiological functions of axonemal proteins, particularly their links to male infertility, could help uncover the genetic causes of asthenospermia and improve its clinical diagnosis and management. In this study, the authors generated Ankrd5 null mice and found that ANKRD5-/- males exhibited reduced sperm motility and infertility. Using FLAG-tagged ANKRD5 mice, mass spectrometry, and immunoprecipitation (IP) analyses, they confirmed that ANKRD5 is localized within the N-DRC, a critical protein complex for normal flagellar motility. However, transmission electron microscopy (TEM) and cryo-electron tomography (cryo-ET) of sperm from Ankrd5 null mice did not reveal significant structural abnormalities.

      Strengths:

      The phenotypes observed in ANKRD5-/- mice, including reduced sperm motility and male infertility, are convincing. The authors demonstrated that ANKRD5 is an N-DRC protein that interacts with TCTE1 and DRC4. Most of the experiments are well-designed and executed.

      Weaknesses:

      The last section of cryo-ET analysis is not convincing. "ANKRD5 depletion may impair buffering effect between adjacent DMTs in the axoneme".

      "In WT sperm, DMTs typically appeared circular, whereas ANKRD5-KO DMTs seemed to be extruded as polygonal. (Fig. S9B,D). ANKRD5-KO DMTs seemed partially open at the junction between the A- and B-tubes (Fig. S9B,D)." In the TEM images of 4E, ANKRD5-KO DMTs look the same as WT. The distortion could result from suboptimal sample preparation, imaging or data processing. Thus, the subsequent analyses and conclusions are not reliable.

      Thank you for your valuable advice. To validate the cryo-ET results, we carefully re-analyzed the TEM results (previously we had focused only on the global "9+2" structure of the axial filament) and found that deletion of ANKRD5 resulted in both normal and deformed DMT morphologies, consistent with the cryo-ET observations. We have also added corresponding text and figure descriptions to the article:

      The text description we added is: “Upon re-examining the TEM data in light of the Cryo-ET findings, similar abnormalities were observed in the TEM images (Fig.4E, Fig. S10B). Notably, both intact and deformed DMT structures were consistently observed in both TEM and STA analyses, with the deformation of the B-tube being more obvious (Fig.4E, Fig. S10). ”

      This paper still requires significant improvements in writing and language refinement. Here is an example: "While N-DRC is critical for sperm motility, but the existence of additional regulators that coordinate its function remains unclear" - ill-formed sentences.

      We appreciate the reviewer’s valuable comment regarding the clarity of our writing. The sentence cited (“While N-DRC is critical for sperm motility, but the existence of additional regulators that coordinate its function remains unclear”) was indeed ill-formed. We have revised it to improve readability and precision. The corrected version now reads:“Although the N-DRC is critical for sperm motility, whether additional regulatory components coordinate its function remains unclear.” We have carefully re-examined the manuscript and refined the language throughout to ensure clarity and conciseness.

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates the role of ANKRD5 (ANKEF1) as a component of the N-DRC complex in sperm motility and male fertility. Using Ankrd5 knockout mice, the study demonstrates that ANKRD5 is essential for sperm motility and identifies its interaction with N-DRC components through IP-mass spectrometry and cryo-ET. The results provide insights into ANKRD5's function, highlighting its potential involvement in axoneme stability and sperm energy metabolism.

      Strengths:

      The authors employ a wide range of techniques, including gene knockout models, proteomics, cryo-ET, and immunoprecipitation, to explore ANKRD5's role in sperm biology.

      Weaknesses:

      “Limited Citations in Introduction: Key references on the role of N-DRC components (e.g., DRC2, DRC4) in male infertility are missing, which weakens the contextual background.”

      We appreciate the reviewer’s valuable suggestion. To address this concern, we have added the following sentence in the Introduction:

      “Recent mammalian knockout studies further confirmed that loss of DRC2 or DRC4 results in severe sperm flagellar assembly defects, multiple morphological abnormalities of the sperm flagella (MMAF), and complete male infertility, highlighting their indispensable roles in spermatogenesis and reproduction [31].”

      This addition introduces up-to-date evidence on DRC2 and DRC4 functions in male infertility and strengthens the contextual background as recommended.

      Reviewer #1 (Recommendations for the authors):

      "Male infertility impacts 8%-12% of the global male population, with sperm motility defects contributing to 40%-50% of these cases [2,3]. " Is reference 3 proper? I don't see "sperm motility defects contributing to 40%-50%" of male infertility.

      Thank you for identifying this issue. You are correct—reference 3 does not support the statement about sperm motility defects comprising 40–50% of male infertility cases; it actually states:

      “Male factor infertility is when an issue with the man’s biology makes him unable to impregnate a woman. It accounts for between 40 to 50 percent of infertility cases and affects around 7 percent of men.”

      This was a misunderstanding on my part, and I apologize for the oversight.

      To correct this, we have replaced the statement with more accurate references:

      PMID: 33968937 confirms:

      “Asthenozoospermia accounts for over 80% of primary male infertility cases.”

      PMID: 33191078 defines asthenozoospermia (AZS) as reduced or absent sperm motility and notes it as a major cause of male infertility.

      We have updated the manuscript accordingly:

      In the Significance Statement: “Male infertility affects approximately 8%-12% of men globally, with defects in sperm motility accounting for over 80% of these cases.”

      In the Introduction: “Male infertility affects approximately 8% to 12% of the global male population, with defects in sperm motility accounting for over 80% of these cases[2,3].”

      Thank you again for your careful review and for giving us the opportunity to improve the accuracy of our manuscript.

      "Rather than bypassing the issue with ICSI, infertility from poor sperm motility could potentially be treated or even cured through stimulation of specific signaling pathways or gene therapy." Need references.

      We appreciate the reviewer’s insightful comment. In response, we have added three supporting references to the relevant sentence.

      The first reference (PMID: 39932044) demonstrates that cBiMPs and the PDE-10A inhibitor TAK-063 significantly and sustainably improve motility in human sperm with low activity, including cryopreserved samples, without inducing premature acrosome reaction or DNA damage. The second reference (PMID: 29581387) shows that activation of the PKA/PI3K/Ca²⁺ signaling pathways can reverse reduced sperm motility. The third reference (PMID: 33533741) reports that CRISPR-Cas9-mediated correction of a point mutation in Tex11^PM/Y spermatogonial stem cells (SSCs) restores spermatogenesis in mice and results in the production of fertile offspring.

      These references provide mechanistic support and demonstrate the feasibility of treating poor sperm motility through targeted pathway modulation or gene therapy, thus reinforcing the validity of our statement.

      "Our findings indicate that ANKRD5 (Ankyrin repeat domain 5; also known as ANK5 or ANKEF1) interacts with N-DRC structure". The full name should be provided the first time ANKRD5 appears. Is ANKRD5 a component of N-DRC or does it interact with N-DRC?

      We thank the reviewer for the valuable suggestion. In response, we have moved the full name “Ankyrin repeat domain 5; also known as ANK5 or ANKEF1” to the abstract where ANKRD5 first appears, and have removed the redundant mention from the main text.

      Based on our experimental data, we consider ANKRD5 to be a novel component of the N-DRC (nexin-dynein regulatory complex), rather than merely an interacting partner. Therefore, we have revised the sentence in the main text to read:

      “Here, we demonstrate that ANKRD5 is a novel N-DRC component essential for maintaining sperm motility.”

      Fig 5E, numbers of TEM images should be added.

      We thank the reviewer for the suggestion. We would like to clarify that Fig. 5E does not contain TEM images, and it is likely that the reviewer was referring to Fig. 4E instead.

      In Fig. 4E, we conducted three independent experiments. In each experiment, 60 TEM cross-sectional images of sperm tails were analyzed for both Ankrd5 knockout and control mice.

      The findings were consistent across all replicates.

      We have updated the figure legend accordingly, which now reads:

      “Transmission electron microscopy (TEM) of sperm tails from control and Ankrd5 KO mice. Cross-sections of the midpiece, principal piece, and end piece were examined. Red dashed boxes highlight regions of interest, and the magnified views of these boxed areas are shown in the upper right corner of each image. In three independent experiments, 20 sperm cross-sections per mouse were analyzed for each group, with consistent results observed.”

      There are random "222" in the references. Please check and correct.

      I sincerely apologize for the errors caused by the reference management software, which resulted in the insertion of random "222" and similar numbering issues in the reference list. I have carefully reviewed and corrected the following problems:

      References 9, 11, 13, 26, 34, 63, and 64 had the number "222" mistakenly placed before the title; these have now been removed. References 15 and 18 had "111" incorrectly inserted before the title; this has also been corrected. Reference 36 had an erroneous "2" before the title and was found to be a duplicate of Reference 32; these have now been merged into a single citation. Additionally, References 22 and 26 were identified as duplicates of the same article and have been consolidated accordingly. 

      All these issues have been resolved to ensure the reference list is accurate and properly formatted.

      Reviewer #2 (Recommendations for the authors):

      The authors have already addressed most of the issues I am concerned about.

      In addition, we have also corrected some errors in the revised manuscript:

      (1) In Figure 3G, the y-axis label was previously marked as “Sperm count in the oviduct (10⁶)”, which has now been corrected to “Sperm count in the oviduct”.

      (2) All p-values have been reformatted to italic lowercase letters to comply with the journal style guidelines.

      Figure 6 Legend: A typographical error in the figure legend has been corrected. The text previously read “(A) The differentially expressed proteins of Ankrd5+/– and Ankrd5+/- were identified...”. This has now been amended to “(A) The differentially expressed proteins of Ankrd5+/– and Ankrd5–/– were identified...” to correctly represent the comparison between heterozygous and homozygous knockout groups.

      In the original Figure 4E, we added a zoomed-in panel to show the deformed DMT.

    1. eLife Assessment

      This manuscript revisits the well-studied KdpFABC potassium transport system from bacteria with a convincing set of new higher resolution structures, a protein expression strategy that permits purification of the active wildtype protein, and insight obtained from mutagenesis and activity assays. The thorough and thoughtful mechanistic analyses make this a valuable contribution to the membrane transport field.

    2. Reviewer #3 (Public review):

      Summary:

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wildtype protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal concerns the putative binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway.

      Strengths:

      The high resolution (2.1 Å) of the current structure is impressive, and allows many new densities in the potassium transport pathway to be resolved. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. The SSME experiments are rigorous.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review): 

      Summary: 

      The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long tunnel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm.

      The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by solid-supported membrane (SSM) electrophysiology.

      Reviewer #3 (Public review): 

      Summary: 

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wildtype protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal concerns the putative binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway.

      Strengths: 

      The high resolution (2.1 Å) of the current structure is impressive, and allows many new densities in the potassium transport pathway to be resolved. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. The SSME experiments are generally rigorous. 

      Weaknesses: 

      The present SSME experiments do not support quantitative comparisons of different mutants, as in Figures 4D and 5E. Only qualitative inferences can be drawn among different mutant constructs. 

      Thank you to both reviewers for your thorough review of our work. We acknowledge the limitations of SSME experiments in quantitative comparison of mutants and have revised the manuscript to address this point. In addition, we have included new ATPase data from reconstituted vesicles which we believe will help to strengthen our contention that both ATPase and transport are equally affected by Val496 mutations.

      Reviewer #2 (Recommendations for the authors): 

      I have a minor editorial comment: 

      Perhaps I am confused. However, in reference to the text in the Results: "Our WT complex displayed high levels of K+-dependent ATPase activity and generated robust transport currents (Fig. 1 - figure suppl. 1).", I see neither K+-dependent ATPase activity nor transport currents in Fig. 1 - figure suppl. 1. Perhaps the text needs to be edited for clarity.

      Thank you for pointing this out. This confusion was caused by our removal of a panel from the revised manuscript, which depicted K+-dependent transport currents. Although this panel is somewhat redundant, given inclusion of raw SSME traces from all the mutants, it has been reinstated as Fig. 1 - figure supplement 1F, thus providing a thorough characterization of the preparation used for cryo-EM analysis and supporting the statement quoted by this reviewer.

      Reviewer #3 (Recommendations for the authors): 

      The authors have provided a detailed description of the SSME data collection, and followed rigorous protocols to ensure that the currents measured on a particular sensor remained stable over time. 

      I still have reservations about the direct comparison of transport in the different mutants. Specifically, on page 6, the authors state that "The longer side chain of V496M reduces transport modestly with no effect on ATPase activity. V496R, which introduces positive charge, completely abolishes activity. V496W and V496H reduce both transport and ATPase activity by about half, perhaps due to steric hindrance for the former and partial protonation for the latter." In Figures 4D and 5E, by plotting all of the peak currents on the same graph, the authors give the data a quantitative veneer, when these different experiments really aren't directly comparable, especially in the absence of any controls for reconstitution efficiency.

      In terms of overall conclusions, for the more drastic mutant phenotypes, I think it is completely reasonable to conclude that transport is not observed. But a 2-fold difference could easily result from differences in reconstitution or sensor preparation. My suggestion would be to show example traces rather than a numeric plot in 4D/5E, to convey the qualitative nature of the mutant-to-mutant comparisons, and to re-write the text to acknowledge the shortcomings of mutant-to-mutant comparisons with SSME, and avoid commenting on the more subtle phenotypes, such as modest decreases and reductions by about half. 

      Figure 4, supplement 1. What is S162D? I don't think it is mentioned in the main text. 

      We agree with the reviewer's point that quantitative comparison of different mutants by SSME is compromised by ambiguity in reconstitution. However, we do not think that display of raw SSME currents is an effective way to communicate qualitative effects to the general reader, given the complexity of these data (e.g., distinction between transient binding current seen in V496R and genuine, steady-state transport current seen in WT). So we have taken a compromise approach. To start, we have removed the transport data from the main figure (Fig. 4). Luckily, we had frozen and saved the batch of reconstituted proteoliposomes from Val496 mutants that had been used for transport assays. We therefore measured ATPase activities from these proteoliposomes after adding a small amount of detergent to prevent buildup of electrochemical gradients (1 mg/ml decyl maltoside, which is only slightly above its critical micelle concentration of 0.87 mg/ml). Differences in ATPase activity from these proteoliposomes were very similar to those measured prior to reconstitution (i.e., data in Fig. 4d), indicating that reconstitution efficiencies were comparable for the various mutants. Furthermore, differences in SSME currents are very similar to these ATPase activities, suggesting that Val496 mutants did not affect energy coupling. These data are shown in the revised Fig. 4 - figure suppl. 1a, along with the SSME raw data and size-exclusion chromatography elution profiles (Fig. 4 - figure suppl. 1b-g). We also altered the text to point out the concern over comparing transport data from different mutants (see below). We hope that this revised presentation adequately supports the conclusion that Val496 mutations, and especially the V496R substitution, influence the passage of K+ through the tunnel without affecting mechanics of the ATP-dependent pump.

      The paragraph in question now reads as follows (pg. 6-7, with additional changes to legends to Fig. 4 and Fig. 4 - figure suppl. 1):

      "In order to provide experimental evidence for K+ transport through the tunnel, we made a series of substitutions to Val496 in KdpA. This residue resides near the widest part of the tunnel and is fully exposed to its interior (Fig. 4a). We made substitutions to increase its bulk (V496M and V496W) and to introduce charge (V496E, V496R and V496H). We used the AlphaFold-3 artificial intelligence structure prediction program (Jumper et al., 2021) to generate structures of these mutants and to evaluate their potential impact on tunnel dimensions. This analysis predicts that V496W and V496R reduce the radius to well below the 1.4 Å threshold required for passage of K+ or water (Fig. 4c); V496E and V496M also constrict the tunnel, but to a lesser extent. Measurements of ATPase and transport activity (Fig. 4d) show that neither negative charge (V496E) nor a longer side chain (V496M) has an apparent effect on ATPase activity. V496R, which introduces positive charge, almost completely abolishes activity. V496W and V496H reduce both transport and ATPase activity by about half, perhaps due to steric hindrance for the former and partial protonation for the latter. Transport activity of these mutants was also measured, but quantitative comparisons are hampered by potential inconsistency in reconstitution of proteoliposomes and in preparation of sensors for SSME. To account for differences in reconstitution, we compared ATPase activity and transport currents taken from the same batch of vesicles (Fig. 4 - figure suppl. 1a). These data show that differences in ATPase activity of proteoliposomes were consistent with differences measured prior to reconstitution (Fig. 4d). Transport activity, which was derived from multiple sensors, mirrored ATPase activity, indicating that the Val496 mutants did not affect energy coupling, but simply modulated turnover rate of the pump."

      S162D was included as a negative control, together with D307A. However, given the inactive mutants discussed in Fig. 5 (Asp582 and Lys586 substitutions), these seem an unnecessary distraction and have been removed from Fig. 4 - figure suppl. 1.

    1. eLife Assessment

      In flies defective for axonal transport of mitochondria, the authors report the upregulation of one subunit, the beta subunit, of the heterotrimeric eIF2 complex via mass spectrometry-based proteomics. Neuronal overexpression of eIF2β phenocopied aspects of neuronal dysfunction observed when axonal transport of mitochondria was compromised. Conversely, lowering eIF2β expression suppressed aspects of neuronal dysfunction. While these are intriguing and useful observations, technical weaknesses limit the interpretation. On balance, the evidence supporting the current claims is suggestive but incomplete, especially concerning the characterization of the eIF2 heterotrimer and the data regarding translational regulation.

    2. Reviewer #1 (Public review):

      The study presents significant findings on the role of mitochondrial depletion in axons and its impact on neuronal proteostasis. It effectively demonstrates how the loss of axonal mitochondria and elevated levels of eIF2β contribute to autophagy collapse and neuronal dysfunction. The use of Drosophila as a model organism and comprehensive proteome analysis adds robustness to the findings.

      In this revision, the authors have responded thoughtfully to previous concerns. In particular, they have addressed the need for a quantitative analysis of age-dependent changes in eIF2β and eIF2α. By adding western blot data from multiple time points (7 to 63 days), they show that eIF2β levels gradually increase until middle age, then decline. In milton knockdown flies, this pattern appears shifted, supporting the idea that mitochondrial defects may accelerate aging-related molecular changes. These additions clarify the temporal dynamics of eIF2β and improve the overall interpretation.

      Other updates include appropriate corrections to figures and quantification methods. The authors have also revised some of their earlier mechanistic claims, presenting a more cautious interpretation of their findings.

      Overall, this work provides new insights into how mitochondrial transport defects may influence aging-related proteostasis through eIF2β. The manuscript is now more convincing, and the revisions address the main points raised earlier. I find the updated version much improved.

    3. Reviewer #2 (Public review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces a defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria, which they suggest accelerates age-dependent changes rather than increasing their magnitude.

      Strong caution is necessary regarding the interpretation of translational regulation resulting from the milton KD. The effect of milton KD on translation appears subtle, if present at all, in the puromycin incorporation experiments in both the initial and revised versions. Additionally, the polysome profiling data in the revised manuscript lack the clear resolution for ribosomal subunits, monosomes, and polysomes that is typically expected in publications.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The study presents significant findings on the role of mitochondrial depletion in axons and its impact on neuronal proteostasis. It effectively demonstrates how the loss of axonal mitochondria and elevated levels of eIF2β contribute to autophagy collapse and neuronal dysfunction. The use of Drosophila as a model organism and comprehensive proteome analysis adds robustness to the findings.

      In this revision, the authors have responded thoughtfully to previous concerns. In particular, they have addressed the need for a quantitative analysis of age-dependent changes in eIF2β and eIF2α. By adding western blot data from multiple time points (7 to 63 days), they show that eIF2β levels gradually increase until middle age, then decline. In milton knockdown flies, this pattern appears shifted, supporting the idea that mitochondrial defects may accelerate aging-related molecular changes. These additions clarify the temporal dynamics of eIF2β and improve the overall interpretation.

      Other updates include appropriate corrections to figures and quantification methods. The authors have also revised some of their earlier mechanistic claims, presenting a more cautious interpretation of their findings.

      Overall, this work provides new insights into how mitochondrial transport defects may influence aging-related proteostasis through eIF2β. The manuscript is now more convincing, and the revisions address the main points raised earlier. I find the updated version much improved.

      Thank you so much for the review, insightful comments and encouragement. We appreciate it.  

      Reviewer #2 (Public review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces a defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria, which they suggest accelerates age-dependent changes rather than increasing their magnitude.

      Strong caution is necessary regarding the interpretation of translational regulation resulting from the milton KD. The effect of milton KD on translation appears subtle, if present at all, in the puromycin incorporation experiments in both the initial and revised versions. Additionally, the polysome profiling data in the revised manuscript lack the clear resolution for ribosomal subunits, monosomes, and polysomes that is typically expected in publications.

      Thank you so much for the review and insightful comments. We appreciate it.  

      Reviewer #2 (Recommendations for the authors):

      The revised manuscript demonstrates many improvements. The authors have provided a more comprehensive data set and a more detailed description of their results. Furthermore, their explanation of the Integrated Stress Response (ISR) has been corrected, and this correction is reflected in the data interpretation.

      As in the public review, I maintain my emphasis on the weakness of the claim of suppressed global translation, since the data are the same in the initial and the revised versions.

      Thank you for your review. We understand that further studies will be needed to elucidate the role of mitochondrial distribution in the global translation profile. We will keep working on it.

      A few suggestions for minor corrections.

      (1) The order of figures in the revised version is disorganized.

      Thank you for pointing it out. We corrected the order. 

      (2) In Figure 1A, mitochondria are shown bound by milton, and kinesin bound by Miro. Their roles should be reversed.

      Thank you for pointing it out, and we are sorry for the oversight. We corrected it.

    1. eLife Assessment

      Xenacoelomorpha is an enigmatic phylum, displaying various presumably simple or ancestral bilaterian features. This valuable study characterises the reproductive life history of Hofstenia miamia, a member of class Acoela in this phylum. The authors describe the morphology and development of the reproductive system, its changes upon degrowth and regeneration, and the animals' egg-laying behaviour. The evidence is convincing, with fluorescence microscopy and quantitative measurements providing a considerable improvement over historical reports based mostly on histology and qualitative observations.

    2. Reviewer #1 (Public review):

      The aim of this study was a better understanding of the reproductive life history of acoels. The acoel Hofstenia miamia, an emerging model organism, is investigated; the authors nevertheless acknowledge and address the high variability in reproductive morphology and strategies within Acoela.

      The morphology of male and female reproductive organs in these hermaphroditic worms is characterised through stereo microscopy, immunohistochemistry, histology, and fluorescent in situ hybridization. The findings confirm and better detail historical descriptions. A novelty in the field is the in situ hybridization experiments, which link already published single-cell sequencing data to the worms' morphology. An interesting finding, though not further discussed by the authors, is that the known germline markers cgnl1-2 and Piwi-1 are only localized in the ovaries and not in the testes.

      The work also clarifies the timing and order of appearance of reproductive organs during development and regeneration, as well as the changes upon degrowth. It shows an association between reproductive organ growth and whole-body size, which will surely be taken into account and further explored in future acoel studies. This is also the first non-anecdotal report of degrowth upon starvation in H. miamia (and, to my knowledge, in acoels, except for weight recorded upon starvation in Convolutriloba retrogemma [1]).

      Egg laying through the mouth is described for the first time in H. miamia, as is the worms' egg-laying behavior, i.e., choosing the tank walls rather than the floor, laying eggs in clutches, and delaying egg laying during food deprivation. Self-fertilization is also reported for the first time.

      The main strength of this study is that it expands previous knowledge on the reproductive life history traits in H. miamia and it lays the foundation for future studies on how these traits are affected by various factors, as well as for comparative studies within acoels. As highlighted above, many phenomena are addressed in a rigorous and/or quantitative way for the first time. This can be considered the start of a novel approach to reproductive studies in acoels, as the authors suggest in the conclusion. It can also be interpreted as a testament to how an established model system can benefit the study of an understudied animal group.

      The main weakness of the work is the lack of convincing explanations of the dynamics of self-fertilization, sperm storage, and movement of oocytes from the ovaries to the central cavity and subsequently to the pharynx. These questions are also raised by the authors themselves in the discussion. Another weakness (or rather, a missed potential strength) is the limited focus on genes. Given the presence of the single-cell sequencing atlas and established methods for in situ hybridization and even transgenesis in H. miamia, this model provides a unique opportunity to investigate germline genes in acoels and their role in development, regeneration, and degrowth. It should also be noted that employing transmission electron microscopy would have enabled a more detailed comparison with other acoels, since ultrastructural studies of reproductive organs have been published for other species (cf., e.g., [2], [3], [4]). This is especially true for a better understanding of the relation between sperm axoneme and flagellum (mentioned in the Results section), as well as of sexual conflict (mentioned in the Discussion).

      (1) Shannon, Thomas. 2007. 'Photosmoregulation: Evidence of Host Behavioral Photoregulation of an Algal Endosymbiont by the Acoel Convolutriloba Retrogemma as a Means of Non-Metabolic Osmoregulation'. Athens, Georgia: University of Georgia [Dissertation].

      (2) Zabotin, Ya. I., and A. I. Golubev. 2014. 'Ultrastructure of Oocytes and Female Copulatory Organs of Acoela'. Biology Bulletin 41 (9): 722-35.

      (3) Achatz, Johannes Georg, Matthew Hooge, Andreas Wallberg, Ulf Jondelius, and Seth Tyler. 2010. 'Systematic Revision of Acoels with 9+0 Sperm Ultrastructure (Convolutida) and the Influence of Sexual Conflict on Morphology'.

      (4) Petrov, Anatoly, Matthew Hooge, and Seth Tyler. 2006. 'Comparative Morphology of the Bursal Nozzles in Acoels (Acoela, Acoelomorpha)'. Journal of Morphology 267 (5): 634-48.

    3. Reviewer #2 (Public review):

      Summary:

      While the phylogenetic position of acoels (and Xenacoelomorpha) is still debated, investigations of various representative species are critical to understanding their overall biology.

      Hofstenia is an acoel that can be maintained in laboratory conditions and for which several critical techniques are available. The current manuscript provides a comprehensive and largely descriptive investigation of the reproductive system of Hofstenia miamia.

      Strengths:

      (1) Xenacoelomorpha is a diverse group of animals comprising three major clades and several hundred species, yet they are largely understudied. A comprehensive state-of-the-art analysis of the reproductive system of Hofstenia as a representative is thus highly relevant.

      (2) The investigations are overall very thorough, well documented, and nicely visualised in an array of figures. In particular, I enjoyed seeing data displayed in a visually appealing quantitative or semi-quantitative fashion.

      (3) The data provided is diverse and rich. For instance, the behavioral investigations open up new avenues for further in-depth projects.

      Weaknesses:

      While the analyses are extensive, they appear somewhat one-dimensional. For instance, the two markers used were characterized in a recent scRNA-seq dataset from the Srivastava lab. One might have expected slightly deeper molecular analyses. Along the same lines, the modes of spermatogenesis and oogenesis have not been analysed further, nor has the proposed mode of sperm storage.

      [Editors' note: In their response, the authors have suitably addressed these concerns or have satisfactorily explained the challenges in addressing them.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors): 

      Here I address only some minor changes that would improve understanding, reproducibility, or cohesion with the literature.

      (1) It would be good to mention that the prostatic vesicle of this study is named vesicula granulorum in (Steinböck, 1966) and granule vesicle in (Hooge et al., 2007).

      We have now included this (line 90 of our revised manuscript).  

      (2) A slightly more detailed discussion of the germline genes would be interesting. For example, a potential function of pa1b3-2 and cgnl1-2 based on the similarity to known genes or on the conserved domains.

      Pa1b3-2 appears to encode an acetylhydrolase; cgnl1-2 is likely a cingulin family protein involved in cell junctions. However, given the evolutionary distance between acoels and the model organisms in which these genes have been studied, we believe it is premature to speculate on their function without substantial additional work. We believe this work would be more appropriate in a future publication focused on the molecular genetic underpinnings of Hofstenia’s reproductive systems and their development.

      (3) It is mentioned that the animals can store sperm while lacking a seminal bursa "given that H. miamia can lay eggs for months after a single mating" (line 635) - this could also be self-fertilization, according to the authors' other findings.

      We agree that it is possible this is self-fertilization, and we believe we have represented this uncertainty accurately in the text. However, we do not think this is likely, because self-fertilization manifests as a single burst of egg laying (Fig. 6D). We discuss this in the Results (line 540). 

      (4) A source should be given for the tree in Figure 7B. 

      We have now included this source (line 736), and we apologize for the oversight.  

      (5) Either in the Methods or in the Results section, it would be good to give more details on why actin, FMRFamide, and tropomyosin were chosen for the immunohistochemistry studies.

      We have now included more detail in the Methods (line 823). Briefly, these are previously-validated antibodies that we knew would label relevant morphology.

      (6) In the Methods "a standard protocol hematoxylin eosin" is mentioned. Even if this is a fairly common technique, more details or a reference should be provided.

      We have now included more detail, and a reference (lines 766-774).  

      (7) Given the historical placement of Acoela within Platyhelminthes and the fact that the readers might not be very familiar with this group of animals, two passages can be confusing: line 499 and lines 674-678.

      We have edited these sentences to clarify when we mean Platyhelminthes, which resolves this confusion.

      (8) A small addition to Table S1: Amphiscolops langerhansi also presents asexual reproduction through fission ([a], cited in [b]).

      Thanks. We have included this in Table S1.

      (a) Hanson, E. D. 1960. 'Asexual Reproduction in Acoelous Turbellaria'. The Yale Journal of Biology and Medicine 33 (2): 107-11.

      (b) Hendelberg, Jan, and Bertil Åkesson. 1991. 'Studies of the Budding Process in Convolutriloba Retrogemma (Acoela, Platyhelminthes)'. In Turbellarian Biology: Proceedings of the Sixth International Symposium on the Biology of the Turbellaria, Held at Hirosaki, Japan, 7-12 August 1990, 11-17. Springer. 

      Reviewer #2 (Recommendations for the authors): 

      I do not have any major comments on the manuscript. By default, I feel descriptive studies are a critical part of the advancement of science, particularly if the data are of great quality - as is the case here. The manuscript addresses various topics and describes these adequately. My minor point would be that in some sections, it feels like one could have gone a bit deeper. I highlighted three examples in the weakness section above (deeper analysis of markers for germline; modes of oogenesis/spermatogenesis; or proposed model for sperm storage). For instance, ultrastructural data might have been informative. But as said, I don't see this as a major problem, more a "would have been nice to see".

      We have responded to these points in detail above.

    1. eLife Assessment

      This is a valuable manuscript that reframes Gaucher's disease pathology through the analysis of renal health, using a Drosophila model mutant for glucocerebrosidase (GBA1). The authors provide physiological and cellular data showing that renal dysfunction may be a critical disease-modifying feature. This work broadens the field's focus beyond the nervous system to include systemic ionic regulation as a potential contributor to disease initiation and progression. The genetic and experimental approaches are solid and offer a rationale for investigating analogous dysfunction in human tissues; however, several claims extend beyond the presented evidence and would benefit from additional experiments to fully support the conclusions.

    2. Reviewer #1 (Public review):

      This study investigates the contribution of renal dysfunction to systemic and neuronal decline in Drosophila models of Gaucher disease (Gba1b mutants) and Parkinson's disease (Parkin mutants). While lysosomal and mitochondrial pathways are known drivers in these disorders, the role of kidney-like tissues in disease progression has not been well explored.

      The authors use Drosophila melanogaster to model renal dysfunction, focusing on Malpighian tubules (analogous to renal tubules) and nephrocytes (analogous to podocytes). They employ genetic mutants, tissue-specific rescues, imaging of renal architecture, redox probes, functional assays, nephrocyte dextran uptake, and lifespan analyses. They also test genetic antioxidant interventions and pharmacological treatment.

      The main findings show that renal pathology is progressive in Gba1b mutants, marked by Malpighian tubule disorganization, stellate cell loss, lipid accumulation, impaired water and ion regulation, and reduced nephrocyte filtration. A central theme is redox dyshomeostasis, reflected in whole-fly GSH reduction, paradoxical mitochondrial versus cytosolic redox shifts, reduced ROS signals, increased lipid peroxidation, and peroxisomal impairment. Antioxidant manipulations (Nrf2, Sod1/2, CatA, and ascorbic acid) consistently worsen outcomes, suggesting a fragile redox balance rather than classical oxidative stress. Parkin mutants also develop renal degeneration, with impaired mitophagy and complete nephrocyte dysfunction by 28 days, but their mechanism diverges from that of Gba1b. Rapamycin treatment rescues several renal phenotypes in Gba1b but not in Parkin, highlighting distinct disease pathways.

      The authors propose that renal dysfunction is a central disease-modifying feature of Gaucher and Parkinson's disease models, driven by redox imbalance and differential engagement of lysosomal (Gba1b) vs. mitochondrial (Parkin) mechanisms. They suggest that maintaining renal health and redox balance may represent therapeutic opportunities and biomarkers in neurodegenerative disease. This is a significant manuscript that reframes GD/PD pathology through the lens of renal health. The data are extensive. However, several claims are ahead of the evidence and should be supported with additional experiments.

      Major Comments:

      (1) The abstract frames progressive renal dysfunction as a "central, disease-modifying feature" in both Gba1b and Parkin models, with systemic consequences including water retention, ionic hypersensitivity, and worsened neuro phenotypes. While the data demonstrates renal degeneration and associated physiological stress, the causal contribution of renal defects versus broader organismal frailty is not fully disentangled. Please consider adding causal experiments (e.g., temporally restricted renal rescue/knockdown) to directly establish kidney-specific contributions.

      (2) The manuscript shows multiple redox abnormalities in Gba1b mutants (reduced whole fly GSH, paradoxical mitochondrial reduction with cytosolic oxidation, decreased DHE, increased lipid peroxidation, and reduced peroxisome density/Sod1 mislocalization). These findings support a state of redox imbalance, but the driving mechanism remains broad in the current form. It is unclear if the dominant driver is impaired glutathione handling or peroxisomal antioxidant/β-oxidation deficits or lipid peroxidation-driven toxicity, or reduced metabolic flux/ETC activity. I suggest adding targeted readouts to narrow the mechanism.

      (3) The observation that broad antioxidant manipulations (Nrf2 overexpression in tubules, Sod1/Sod2/CatA overexpression, and ascorbic acid supplementation) consistently shorten lifespan or exacerbate phenotypes in Gba1b mutants is striking and supports the idea of redox fragility. However, these interventions are broad. Nrf2 influences proteostasis and metabolism beyond redox regulation, and Sod1/Sod2/CatA may affect multiple cellular compartments. In the absence of dose-response testing or controls for potential off-target effects, the interpretation that these outcomes specifically reflect redox dyshomeostasis feels ahead of the data. I suggest incorporating narrower interpretations (e.g., targeting lipid peroxidation directly) to clarify which redox axis is driving the vulnerability.

      (4) This manuscript concludes that nephrocyte dysfunction does not exacerbate brain pathology. This inference currently rests on a limited set of readouts: dextran uptake and hemolymph protein as renal markers, lifespan as a systemic measure, and two brain endpoints (LysoTracker staining and FK2 polyubiquitin accumulation). While these data suggest that nephrocyte loss alone does not amplify lysosomal or ubiquitin stress, they may not fully capture neuronal function and vulnerability. To strengthen this conclusion, the authors could consider adding functional or behavioral assays (e.g., locomotor performance).

      (5) The manuscript does a strong job of contrasting Parkin and Gba1b mutants, showing impaired mitophagy in Malpighian tubules, complete nephrocyte dysfunction by day 28, FRUMS clearance defects, and partial rescue with tubule-specific Parkin re-expression. These findings clearly separate mitochondrial quality control defects from the lysosomal axis of Gba1b. However, the mechanistic contrast remains incomplete. Many of the redox and peroxisomal assays are only presented for Gba1b. Including matched readouts across both models (e.g., lipid peroxidation, peroxisome density/function, Grx1-roGFP2 compartmental redox status) would make the comparison more balanced and strengthen the conclusion that these represent distinct pathogenic routes.

      (6) Rapamycin treatment is shown to rescue several renal phenotypes in Gba1b mutants (water retention, RSC proliferation, FRUMS clearance, lipid peroxidation) but not in Parkin, and mitophagy is not restored in Gba1b. This provides strong evidence that the two models engage distinct pathogenic pathways. However, the therapeutic interpretation feels somewhat overstated. Human relevance should be framed more cautiously, and the conclusions would be stronger with mechanistic markers of autophagy (e.g., Atg8a, Ref(2)p flux in Malpighian tubules) or with experiments varying dose, timing, and duration (short-course vs chronic rapamycin).

      (7) Several systemic readouts used to support renal dysfunction (FRUMS clearance, salt stress survival) could also be influenced by general organismal frailty. To ensure these phenotypes are kidney-intrinsic, it would be helpful to include controls such as tissue-specific genetic rescue in Malpighian tubules or nephrocytes, or timing rescue interventions before overt systemic decline. This would strengthen the causal link between renal impairment and the observed systemic phenotypes.

    3. Reviewer #2 (Public review):

      Summary:

      In the present study, the authors tested renal function in Gba1b-/- flies and its possible effect on neurodegeneration. They showed that these flies exhibit progressive degeneration of the renal system, loss of water homeostasis, and ionic hypersensitivity. They documented reduced glomerular filtration capacity in their pericardial nephrocytes, together with cellular degeneration in the Malpighian tubules, redox imbalance, and lipid accumulation. They also compared the Gba1b mutant flies to Parkin mutants and evaluated the effect of treatment with the mTOR inhibitor rapamycin. Restoration of renal structure and function was observed only in the Gba1b mutant flies, leading the authors to conclude that the mutants present different phenotypes due to lysosomal stress in Gba1b mutants versus mitochondrial stress in Parkin mutant flies.

      Comments:

      (1) The authors claim that: "renal system dysfunction negatively impacts both organismal and neuronal health in Gba1b-/- flies, including autophagic-lysosomal status in the brain." This statement implies that renal impairments drive neurodegeneration. However, there is no direct evidence provided linking renal defects to neurodegeneration in this model. It is worth noting that Gba1b-/- flies are a model for neuronopathic Gaucher disease (GD): they accumulate lipids in their brains and present with neurodegeneration and decreased survival, as shown by Kinghorn et al. (The Journal of Neuroscience, 2016, 36, 11654-11670) and by others, which the authors failed to mention (Davis et al., PLoS Genet. 2016, 12: e1005944; Cabasso et al., J Clin Med. 2019, 8:1420; Kawasaki et al., Gene, 2017, 614:49-55).

      (2) The authors tested brain pathology in two experiments:

      (a) To determine the consequences of abnormal nephrocyte function on brain health, they measured lysosomal area in the brains of Gba1b-/- and Klf15LOF flies and stained for polyubiquitin. Klf15 is expressed in nephrocytes and is required for their differentiation. There was no additive effect on the increased lysosomal volume (Figure 3D) or polyubiquitin accumulation (Figure 3E) seen in Gba1b-/- fly brains, implying that loss of nephrocyte viability itself does not exacerbate brain pathology.

      (b) The authors tested the consequences of overexpression of the antioxidant regulator Nrf2 in principal cells of the kidney on neuronal health in Gba1b-/- flies, using the c42-GAL4 driver. They claim that "This intervention led to a significant increase in lysosomal puncta number, as assessed by LysoTracker™ staining (Figure 5D), and exacerbated protein dyshomeostasis, as indicated by polyubiquitin accumulation and increased levels of the ubiquitin-autophagosome trafficker Ref(2)p/p62 in Gba1b-/- fly brains (Figure 5E). Interestingly, Nrf2 overexpression had no significant effect on lysosomal area or ubiquitin puncta in control brains, demonstrating that the antioxidant response specifically in Gba1b-/- flies negatively impacts disease states in the brain and renal system." Notably, c42-GAL4 is a leaky driver, expressed in salivary glands, Malpighian tubules, and pericardial cells (Beyenbach et al., Am. J. Cell Physiol. 318: C1107-C1122, 2020). Expression in pericardial cells may affect heart function, which could explain deterioration in brain function.

      Taken together, the contribution of renal dysfunction to brain health remains debatable.

      Based on the above, I believe the title should be changed to: Redox Dyshomeostasis Links Renal and Neuronal Dysfunction in Drosophila Models of Gaucher disease. Such a title will reflect the results presented in the manuscript.

      (3) The authors mention that Gba1b is not expressed in the renal system, which means that no renal phenotype can be attributed directly to any known GD pathology. They suggest that systemic factors such as circulating glycosphingolipids or loss of extracellular vesicle-mediated delivery of GCase may mediate renal toxicity. This raises a question about the validity of this model to test pathology in the fly kidney. According to FlyBase, Gba1b is expressed in renal structures of the fly.

      (4) It is worth mentioning that renal defects are not commonly observed in patients with Gaucher disease. Relevant literature: Becker-Cohen et al., A Comprehensive Assessment of Renal Function in Patients With Gaucher Disease, J. Kidney Diseases, 2005, 46:837-844.

      (5) In the discussion, the authors state: "Together, these findings establish renal degeneration as a driver of systemic decline in Drosophila models of GD and PD..." and go on to discuss a brain-kidney axis in PD. However, since this study investigates a GD model rather than a PD model, I recommend omitting this paragraph, as the connection to PD is speculative and not supported by the presented data.

      (6) The claim: "If confirmed, our findings could inform new biomarker strategies and therapeutic targets for GBA1 mutation carriers and other at-risk groups. Maintaining renal health may represent a modifiable axis of intervention in neurodegenerative disease," extends beyond the scope of the experimental evidence. The authors should consider tempering this statement or providing supporting data.

      (7) The conclusion, "we uncover a critical and previously overlooked role for the renal system in GD and PD pathogenesis," is too strong given the data presented. As no mechanistic link between renal dysfunction and neurodegeneration has been established, this claim should be moderated.

      (8) The relevance of Parkin mutant flies is questionable, and this section could be removed from the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Hull et al. examine Drosophila mutants for the Gaucher's disease locus GBA1/Gba1b, a locus that, when heterozygous, is a risk factor for Parkinson's. Focusing on the Malpighian tubules and their function, they identify a breakdown of cell junctions, loss of haemolymph filtration, sensitivity to ionic imbalance, water retention, and loss of endocytic function in nephrocytes. There is also an imbalance in ROS levels between the cytoplasm and mitochondria, with reduced glutathione levels, rescue of which could not improve longevity. They observe some of the same phenotypes in mutants of Parkin, but treatment by upregulation of autophagy via rapamycin feeding could only rescue the Gba1b mutant and not the Parkin mutant.

      Strengths:

      The paper uses a range of cellular, genetic, and physiological analyses and manipulations to fully describe the renal dysfunction in the Gba1b animals. The picture developed has depth and detail; the data appear sound and thorough.

      Weaknesses:

      The paper relies mostly on the biallelic Gba1b mutant, which may reflect dysfunction in Gaucher's patients, though this has yet to be fully explored. The claims for the heterozygous allele and a role in Parkinson's are a little more tenuous, as they assume that heterozygosity produces a similar but milder phenotype than full loss of function.

    5. Author response:

      Reviewer #1 (Public review):

      Major Comments:

      (1) The abstract frames progressive renal dysfunction as a "central, disease-modifying feature" in both Gba1b and Parkin models, with systemic consequences including water retention, ionic hypersensitivity, and worsened neuro phenotypes. While the data demonstrates renal degeneration and associated physiological stress, the causal contribution of renal defects versus broader organismal frailty is not fully disentangled. Please consider adding causal experiments (e.g., temporally restricted renal rescue/knockdown) to directly establish kidney-specific contributions.

      We concur that this would help strengthen our conclusions. However, manipulating Gba1b in a tissue-specific manner remains challenging due to its propensity for secretion via extracellular vesicles (ECVs). Leo Pallanck and Marie Davis have elegantly shown that ectopic Gba1b expression in neurons and muscles (tissues with low predicted endogenous expression) is sufficient to rescue major organismal phenotypes. Consistent with this, we have been unable to generate clear tissue-specific phenotypes using Gba1b RNAi.

      We will pursue more detailed time-course experiments on the progression of renal pathology (water weight, renal stem cell proliferation, redox defects, etc.), with the goal of identifying earlier-onset phenotypes that potentially drive dysfunction.

      (2) The manuscript shows multiple redox abnormalities in Gba1b mutants (reduced whole fly GSH, paradoxical mitochondrial reduction with cytosolic oxidation, decreased DHE, increased lipid peroxidation, and reduced peroxisome density/Sod1 mislocalization). These findings support a state of redox imbalance, but the driving mechanism remains broad in the current form. It is unclear if the dominant driver is impaired glutathione handling or peroxisomal antioxidant/β-oxidation deficits or lipid peroxidation-driven toxicity, or reduced metabolic flux/ETC activity. I suggest adding targeted readouts to narrow the mechanism.

      We agree that we have not yet established a core driver of redox imbalance. Identifying one is likely to be challenging, especially as our RNA-sequencing data from aged Gba1b⁻/⁻ fly heads (Atilano et al., 2023) indicate that several glutathione S-transferases (GstD2, GstD5, GstD8, and GstD9) are upregulated. We can attempt overexpression of GSTs, which has been elegantly shown by Leo Pallanck to ameliorate pathology in Pink1/Parkin mutant fly brains. However, mechanisms that specifically suppress lipid peroxidation or its associated toxicity, independently of other forms of redox damage, remain poorly understood in Drosophila. Our position is that there is probably no single dominant driver of redox imbalance. Notably, CytB5 overexpression has been shown to reduce lipid peroxidation (Chen et al., 2017), and GstS1 has been reported to conjugate glutathione to the toxic lipid peroxidation product 4-HNE (Singh et al., 2001). Additionally, work from the Bellen lab demonstrated that overexpression of the lipases bmm or lip4 suppresses lipid peroxidation-mediated neurodegeneration (Liu et al., 2015). We will therefore test the effects of overexpressing CytB5, bmm and lip4 in Gba1b⁻/⁻ flies to help further define the mechanism.

      (3) The observation that broad antioxidant manipulations (Nrf2 overexpression in tubules, Sod1/Sod2/CatA overexpression, and ascorbic acid supplementation) consistently shorten lifespan or exacerbate phenotypes in Gba1b mutants is striking and supports the idea of redox fragility. However, these interventions are broad. Nrf2 influences proteostasis and metabolism beyond redox regulation, and Sod1/Sod2/CatA may affect multiple cellular compartments. In the absence of dose-response testing or controls for potential off-target effects, the interpretation that these outcomes specifically reflect redox dyshomeostasis feels ahead of the data. I suggest incorporating narrower interpretations (e.g., targeting lipid peroxidation directly) to clarify which redox axis is driving the vulnerability.

      We are in agreement that Drosophila Cnc exhibits functional conservation with both Nrf1 and Nrf2, which have well-established roles in proteostasis and lysosomal biology that may exacerbate pre-existing lysosomal defects in Gba1b mutants. In our manuscript, Nrf2 manipulation forms part of a broader framework of evidence, including dietary antioxidant ascorbic acid and established antioxidant effectors CatA, Sod1, and Sod2. Together, these data indicate that Gba1b mutant flies display a deleterious response to antioxidant treatments or manipulations. To further characterise the redox state, we will quantify lipid peroxidation using BODIPY 581/591 and assess superoxide levels via DHE staining under our redox-altering experimental conditions.

      As noted above, we will attempt to modulate lipid peroxidation directly through CytB5 and GstS1 overexpression, acknowledging the caveat that this approach may not fully dissociate lipid peroxidation from other aspects of redox stress. We have also observed detrimental effects of PGC1α on the lifespan of Gba1b⁻/⁻ flies and will further investigate its impact on redox status in the renal tubules.

      (4) This manuscript concludes that nephrocyte dysfunction does not exacerbate brain pathology. This inference currently rests on a limited set of readouts: dextran uptake and hemolymph protein as renal markers, lifespan as a systemic measure, and two brain endpoints (LysoTracker staining and FK2 polyubiquitin accumulation). While these data suggest that nephrocyte loss alone does not amplify lysosomal or ubiquitin stress, they may not fully capture neuronal function and vulnerability. To strengthen this conclusion, the authors could consider adding functional or behavioral assays (e.g., locomotor performance).

      We will address this suggestion by performing DAM activity assays and climbing assays in Klf15; Gba1b⁻/⁻ double mutants.

      (5) The manuscript does a strong job of contrasting Parkin and Gba1b mutants, showing impaired mitophagy in Malpighian tubules, complete nephrocyte dysfunction by day 28, FRUMS clearance defects, and partial rescue with tubule-specific Parkin re-expression. These findings clearly separate mitochondrial quality control defects from the lysosomal axis of Gba1b. However, the mechanistic contrast remains incomplete. Many of the redox and peroxisomal assays are only presented for Gba1b. Including matched readouts across both models (e.g., lipid peroxidation, peroxisome density/function, Grx1-roGFP2 compartmental redox status) would make the comparison more balanced and strengthen the conclusion that these represent distinct pathogenic routes.

      We agree that Gba1b⁻/⁻ mutants have been characterised in greater detail than park⁻/⁻ mutants. The primary aim of our study was not to provide an exhaustive characterisation of park¹/¹, but rather to compare key shared and distinct mechanisms underlying renal dysfunction. We have included several relevant readouts for park⁻/⁻ tubules (e.g., Figures 7D and 8H: mito-Grx1-roGFP2; Figure 8J: lipid peroxidation using BODIPY 581/591). To expand our characterisation of park¹/¹ flies, we will express the cytosolic Grx1 reporter and the peroxisomal marker YFP::Pts.

      (6) Rapamycin treatment is shown to rescue several renal phenotypes in Gba1b mutants (water retention, RSC proliferation, FRUMS clearance, lipid peroxidation) but not in Parkin, and mitophagy is not restored in Gba1b. This provides strong evidence that the two models engage distinct pathogenic pathways. However, the therapeutic interpretation feels somewhat overstated. Human relevance should be framed more cautiously, and the conclusions would be stronger with mechanistic markers of autophagy (e.g., Atg8a, Ref(2)p flux in Malpighian tubules) or with experiments varying dose, timing, and duration (short-course vs chronic rapamycin).

      We will measure Atg8a, polyubiquitin, and Ref(2)P levels in Gba1b⁻/⁻ and park¹/¹ tubules following rapamycin treatment. In our previous study focusing on the gut (Atilano et al., 2023), we showed that rapamycin treatment increased lysosomal area, as assessed using LysoTracker™. We will extend this analysis to the renal tubules following rapamycin exposure. Another reviewer requested that we adopt more cautious language regarding the clinical translatability of this work, and we will amend this in Version 2.

      (7) Several systemic readouts used to support renal dysfunction (FRUMS clearance, salt stress survival) could also be influenced by general organismal frailty. To ensure these phenotypes are kidney-intrinsic, it would be helpful to include controls such as tissue-specific genetic rescue in Malpighian tubules or nephrocytes, or timing rescue interventions before overt systemic decline. This would strengthen the causal link between renal impairment and the observed systemic phenotypes.

      As noted in our response to point 1, we currently lack reliable approaches to manipulate Gba1b in a tissue-specific manner. However, we agree that it is important to distinguish kidney-intrinsic dysfunction from generalised organismal frailty. In the park model, we have already performed renal cell-autonomous rescue: re-expression of Park specifically in Malpighian tubule principal cells (C42-Gal4) throughout adulthood partially normalises water retention, whereas brain-restricted Park expression has no effect on renal phenotypes. Because rescuing Park only in the renal tubules is sufficient to correct a systemic fluid-handling phenotype in otherwise mutant animals, these findings indicate that the systemic defects are driven, at least in part, by renal dysfunction rather than nonspecific organismal frailty.

      To strengthen this causal link, we will now extend this same tubule-specific Park rescue (C42-Gal4 and the high-fidelity Malpighian tubule driver CG31272-Gal4) to additional systemic readouts raised by the reviewer. Specifically, we will assay FRUMS clearance and salt stress survival in rescued versus non-rescued park mutants to determine whether renal rescue also mitigates these systemic phenotypes.

      Reviewer #2 (Public review):

      (1) The authors claim that: "renal system dysfunction negatively impacts both organismal and neuronal health in Gba1b-/- flies, including autophagic-lysosomal status in the brain." This statement implies that renal impairments drive neurodegeneration. However, there is no direct evidence provided linking renal defects to neurodegeneration in this model. It is worth noting that Gba1b-/- flies are a model for neuronopathic Gaucher disease (GD): they accumulate lipids in their brains and present with neurodegeneration and decreased survival, as shown by Kinghorn et al. (The Journal of Neuroscience, 2016, 36, 11654-11670) and by others, which the authors failed to mention (Davis et al., PLoS Genet. 2016, 12: e1005944; Cabasso et al., J Clin Med. 2019, 8:1420; Kawasaki et al., Gene, 2017, 614:49-55).

      With the caveats noted in the responses below, we show that driving Nrf2 expression using the renal tubular driver C42 results in decreased survival, more extensive renal defects, and increased brain pathology in Gba1b⁻/⁻ flies, but not in healthy controls. This suggests that a healthy brain can tolerate renal dysfunction without severe pathological consequences. Our findings therefore indicate that in Gba1b⁻/⁻ flies, there may be an interaction between renal defects and brain pathology. We do not explicitly claim that renal impairments drive neurodegeneration; rather, we propose that manipulations exacerbating renal dysfunction can have organism-wide effects, ultimately impacting the brain.

      The reviewer is correct that our Gba1b⁻/⁻ fly model represents a neuronopathic GD model with age-related pathology. Indeed, we reproduce the autophagic-lysosomal defects previously reported (Kinghorn et al., 2016) in Figure 5. We agree that the papers cited by the reviewer merit inclusion, and in Version 2 we will incorporate them into the following pre-existing sentence in the Results:

      “The gut and brain of Gba1b⁻/⁻ flies, similar to macrophages in GD patients, are characterised by enlarged lysosomes (Kinghorn et al., 2016; Atilano et al., 2023).”

      (2) The authors tested brain pathology in two experiments:

      (a) To determine the consequences of abnormal nephrocyte function on brain health, they measured lysosomal area in the brains of Gba1b-/- and Klf15LOF flies and stained for polyubiquitin. Klf15 is expressed in nephrocytes and is required for their differentiation. There was no additive effect on the increased lysosomal volume (Figure 3D) or polyubiquitin accumulation (Figure 3E) seen in Gba1b-/- fly brains, implying that loss of nephrocyte viability itself does not exacerbate brain pathology.

      (b) The authors tested the consequences of overexpression of the antioxidant regulator Nrf2 in principal cells of the kidney on neuronal health in Gba1b-/- flies, using the c42-GAL4 driver. They claim that "This intervention led to a significant increase in lysosomal puncta number, as assessed by LysoTracker™ staining (Figure 5D), and exacerbated protein dyshomeostasis, as indicated by polyubiquitin accumulation and increased levels of the ubiquitin-autophagosome trafficker Ref(2)p/p62 in Gba1b-/- fly brains (Figure 5E). Interestingly, Nrf2 overexpression had no significant effect on lysosomal area or ubiquitin puncta in control brains, demonstrating that the antioxidant response specifically in Gba1b-/- flies negatively impacts disease states in the brain and renal system." Notably, c42-GAL4 is a leaky driver, expressed in salivary glands, Malpighian tubules, and pericardial cells (Beyenbach et al., Am. J. Cell Physiol. 318: C1107-C1122, 2020). Expression in pericardial cells may affect heart function, which could explain deterioration in brain function.

      Taken together, the contribution of renal dysfunction to brain health remains debatable.

      Based on the above, I believe the title should be changed to: Redox Dyshomeostasis Links Renal and Neuronal Dysfunction in Drosophila Models of Gaucher disease. Such a title will reflect the results presented in the manuscript.

      We agree that C42-Gal4 is a leaky driver; unfortunately, this was true for all commonly used Malpighian tubule drivers available when we began the study. A colleague has recommended CG31272-Gal4 from the Perrimon lab’s recent publication (Xu et al., 2024) as a high-fidelity Malpighian tubule driver. If it proves to maintain principal-cell specificity throughout ageing in our hands, we will repeat key experiments using this driver.

      (3) The authors mention that Gba1b is not expressed in the renal system, which means that no renal phenotype can be attributed directly to any known GD pathology. They suggest that systemic factors such as circulating glycosphingolipids or loss of extracellular vesicle-mediated delivery of GCase may mediate renal toxicity. This raises a question about the validity of this model to test pathology in the fly kidney. According to Flybase, there is expression of Gba1b in renal structures of the fly.

      Our evidence suggesting that Gba1b is not substantially expressed in renal tissue is based on the Gba1b-CRIMIC-Gal4 line, which fails to drive expression of fluorescently tagged proteins in the Malpighian tubules; we have previously shown that this driver line also produces no expression in the nephrocytes (Atilano et al., 2023). This does not exclude the possibility that Gba1b functions within the tubules. Notably, Leo Pallanck has provided compelling evidence that Gba1b is present in extracellular vesicles (ECVs), and given the role of the Malpighian tubules in haemolymph filtration, these cells are likely exposed to circulating ECVs. The lysosomal defects observed in Gba1b⁻/⁻ tubules therefore suggest a potential role for Gba1b in this tissue.

      John Vaughan and Thomas Clandinin have developed mCherry- and Lamp1.V5-tagged Gba1b constructs. We intend to express these in tissues shown by the Pallanck lab to release ECVs (e.g., neurons and muscle) and examine whether the protein can be detected in the tubules.

      (4) It is worth mentioning that renal defects are not commonly observed in patients with Gaucher disease. Relevant literature: Becker-Cohen et al., A Comprehensive Assessment of Renal Function in Patients With Gaucher Disease, J. Kidney Diseases, 2005, 46:837-844.

      We have identified five references indicating that renal involvement, while rare, does occur in association with GD. We agree that this is a valid citation and will include it in the revised introductory sentence:

      “However, renal dysfunction remains a rare symptom in GD patients (Smith et al., 1978; Chander et al., 1979; Siegel et al., 1981; Halevi et al., 1993; Becker-Cohen et al., 2005).”

      (5) In the discussion, the authors state: "Together, these findings establish renal degeneration as a driver of systemic decline in Drosophila models of GD and PD..." and go on to discuss a brain-kidney axis in PD. However, since this study investigates a GD model rather than a PD model, I recommend omitting this paragraph, as the connection to PD is speculative and not supported by the presented data.

      Our position is that Gba1b⁻/⁻ represents a neuronopathic Gaucher disease model with mechanistic relevance to PD. The severity of GBA1 mutations correlates with the extent of GBA1/GCase loss of function and, consequently, with increased PD risk. Likewise, biallelic Parkin mutations cause a severe, heritable form of PD, and the Drosophila park⁻/⁻ model is a well-established and widely recognised system that has been instrumental in elucidating how Parkin and Pink1 mutations drive PD pathogenesis.

      We therefore see no reason to omit this paragraph. While some aspects are inherently speculative, such discussion is appropriate and valuable when addressing mechanisms underlying a complex and incompletely understood disease, provided interpretations remain measured. At no point do we claim that our work demonstrates a direct brain-renal axis. Rather, our data indicate that renal dysfunction is a disease-modifying feature in these models, aligning with emerging epidemiological evidence linking PD and renal impairment.

      (6) The claim: "If confirmed, our findings could inform new biomarker strategies and therapeutic targets for GBA1 mutation carriers and other at-risk groups. Maintaining renal health may represent a modifiable axis of intervention in neurodegenerative disease," extends beyond the scope of the experimental evidence. The authors should consider tempering this statement or providing supporting data.

      (7) The conclusion, "we uncover a critical and previously overlooked role for the renal system in GD and PD pathogenesis," is too strong given the data presented. As no mechanistic link between renal dysfunction and neurodegeneration has been established, this claim should be moderated.

      We agree that these sections may currently overstate our findings. In Version 2, we will revise them to ensure our claims remain balanced, while retaining the key points that arise from our data and clearly indicating where conclusions require confirmation (“if confirmed”) or additional study (“warrants further investigation”).

      “If confirmed, our findings could inform new biomarker strategies and therapeutic targets for patients with GD and PD. Maintaining renal health may represent a modifiable axis of intervention in these diseases.”

      “We uncover a notable and previously underappreciated role for the renal system in GD and PD, which now warrants further investigation.”

      (8) The relevance of Parkin mutant flies is questionable, and this section could be removed from the manuscript.

      We intend to include the data for the Parkin loss-of-function mutants, as these provide essential support for the PD-related findings discussed in our manuscript. To our knowledge, this represents the first demonstration that Parkin mutants display defects in Malpighian tubule function and water homeostasis. We therefore see no reason to remove these findings. Furthermore, as Reviewer 1 specifically requested additional experiments using the Park fly model, we plan to incorporate these analyses in the revised manuscript.

      Minor comments:

      (1) Figure 1G: The FRUMS assay is not shown for Gba1b-/- flies.

      The images in Figure 1G illustrate representative stages of dye clearance. We have quantified the clearance time course for both genotypes. During this process, the tubules of Gba1b<sup>⁻/⁻</sup> flies, similar to controls, sequentially resemble each of the three example images. As the Gba1b<sup>⁻/⁻</sup> tubules appear morphologically identical to controls, differing only in population-level clearance dynamics, we do not feel that including additional example images would provide further informative value.

      (2) In panels D and F of Figure 2, survival of control and Gba1b-/- flies in the presence of 4% NaCl is presented. However, longevity is different (up to 10 days in D and ~3 days in F for control). The authors should explain this.

      We agree. In our experience, feeding-based stress survival assays show considerable variability between experiments, and we therefore interpret results only within individual experimental replicates. We have observed similar variability in oxidative stress, starvation, and xenobiotic survival assays, which may reflect batch-specific or environmental effects.

      (3) In Figure 7F, the representative image does not correspond to the quantification; the percentage of endosome-negative nephrocytes seems to be higher for the control than for the park1/1 flies. Please check this.

      The example images are correctly oriented. Typically, an endosome-negative nephrocyte shows no dextran uptake, whereas an endosome-positive nephrocyte displays a ring of puncta around the cell periphery. In park¹/¹ mutants, dysfunctional nephrocytes exhibit diffuse dextran staining throughout the cell, accompanied by diffuse DAPI signal, indicating a complete loss of membrane integrity and likely cell death. We have 63× images from the preparations shown in Figure 7F demonstrating this. In Version 2, we will include apical and medial z-slices of the nephrocytes to illustrate these findings (to be added as supplementary data).

      (4) In Figure 7H, the significance between control and park1/1 flies in the FRUMS assay is missing.

      We observe significant dye clearance from the haemolymph; however, the difference in complete clearance from the tubules does not reach statistical significance. This may speculatively reflect alterations in specific aspects of tubule function, where absorption and transcellular flux are affected, but subsequent clearance from the tubule lumen remains intact. We do not feel that our current data provide sufficient resolution to draw detailed conclusions about tubule physiology at this level.

      Reviewer #3 (Public review):

      Weaknesses:

      The paper relies mostly on the biallelic Gba1b mutant, which may reflect dysfunction in Gaucher's patients, though this has yet to be fully explored. The claims for the heterozygous allele and a role in Parkinson's is a little more tenuous, making assumptions that heterozygosity is a similar but milder phenotype than the full loss-of-function.

      We agree with the reviewer that studying heterozygotes may provide valuable insight into GBA1-associated PD. We will therefore assess whether subtle renal defects are detectable in Gba1b<sup>+/⁻</sup> heterozygotes. We clearly state that GBA1 mutations act as a risk factor for PD rather than as a Mendelian cause. Consistent with findings from Gba heterozygous mice, Gba1b<sup>+/⁻</sup> flies display minimal phenotypes (Kinghorn et al. 2016), and any observable effects are expected to be very mild and age dependent.

      (1) Figure 1c, the loss of stellate cells. What age are the MTs shown? Is this progressive or developmental?

      These experiments were conducted on flies that were three weeks of age, as were all manipulations unless otherwise stated. We will ensure that this information is clearly indicated in the figure legends in Version 2. We did not observe changes in stellate cell number at three days of age, and this result will be included in the supplementary material in Version 2. Our data therefore suggest that this is a progressive phenotype.

      (2) I might have missed this, but for Figure 3, do the mutant flies start with a similar average weight, or are they bloated?

      We will perform an age-related time course of water weight in response to Reviewer 1’s comments. For all experiments, fly eggs are age-matched and seeded below saturation density to ensure standardised conditions. Gba1b mutant flies do not exhibit any defects in body size or timing of eclosion.

      (3) On 2F, add to the graph that 4% NaCl (or if it is KCl) is present for all conditions, just to make the image self-sufficient to read.

      Many thanks for the suggestion. We agree that this will increase clarity and will make this amendment in Version 2 of the manuscript.

      (4) P13 - rephrase, 'target to either the mitochondria or the cytosol' (as it is phrased, it sounds as though you are doing both at the same time).

      We agree and we plan to revise the sentence as follows:

      Original:

      “To further evaluate the glutathione redox potential (E<sub>GSH</sub>) in MTs, we utilised the redox-sensitive green, fluorescent biosensor Grx1-roGFP2, targeted to both the mitochondria and cytosol (Albrecht et al., 2011).”

      Revised:

      “To further evaluate the glutathione redox potential (E<sub>GSH</sub>) in MTs, we utilised the redox-sensitive fluorescent biosensor Grx1-roGFP2, targeted specifically to either the mitochondria or the cytosol using mito- or cyto-tags, respectively (Albrecht et al., 2011).”

      (5) In 6F - the staining appears more intense in the Park mutant - perhaps add asterisks or arrowheads to indicate the nephrocytes so that the reader can compare the correct parts of the image?

      Reviewer 2 reached the same interpretation. Typically, an endosome-negative nephrocyte shows no dextran uptake, whereas an endosome-positive nephrocyte displays a ring of puncta around the cell periphery. In park¹/¹ mutants, dysfunctional nephrocytes exhibit diffuse dextran staining throughout the cell, accompanied by diffuse DAPI signal, indicative of a complete loss of membrane integrity and likely cell death. We have 63× images from the preparations shown in Figure 7F demonstrating this, and in Version 2 we will include apical and medial z-slices of the nephrocytes to illustrate these findings (to be added as supplementary data).

      (6) In the main results text - need some description/explanation of the SOD1 v SOD2 distribution (as it is currently understood) in the cell - SOD2 being predominantly mitochondrial. This helps arguments later on.

      Thank you for this suggestion. We plan to amend the text as follows:

      “Given that Nrf2 overexpression shortens lifespan in Gba1b<sup>⁻/⁻</sup> flies, we investigated the effects of overexpressing its downstream antioxidant targets, Sod1, Sod2, and CatA, both ubiquitously using the tub-Gal4 driver and with c42-Gal4, which expresses in PCs.”

      to:

      “Given that Nrf2 overexpression shortens lifespan in Gba1b<sup>⁻/⁻</sup> flies, we investigated the effects of overexpressing its downstream antioxidant targets, Sod1, Sod2, and CatA, both ubiquitously using the tub-Gal4 driver and with c42-Gal4, which expresses in PCs. Sod1 and CatA function primarily in the cytosol and peroxisomes, whereas Sod2 is localised to the mitochondria. Sod1 and Sod2 catalyse the dismutation of superoxide radicals to hydrogen peroxide, while CatA subsequently degrades hydrogen peroxide to water and oxygen.”

      (7) Figure 1G, what age are the flies? Same for 3D and E, 4C,D,E, 5B - please check the ages of flies for all of the imaging figures; this information appears to have been missed out.

      As stated above, all experiments were conducted on three-week-old flies unless otherwise specified. In Version 2 of the manuscript, we will ensure this information is included consistently in the figure legends to prevent any potential confusion.

    1. eLife Assessment

      This work uses enhanced sampling molecular dynamics methods to generate potentially useful information about a conformational change (the DFG flip) that plays a key role in regulating kinase function and inhibitor binding. The focus of the work is on the mechanism of conformational change and how mutations affect the transition. The evidence supporting the conclusions is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used weighted ensemble enhanced sampling molecular dynamics (MD) to test the hypothesis that a double mutant of Abl favors the DFG-in state relative to the WT and therefore causes the drug resistance to imatinib.

      Strengths:

      The authors employed three novel progress coordinates to sample the DFG flip of Abl. The hypothesis regarding the double mutant's drug resistance is novel.

      Weaknesses:

      The study contains many uncertain aspects. As such, major conclusions do not appear to be supported.

      Comments on revisions:

      The authors have addressed some of my concerns, but these concerns remain to be addressed:

      (1) Definition of the DFG conformation (in vs out). The authors specified their definition in the revised manuscript, but it has not been validated for a large number of kinases to distinguish between the two states. Thus, I recommend that the authors calculate the FES using another definition (see Tsai et al, JACS 2019, 141, 15092−15101) to confirm their findings. This FES can be included in the SI.

      (2) There is no comparison to previous computational work. I would like to see a comparison between the authors' finding of the DFG-in to DFG-out transition and that described in Tsai et al, JACS 2019, 141, 15092−15101.

      (3) My previous comment: "The study is not very rigorous. The major conclusions do not appear to be supported. The claim that it is the first unbiased simulation to observe DFG flip is not true. For example, Hanson, Chodera et al (Cell Chem Biol 2019), Paul, Roux et al (JCTC 2020), and Tsai, Shen et al (JACS 2019) have also observed the DFG flip." has not been adequately addressed.

      The newly added paragraph clearly does not address my original comment.

      "Through our work, we have simulated an ensemble of DFG flip pathways in a wild-type kinase and its variants with atomistic resolution and without the use of biasing forces, also reporting the effects of inhibitor-resistant mutations in the broader context of kinase inactivation likelihood with such level of detail. "

      (4) My previous comment, "Setting the DFG-Asp to the protonated state is not justified, because in the DFG-in state, the DFG-Asp is clearly deprotonated." has not been addressed.

      The authors' response stated:

      According to previous publications, DFG-Asp is frequently protonated in the DFG-in state of Abl1 kinase. For instance, as quoted from Hanson, Chodera, et al., Cell Chem Bio (2019), "Consistent with previous simulations on the DFG-Asp-out/in interconversion of Abl kinase we only observe the DFG flip with protonated Asp747 (Shan et al., 2009). We showed previously that the pKa for the DFG-Asp in Abl is elevated at 6.5."

      Since the pKa of DFG-Asp is 6.5, it should be deprotonated at the physiological pH 7.5. Thus, the fact that the authors used protonated DFG-Asp contradicts this. I am not requesting the authors to redo the entire simulations, but they need to acknowledge this discrepancy and add a brief discussion. See a constant pH study that demonstrates the protonation state population shift for DFG-Asp as the DFG transitions from in to out state (see Tsai et al, JACS 2019, 141, 15092−15101).
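      The pKa argument above follows from the Henderson-Hasselbalch relation. As a minimal sketch, assuming a single non-interacting titratable site and taking the quoted pKa of 6.5 at face value (the function name is illustrative, not taken from the manuscript), the protonated fraction at a few pH values is:

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of an acidic site that is protonated (Henderson-Hasselbalch).

    Assumes one isolated titratable site:
    fraction_protonated = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa of the Abl DFG-Asp as quoted above, evaluated at a few pH values.
for ph in (6.5, 7.0, 7.5):
    print(f"pH {ph}: {protonated_fraction(6.5, ph):.3f} protonated")
```

      Under these assumptions roughly 9% of the DFG-Asp population would be protonated at pH 7.5, which illustrates the discrepancy the reviewer raises. In a protein the protonation equilibrium is coupled to conformation, as constant pH simulations show, so this is only a first-order estimate.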

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript on the mechanism of the DFG flip in kinases. This conformational change is important for the toggling of kinases between active (DFG-in) and inactive (DFG-out) states. The relative probabilities of these two states are also an important determinant of the affinity of inhibitors for a kinase. However, it is an extremely slow/rare conformational change, making it difficult to capture in simulations. The authors show that weighted ensemble simulations can capture the DFG flip and then delve into the mechanism of this conformational change and the effects of mutations.

      Strengths:

      The DFG flip is very hard to capture in simulations. Showing that this can be done with relatively little simulation by using enhanced sampling is a valuable contribution. The manuscript gives a nice description of the background for non-experts.

      Weaknesses:

      The anecdotal approach to presenting the results is disappointing. Molecular processes are stochastic and the authors have expertise in describing such processes. However, they chose to put most statistical analysis in the SI. The main text instead describes the order of events in single "representative" trajectories. The main text makes it sound like these were most selected as they were continuous trajectories from the weighted ensemble simulations. It is preferable to have a description of the highest probability pathway(s) with some quantification of how probable they are. That would give the reader a clear sense of how representative the events described are.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Specifically, the authors need to define the DFG conformation using criteria accepted in the field, for example, see https://klifs.net/index.php.

      We thank the reviewer for this suggestion. In the manuscript, we use pseudodihedral and bond-angle-based DFG definitions that have been previously established by literature cited in the study (reiterated below) to unambiguously define the side-chain conformational states of the DFG motif. As we are interested in the specific mechanics of DFG flips under different conditions, we’ve found that the descriptors defined below are sufficient to distinguish between DFG states and allow a more direct comparison with previously reported results in the literature using different methods.

      We amended the text to be more clear as to those definitions and their choice:

      DFG angle definitions:

      Phe382/Cg, Asp381/OD2, Lys378/O

      Source: Structural Characterization of the Aurora Kinase B "DFG-flip" Using Metadynamics. Lakkaniga NR, Balasubramaniam M, Zhang S, Frett B, Li HY. AAPS J. 2019 Dec 18;22(1):14. doi: 10.1208/s12248-019-0399-6. PMID: 31853739; PMCID: PMC7905835.

      “Finally, we chose the angle formed by Phe382's gamma carbon, Asp381's protonated side chain oxygen (OD2), and Lys378's backbone oxygen as PC3 based on observations from a study that used a similar PC to sample the DFG flip in Aurora Kinase B using metadynamics (Lakkaniga et al., 2019). This angular PC3 should increase or decrease (based on the pathway) during the DFG flip, with peak differences at intermediate DFG configurations, and then revert to its initial state when the flip concludes.”
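      For readers unfamiliar with angle-based progress coordinates, PC3 is an ordinary three-atom angle. A minimal sketch in plain Python, assuming the middle atom (Asp381 OD2) is the vertex and using hypothetical coordinates rather than simulation data:

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def bond_angle(a: Vec, vertex: Vec, c: Vec) -> float:
    """Angle a-vertex-c in degrees."""
    u = _sub(a, vertex)
    v = _sub(c, vertex)
    cos_theta = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clip to [-1, 1] so floating-point round-off cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Hypothetical coordinates (Angstrom), for illustration only.
phe382_cg = (1.0, 0.0, 0.0)
asp381_od2 = (0.0, 0.0, 0.0)  # assumed vertex of the angle
lys378_o = (0.0, 1.0, 0.0)
print(f"PC3 = {bond_angle(phe382_cg, asp381_od2, lys378_o):.1f} deg")
```

      In practice the three positions would be read from each trajectory frame; the clipping step matters because round-off can yield a cosine marginally outside [-1, 1].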

      DFG pseudodihedral definitions:

      Ala380/Cb, Ala380/Ca, Asp381/Ca, Asp381/Cg

      Ala380/Cb, Ala380/Ca, Phe382/Ca, Phe382/Cg

      Source: Computational Study of the “DFG-Flip” Conformational Transition in c-Abl and c-Src Tyrosine Kinases. Yilin Meng, Yen-lin Lin, and Benoît Roux The Journal of Physical Chemistry B 2015 119 (4), 1443-1456 DOI: 10.1021/jp511792a

      “For downstream analysis, we used two pseudodihedrals previously defined in the existing Abl1 DFG flip simulation literature (Meng et al., 2015) to identify and discriminate between DFG states. The first (dihedral 1) tracks the flip state of Asp381, and is formed by the beta carbon of Ala380, the alpha carbon of Ala380, the alpha carbon of Asp381, and the gamma carbon of Asp381. The second (dihedral 2) tracks the flip state of Phe382, and is formed by the beta carbon of Ala380, the alpha carbon of Ala380, the alpha carbon of Phe382, and the gamma carbon of Phe382. These pseudodihedrals, when plotted in relation to each other, clearly distinguish between the initial DFG-in state, the target DFG-out state, and potential intermediate states in which either Asp381 or Phe382 has flipped.”
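      The two pseudodihedrals are standard four-point torsions. The sketch below assumes the usual signed convention (angles between -180° and 180°) and uses hypothetical coordinates; it is a generic illustration, not the authors' analysis code:

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central p1->p2 axis.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Project the outer bond vectors onto the plane perpendicular to that axis.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    # atan2 of the sine-like and cosine-like components gives the signed angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (Angstrom) for dihedral 1:
# Ala380 CB, Ala380 CA, Asp381 CA, Asp381 CG.
ala380_cb = (1.0, 0.0, 0.0)
ala380_ca = (0.0, 0.0, 0.0)
asp381_ca = (0.0, 0.0, 1.5)
asp381_cg = (0.0, 1.0, 1.5)
print(f"dihedral 1 = {dihedral(ala380_cb, ala380_ca, asp381_ca, asp381_cg):.1f} deg")
```

      Trajectory analysis libraries offer vectorized equivalents (for example, `mdtraj.compute_dihedrals`, which returns radians); plotting dihedral 1 against dihedral 2 frame by frame gives the two-dimensional map described above.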

      Convergence needs to be demonstrated for estimating the population difference between different conformational states.

      We agree that demonstrating convergence is important for accurate estimations of population differences between conformational states. However, as the DFG flip is a complex and concerted conformational change with an energy barrier of 30 kcal/mol [1], and considering the traditional limitations of methods like weighted ensemble molecular dynamics (WEMD), it would take an unrealistic amount of GPU time (months) to observe convergence in our simulations. As discussed in the text (see examples below), we caveat our energy estimations by explicitly mentioning that the state populations we report are not converged and are indicative of a much larger energy barrier in the mutant.

      “These relative probabilities qualitatively agree with the large expected free energy barrier for the DFG-in to DFG-out transition (~32 kcal/mol), and with our observation of a putative metastable DFG-inter state that is missed by NMR experiments due to its low occupancy.”

      “As an important caveat, it is unlikely that the DFG flip free energy barriers of over 70 kcal/mol estimated for the Abl1 drug-resistant variants quantitatively match the expected free energy barrier for their inactivation. Rather, our approximate free energy barriers are a symptom of the markedly increased simulation time required to sample the DFG flip in the variants relative to the wild-type, which is a strong indicator of the drastically reduced propensity of the variants to complete the DFG flip. Although longer WE simulations could allow us to access the timescales necessary for more accurately sampling the free energy barriers associated with the DFG flip in Abl1's drug-resistant compound mutants, the computational expense of running WE for 200 iterations is already large (three weeks with 8 NVIDIA RTX 3090 GPUs for one replicate); this poses a logistical barrier to attempting to sample sufficient events to be able to fully characterize how the reaction path and free energy barrier change for the flip associated with the mutations. Regardless, the results of our WE simulations resoundingly show that the Glu255Lys/Val and Thr315Ile compound mutations drastically reduce the probability for DFG flip events in Abl1.”

      (1) Conformational states dynamically populated by a kinase determine its function. Tao Xie et al., Science 370, eabc2754 (2020). DOI:10.1126/science.abc2754

      The DFG flip needs to be sampled several times to establish free energy difference.

      Our simulations have captured thousands of correlated and dozens of uncorrelated DFG flip events. The per-replicate free energy differences are computed based on the correlated transitions. Please consult the WEMD literature (referenced below and in the manuscript, references 34 and 36) for more information on how WEMD allows the sampling of multiple such events and subsequent estimation of probabilities:

      Zuckerman et al. (2017) 10.1146/annurev-biophys-070816-033834

      Chong et al. (2021) 10.1021/acs.jctc.1c01154

      The free energy plots do not appear to show an intermediate state as claimed.

      Both the free energy plots and the representative/anecdotal trajectories analyzed in the study show a saddle point when Asp381 has flipped but Phe382 has not (which defines the DFG-inter state); we observe a distinct change in probability when going from the pseudodihedral values associated with DFG-inter to those of DFG-up or DFG-out. We removed references to the putative state S1, as we agree with the reviewer that its presence is unlikely given the data we show.

      The trajectory length of 7 ns in both Figure 2 and Figure 4 needs to be verified, as it is extremely short for a DFG flip that has a high free energy barrier.

      We appreciate this point. To clarify, the 7 ns segment corresponds to a collated trajectory extracted from the tens of thousands of walkers that compose the WEMD ensemble, and represents just the specific moment at which the dihedral flips occur rather than the entire flip process. On average, our WEMD simulations sample over 3 μs of aggregate simulation time before the first DFG flip event is observed, in line with a high energy barrier. This is made clear in the manuscript excerpt below: “Over an aggregate simulation time of over 20 μs, we have collected dozens of uncorrelated and unbiased inactivation events, starting from the lowest energy conformation of the Abl1 kinase core (PDB 6XR6) (Xie et al., 2020).”

      The free energy scale (100 kT) appears to be one order of magnitude too large.

      As discussed in the text and quoted in response to comment 2, the exponential splitting nature of WEMD simulations (where the probability of an individual walker is split upon crossing each bin threshold) often leads to unrealistically high energy barriers for rare events. This is not unexpected, and, as discussed in the text, we consider that value to be a qualitative measure of the decreased probability of a DFG flip in Abl1 mutants, not a direct measurement of energy barriers.
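      The link between walker probability and the reported free energy scale is F = -kT ln P. A minimal sketch, assuming T = 300 K (the simulation temperature is not stated in this excerpt), shows what a 100 kT scale corresponds to in kcal/mol:

```python
import math

R_KCAL = 0.0019872043  # Boltzmann constant per mole (gas constant), kcal/(mol*K)


def free_energy_kt(probability: float) -> float:
    """Free energy of a state, in units of kT, from its probability: F = -ln P."""
    return -math.log(probability)


def kt_to_kcal_per_mol(energy_kt: float, temperature_k: float = 300.0) -> float:
    """Convert an energy from units of kT to kcal/mol at the given temperature."""
    return energy_kt * R_KCAL * temperature_k


p = math.exp(-100.0)  # a walker probability of e^-100
print(f"{free_energy_kt(p):.1f} kT")                # free energy of that probability
print(f"{kt_to_kcal_per_mol(100.0):.1f} kcal/mol")  # 100 kT expressed at 300 K
print(f"{30.0 / kt_to_kcal_per_mol(1.0):.1f} kT")   # a 30 kcal/mol barrier in kT
```

      A probability of e^-100 lies far below anything a finite set of walkers can sample reliably, which is consistent with treating such values qualitatively rather than as converged barriers.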

      Setting the DFG-Asp to the protonated state is not justified, because in the DFG-in state, the DFG-Asp is clearly deprotonated.

      According to previous publications, DFG-Asp is frequently protonated in the DFG-in state of Abl1 kinase. For instance, as quoted from Hanson, Chodera, et al., Cell Chem Bio (2019), “Consistent with previous simulations on the DFG-Asp-out/in interconversion of Abl kinase we only observe the DFG flip with protonated Asp747 (Shan et al., 2009). We showed previously that the pKa for the DFG-Asp in Abl is elevated at 6.5.”

      Finally, the authors should discuss their work in the context of the enormous progress made in theoretical studies and mechanistic understanding of the conformational landscape of protein kinases in the last two decades, particularly with regard to the DFG flip.

      The study is not very rigorous. The major conclusions do not appear to be supported. The claim that it is the first unbiased simulation to observe DFG flip is not true. For example, Hanson, Chodera et al (Cell Chem Biol 2019), Paul, Roux et al (JCTC 2020), and Tsai, Shen et al (JACS 2019) have also observed the DFG flip.

      We thank the reviewer for pointing out these issues. We have revised the manuscript to better contextualize our claims within the limitations of the method and to acknowledge previous work by Hanson, Chodera et al., Paul, Roux et al., and Tsai, Shen et al.

      The updated excerpt is described below:

      “Through our work, we have simulated an ensemble of DFG flip pathways in a wild-type kinase and its variants with atomistic resolution and without the use of biasing forces, also reporting the effects of inhibitor-resistant mutations in the broader context of kinase inactivation likelihood with such level of detail.”

      Reviewer #2:

      I appreciated the discussion of the strengths/weaknesses of weighted ensemble simulations. Am I correct that this method doesn't do anything to explicitly enhance sampling along orthogonal degrees of freedom? Maybe a point worth mentioning if so.

      Yes, this is correct. We added a sentence addressing this to the WEMD summary section of the Results and Discussion:

      “As a supervised enhanced sampling method, WE employs progress coordinates (PCs) to track the time-dependent evolution of a system from one or more basis states towards a target state. Although weighted ensemble simulations are unbiased in the sense that no biasing forces are added over the course of the simulations, the selection of progress coordinates and the bin definitions can potentially bias the results towards specific pathways (Zuckerman et al., 2017). Additionally, traditional WEMD simulations do not explicitly enhance sampling along orthogonal degrees of freedom (those not captured by the progress coordinates). In practice, this means that insufficient PC definitions can lead to poor sampling.”
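      To make the split/merge idea concrete, the sketch below implements one simplified resampling step in plain Python. It assumes a one-dimensional progress coordinate, fixed bin edges, and a fixed walker count per bin; it is a generic illustration of weighted ensemble resampling, not WESTPA's implementation or the authors' settings. Because walkers are only duplicated or combined, and never pushed by a force, total probability is conserved and the dynamics between resampling steps remain unbiased.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Walker:
    pc: float      # progress-coordinate value (hypothetical 1-D coordinate)
    weight: float  # statistical weight; all weights together sum to 1


def resample(walkers: List[Walker], bin_edges: List[float], target_per_bin: int,
             rng: Optional[random.Random] = None) -> List[Walker]:
    """One simplified weighted-ensemble split/merge step (target_per_bin >= 1).

    Walkers are grouped into bins by progress coordinate. Within each occupied
    bin, the heaviest walker is split in two (each copy carries half its weight)
    until the bin holds target_per_bin walkers, and the two lightest walkers are
    merged while it holds more. Weight is only redistributed, never created or
    destroyed, and no force is applied to any walker.
    """
    rng = rng if rng is not None else random.Random()
    bins = defaultdict(list)
    for walker in walkers:
        # Bin index = number of (sorted) edges at or below the walker's coordinate.
        bins[sum(walker.pc >= edge for edge in bin_edges)].append(walker)

    resampled: List[Walker] = []
    for members in bins.values():
        while len(members) < target_per_bin:
            members.sort(key=lambda w: w.weight)
            heavy = members.pop()  # heaviest walker
            half = heavy.weight / 2
            members += [Walker(heavy.pc, half), Walker(heavy.pc, half)]
        while len(members) > target_per_bin:
            members.sort(key=lambda w: w.weight)
            a, b = members.pop(0), members.pop(0)  # two lightest walkers
            total = a.weight + b.weight
            # The survivor's coordinate is chosen in proportion to weight,
            # so the merge is unbiased on average.
            survivor = a if rng.random() < a.weight / total else b
            members.append(Walker(survivor.pc, total))
        resampled.extend(members)
    return resampled


rng = random.Random(0)
walkers = [Walker(pc=rng.uniform(0.0, 3.0), weight=0.1) for _ in range(10)]
new_walkers = resample(walkers, bin_edges=[1.0, 2.0], target_per_bin=4, rng=rng)
print(len(new_walkers), round(sum(w.weight for w in new_walkers), 12))
```

      A full weighted ensemble iteration alternates this step with short unbiased MD propagation of every walker. The sketch also shows why orthogonal degrees of freedom are not enhanced: walkers are binned only by `pc`, so two walkers that differ in some other coordinate are treated as interchangeable.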

      I don't understand Figure 3C. Could the authors instead show structures corresponding to each of the states in 3B, and maybe also a representative structure for pathways 1 and 2?

      We have remade Figure 3. We removed 3B and the accompanying discussion because, upon review, we were not confident in the significance of the LPATH results as they pertain to the probability of intermediate states. We replaced 3B with a summary of pathways 1 and 2 with regard to the Phe382 flip (which is the most contrasting difference).

      Why introduce S1 and DFG-inter? And why suppose that DFG-inter is what corresponds to the excited state seen by NMR?

      As a consequence of dropping the LPATH analysis, we also removed mentions of S1, as further analysis made it hard to distinguish from DFG-in. For DFG-inter, we mention that conformation because (a) it is shared by both flipping mechanisms that we have found, and (b) it seems relevant for pharmacology: it has been observed in other kinases such as Aurora B (PDB 2WTV), and Asp381 flipping before Phe382 creates space in the orthosteric kinase pocket that could potentially be targeted by an inhibitor.

      It would be nice to have error bars on the populations reported in Figure 3.

      Agreed. Upon review, we decided to drop the populations, as we were not confident in the significance of the LPATH results as they pertain to the probability of intermediate states.

      I'm confused by the attempt to relate the relative probabilities of states to the 32 kcal/mol barrier previously reported between the states. The barrier height should be related to the probability of a transition. The DFG-out state could be equiprobable with the DFG-in state and still have a 32 kcal/mol barrier separating them.

      Thanks for the correction; we agree with the reviewer and have amended the discussion to reflect this. Since we are starting our simulations in the DFG-in state, the probability of walkers arriving in DFG-out in our steady state WEMD simulations should (assuming proper sampling) represent the probability of the transition. We incorrectly associated the probability of the DFG-out state itself with the probability of the transition.
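      Reviewer 2's distinction between state populations and barrier height can be illustrated numerically. The sketch below, assuming T = 300 K and idealized transition-state theory (the Eyring equation with a transmission coefficient of 1), compares two hypothetical landscapes with equal DFG-in and DFG-out free energies but different barriers; the absolute rates are illustrative only and should not be compared with experiment.

```python
import math

R_KCAL = 0.0019872043  # gas constant, kcal/(mol*K)
K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J*s


def population_ratio(delta_g_kcal: float, temperature_k: float = 300.0) -> float:
    """Equilibrium p_out / p_in for delta_g = G_out - G_in (kcal/mol)."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


def eyring_rate(barrier_kcal: float, temperature_k: float = 300.0) -> float:
    """Idealized transition-state-theory rate constant (1/s), transmission coefficient 1."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature_k))


# Two hypothetical landscapes: identical endpoint free energies, different barriers.
for barrier in (5.0, 32.0):
    print(f"barrier {barrier:4.1f} kcal/mol: "
          f"p_out/p_in = {population_ratio(0.0):.2f}, "
          f"k(in->out) = {eyring_rate(barrier):.2e} 1/s")
```

      Both landscapes give a 1:1 population ratio, while the rate constants differ by many orders of magnitude; the barrier governs how often the flip happens, not how populated each state is at equilibrium.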

      How do the relative probabilities of the DFG-in/out states compare to experiments, like NMR?

      Previous NMR work has found the population of apo DFG-in (PDB 6XR6) in solution to be around 88% for wild-type ABL1, and 6% for DFG-out (PDB 6XR7). The remaining 6% represents the post-DFG-out state (PDB 6XRG), where the activation loop has folded in near the hinge, which we did not simulate due to the computational cost associated with it. The same study estimates the barrier height from DFG-in to DFG-out at around 30 kcal/mol.

      (1) Conformational states dynamically populated by a kinase determine its function. Tao Xie et al., Science 370, eabc2754 (2020). DOI:10.1126/science.abc2754
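      The NMR populations quoted above can be converted into a free-energy difference between the DFG-in and DFG-out basins with ΔG = -RT ln(p_out/p_in). A minimal sketch, assuming 298.15 K for illustration (the experimental temperature is not given here):

```python
import math

R_KCAL = 0.0019872043  # gas constant, kcal/(mol*K)


def delta_g_from_populations(p_a: float, p_b: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference G_b - G_a (kcal/mol) implied by equilibrium populations."""
    return -R_KCAL * temperature_k * math.log(p_b / p_a)


# Populations quoted above for wild-type Abl1: 88% DFG-in, 6% DFG-out.
dg = delta_g_from_populations(p_a=0.88, p_b=0.06)
print(f"G(DFG-out) - G(DFG-in) = {dg:.2f} kcal/mol")  # about 1.59 kcal/mol
```

      This value of roughly 1.6 kcal/mol describes the relative stability of the two basins; it is separate from the ~30 kcal/mol barrier between them.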


      “Do the staggered and concerted DFG flip pathways mentioned correspond to pathways 1 and 2 in Figure 3B, or is that a concept from previous literature?”

      Yes, we have amended Figure 3B to be clearer. In previous literature both pathways have been observed [1], although not specifically defined.

      Source: Computational Study of the “DFG-Flip” Conformational Transition in c-Abl and c-Src Tyrosine Kinases. Yilin Meng, Yen-lin Lin, and Benoît Roux The Journal of Physical Chemistry B 2015 119 (4), 1443-1456 DOI: 10.1021/jp511792a

    5. eLife Assessment

      This work uses enhanced sampling molecular dynamics methods to generate potentially useful information about a conformational change (the DFG flip) that plays a key role in regulating kinase function and inhibitor binding. The focus of the work is on the mechanism of conformational change and how mutations affect the transition. The evidence supporting the conclusions is incomplete.

    6. Reviewer #1 (Public review):

      Summary:

      The authors used weighted ensemble enhanced sampling molecular dynamics (MD) to test the hypothesis that a double mutant of Abl favors the DFG-in state relative to the WT and therefore causes the drug resistance to imatinib.

      Strengths:

      The authors employed state-of-the-art weighted ensemble MD simulations with three novel progress coordinates to explore the conformational changes of the DFG motif of Abl kinase. The hypothesis regarding the double mutant's drug resistance is novel.

      Weaknesses:

      The study contains many uncertain aspects. A major revision is needed to strengthen the support for the conclusions.

      (1) Specifically, the authors need to define the DFG conformation using criteria accepted in the field, for example, see https://klifs.net/index.php.

      (2) Convergence needs to be demonstrated for estimating the population difference between different conformational states.

      (3) The DFG flip needs to be sampled several times to establish free energy difference.

      (4) The free energy plots do not appear to show an intermediate state as claimed.

      (5) The trajectory length of 7 ns in both Figure 2 and Figure 4 needs to be verified, as it is extremely short for a DFG flip that has a high free energy barrier.

      (6) The free energy scale (100 kT) appears to be one order of magnitude too large.

      (7) Setting the DFG-Asp to the protonated state is not justified, because in the DFG-in state, the DFG-Asp is clearly deprotonated.

      (8) Finally, the authors should discuss their work in the context of the enormous progress made in theoretical studies and mechanistic understanding of the conformational landscape of protein kinases in the last two decades, particularly with regard to the DFG flip.
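      Point (6) can be checked with a rough order-of-magnitude calculation. The sketch below assumes a simple two-state Boltzmann picture; this is an illustrative assumption, not the authors' analysis, and the function name `population_ratio` is hypothetical. Under that assumption, a free energy difference ΔG implies an equilibrium population ratio of exp(-ΔG/kT). A 10 kT difference gives a ratio of about 4.5 × 10^-5. A 100 kT difference gives about 3.7 × 10^-44, which no finite simulation could meaningfully sample.

```python
import math


def population_ratio(delta_g_kt: float) -> float:
    """Equilibrium ratio p_B / p_A for two states whose free energies differ by
    delta_g_kt (G_B - G_A, in units of kT).

    Assumes a simple two-state Boltzmann distribution: p_B / p_A = exp(-dG / kT).
    """
    return math.exp(-delta_g_kt)


# Compare a plausible scale (10 kT) with the 100 kT scale questioned in point (6).
for dg in (10.0, 100.0):
    print(f"dG = {dg:5.1f} kT -> p_B/p_A = {population_ratio(dg):.2e}")
```

      The same arithmetic applies to any free energy profile: each additional kT on the scale shrinks the implied population of the higher state by a factor of e.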

    7. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript on the mechanism of the DFG flip in kinases. This conformational change is important for the toggling of kinases between active (DFG-in) and inactive (DFG-out) states. The relative probabilities of these two states are also an important determinant of the affinity of inhibitors for a kinase. However, it is an extremely slow/rare conformational change, making it difficult to capture in simulations. The authors show that weighted ensemble simulations can capture the DFG flip and then delve into the mechanism of this conformational change and the effects of mutations.

      Strengths:

      The DFG flip is very hard to capture in simulations. Showing that this can be done with relatively little simulation by using enhanced sampling is a valuable contribution. The manuscript gives a nice description of the background for non-experts.

      Weaknesses:

      I was disappointed by the anecdotal approach to presenting the results. Molecular processes are stochastic, and the authors have expertise in describing such processes. However, they chose to put most of the statistical analysis in the SI. The main text instead describes the order of events in single "representative" trajectories. The main text makes it sound as though these were selected mainly because they were continuous trajectories from the weighted ensemble simulations. I would much rather read a description of the highest-probability pathway(s), with some quantification of how probable they are. That would give the reader a clear sense of how representative the described events are.

      I appreciated the discussion of the strengths/weaknesses of weighted ensemble simulations. Am I correct that this method doesn't do anything to explicitly enhance sampling along orthogonal degrees of freedom? Maybe a point worth mentioning if so.

      I don't understand Figure 3C. Could the authors instead show structures corresponding to each of the states in 3B, and maybe also a representative structure for pathways 1 and 2?

      Why introduce S1 and DFG-inter? And why suppose that DFG-inter is what corresponds to the excited state seen by NMR?

      It would be nice to have error bars on the populations reported in Figure 3.

      I'm confused by the attempt to relate the relative probabilities of states to the 32 kcal/mol barrier previously reported between the states. The barrier height should be related to the probability of a transition. The DFG-out state could be equiprobable with the DFG-in state and still have a 32 kcal/mol barrier separating them.
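      The distinction in the previous comment can be made concrete with a short calculation. The sketch below makes three illustrative assumptions that do not come from the manuscript:

      - Equilibrium populations follow a two-state Boltzmann distribution.
      - The transition rate follows a transition-state-theory form k = A·exp(-ΔG‡/kT).
      - The prefactor A = 10^12 s^-1 and the temperature of 300 K are hypothetical placeholders.

      Under these assumptions, the populations depend only on the free energy difference between DFG-in and DFG-out. The 32 kcal/mol barrier enters only the rate.

```python
import math

K_B_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def populations(delta_g: float, temp: float = 300.0) -> tuple[float, float]:
    """Equilibrium populations (p_in, p_out) for G_out - G_in = delta_g (kcal/mol).

    Only the free energy difference appears; the barrier height never enters.
    """
    ratio = math.exp(-delta_g / (K_B_KCAL * temp))  # p_out / p_in
    p_in = 1.0 / (1.0 + ratio)
    return p_in, 1.0 - p_in


def rate(barrier: float, temp: float = 300.0, prefactor: float = 1e12) -> float:
    """Transition-state-theory-style rate k = A * exp(-barrier / kT), in s^-1.

    The prefactor A (1e12 s^-1) is an illustrative assumption.
    """
    return prefactor * math.exp(-barrier / (K_B_KCAL * temp))


# Two equally stable states separated by a 32 kcal/mol barrier:
p_in, p_out = populations(0.0)
print(f"p_in = {p_in:.2f}, p_out = {p_out:.2f}")  # populations are equal
print(f"k = {rate(32.0):.1e} s^-1")  # yet transitions are extremely rare
```

      Changing the barrier argument alters only `rate`; the values returned by `populations` stay the same.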

      How do the relative probabilities of the DFG-in/out states compare to experiments, like NMR?

      Do the staggered and concerted DFG flip pathways mentioned correspond to pathways 1 and 2 in Figure 3B, or is that a concept from previous literature?

    1. eLife Assessment

      In this valuable study, the authors present traces of bone modification on ~1.8 million-year-old proboscidean remains from Tanzania, which they infer to be the earliest evidence for stone-tool-assisted megafaunal consumption by hominins. Challenging published claims, the authors argue that persistent megafaunal exploitation roughly coincided with the earliest Acheulean tools. Notwithstanding the rich descriptive and spatial data, the behavioral inferences about hominin agency rely on traces (such as bone fracture patterns and spatial overlap) that are not unequivocal; the evidence presented to support the inferences thus remains incomplete. Given the implications of the timing and extent of hominin consumption of nutritious and energy-dense food resources, as well as of bone toolmaking, the findings of this study will be of interest to paleoanthropologists and other evolutionary biologists.

    2. Reviewer #1 (Public review):

      Domínguez-Rodrigo and colleagues make a moderately convincing case for habitual elephant butchery by Early Pleistocene hominins at Olduvai Gorge (Tanzania), ca. 1.8-1.7 million years ago. They present this at the site scale (the EAK locality, which they excavated), as well as across the penecontemporaneous landscape, analyzing a series of findspots that contain stone tools and large-mammal bones. The latter are primarily elephants, but giraffids and bovids were also butchered in a few localities. The authors claim that this is the earliest well-documented evidence for elephant butchery. Doing so requires debunking other purported cases of elephant butchery in the literature, or, in one case, reinterpreting elephant bone manipulation as nutritional (fracturing to obtain marrow) rather than technological (to make bone tools). The authors' critical discussion of these cases may not be consensual, but it surely advances the scientific discourse. The authors conclude by suggesting that an evolutionary threshold was reached at ca. 1.8 Ma. At that point, three developments coincided: regular consumption of fat-rich elephant carcasses (and perhaps food surplus), more advanced extractive technology (the Acheulian toolkit), and larger human group size.

      The fieldwork and spatial statistics methods are presented in detail and are solid and helpful, especially the excellent description (all too rare in zooarchaeology papers) of bone conservation and preservation procedures. However, the methods of the zooarchaeological and taphonomic analysis, the core of the study, are peculiarly missing. Some of these are explained throughout the manuscript. They are not, however, given in a standard Methods paragraph with suitable references and an explicit account of how the authors recorded bone-surface modifications and the mode of bone fragmentation. This seems more of a technical omission that can be easily fixed than a true shortcoming of the study. The results are detailed and clearly presented.

      By and large, the authors achieved their aims, showcasing recurring elephant butchery in 1.8-1.7 million-year-old archaeological contexts. Nevertheless, some ambiguity surrounds the evolutionary significance of these findings. The authors emphasize the temporal and spatial correlation of (1) elephant butchery, (2) Acheulian toolkits, and (3) larger sites, but do not actually discuss how these elements may be causally related. Is it not possible that larger group size or the adoption of Acheulian technology has nothing to do with megafaunal exploitation? Alternative hypotheses exist. At the very least, the authors should try to defend the causation, not just put forward the correlation. The only exception is a brief mention of food surplus as a "significant advantage", but how exactly would a surplus help in the absence of food-preservation technologies? Moreover, in a landscape full of aggressive scavengers, such excess carcass parts may become a death trap for hominins, not an advantage. I do think that demonstrating habitual butchery bears very significant implications for human evolution, but more effort should be invested in explaining how this might have worked.

      Overall, this is an interesting manuscript of broad interest that presents original data and interpretations from the Early Pleistocene archaeology of Olduvai Gorge. These observations and the authors' critical review of previously published evidence are an important contribution that will form the basis for building models of Early Pleistocene hominin adaptation.

    3. Reviewer #2 (Public review):

      The authors argue that the Emiliano Aguirre Korongo (EAK) assemblage from the base of Bed II at Olduvai Gorge shows systematic exploitation of elephants by hominins about 1.78 million years ago. They describe it as the earliest clear case of proboscidean butchery at Olduvai and link it to a larger behavioral shift from the Oldowan to the Acheulean.

      The paper includes detailed faunal and spatial data. The excavation and mapping methods appear to be careful, and the figures and tables effectively document the assemblage. The data presentation is strong, but the behavioral interpretation is not supported by the evidence.

      The claim for butchery is based mainly on the presence of green-bone fractures and the proximity of bones and stone artifacts. These observations do not prove human activity. Fractures of this kind can form naturally when bones break while still fresh, and spatial overlap can result from post-depositional processes. The studies cited to support these points, including work by Haynes and colleagues, explain that such traces alone are not diagnostic of butchery, but this paper presents them as if they were.

      The spatial analyses are technically correct, but their interpretation extends beyond what they can demonstrate. Clustering indicates proximity, not behavior. The claim that statistical results demonstrate a functional link between bones and artifacts is not justified. Other studies that use these methods combine them with direct modification evidence, which is lacking in this case.

      The discussion treats different bodies of evidence unevenly. Well-documented cut-marked specimens from Nyayanga and other sites are described as uncertain, while less direct evidence at EAK is treated as decisive. This selective approach weakens the argument and creates inconsistency in how evidence is judged.

      The broader evolutionary conclusions are not supported by the data. The paper presents EAK as marking the start of systematic megafaunal exploitation, but the evidence does not show this. The assemblage is described well, but the behavioral and evolutionary interpretations extend far beyond what can be demonstrated.

    1. eLife Assessment

      This study presents a valuable finding on mutations in ZNF217, ZNF703, and ZNF750, based on 23 breast cancer samples and matched normal tissues from Kenyan breast cancer patients. The evidence supporting the claims of the authors is solid. However, the analysis lacks methodological transparency and statistical detail, and it offers insufficient comparison with existing large-scale datasets. The work will be of interest to medical biologists and scientists working in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates mutations and expression patterns of zinc finger proteins in Kenyan breast cancer patients.

      Strengths:

      Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues from Kenyan breast cancer patients. The authors identified mutations in ZNF217, ZNF703, and ZNF750.

      Weaknesses:

      (1) Research scope:

      The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted and why comparisons to previous literature were not included.

      (2) Language and Style Issues:

      Several statements read somewhat 'unnaturally', and I strongly recommend proofreading.

      (3) Methods and Data Analysis Details:

      The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

      (a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

      (b) Statistical methods for somatic mutation/SNP detection.

      (c) Details of RNA purification and RNA-seq library preparation.

      Without these details, the reproducibility of the study is limited.

      (4) Data Reporting:

      This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

      (a) deposit sequencing data in a public repository.

      (b) provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

      (c) clarify whether raw or adjusted p-values were used for DEG analysis.

      (d) perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

      (5) Mutation Analysis:

      Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

    3. Reviewer #2 (Public review):

      Summary:

      This work integrated the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer.

      Strengths:

      The mutation landscape of ZNF217, ZNF703, and ZNF750 was comprehensively studied and correlated with tumor stage and HER2 status to highlight the clinical significance.

      Weaknesses:

      The current study design is relatively simple, and the cohort is too small to support significant findings. A larger sample size, along with more analytic work, is therefore needed.

      The targeted focus on the ZNF family, without a stated rationale or clinical justification, limits the overall significance of the work.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to define the somatic mutational landscape and transcriptomic expression of the ZNF217, ZNF703, and ZNF750 genes in breast cancers from Kenyan women and to investigate associations with clinicopathological features like HER2 status and cancer stage. They employed whole-exome and RNA-sequencing on 23 paired tumor-normal samples to achieve this.

      Strengths:

      (1) A major strength is the focus on a Kenyan cohort, addressing a critical gap in genomic studies of breast cancer, which are predominantly based on European or Asian populations.

      (2) The integration of DNA- and RNA-level data from the same patients provides a comprehensive view, linking genetic alterations to expression changes.

      Weaknesses:

      (1) The small cohort size (n=23) significantly limits the statistical power to detect associations between genetic features and clinical subgroups (e.g., HER2 status, stage), rendering the negative findings inconclusive.

      (2) The study is primarily descriptive. While it effectively catalogs mutations and expression changes, it does not include functional experiments to validate the biological impact of the identified alterations.

    1. eLife Assessment

      Clonal hematopoiesis of indeterminate potential (CHIP) is a known risk factor for coronary artery disease, though its precise role in disease progression continues to emerge. This study leverages valuable single-cell RNA data from patients with CHIP mutations and controls to predict key interactions between endothelial cells and monocytes. Using an AI prediction model, the authors identify druggable targets that mediate immune cell interactions in CHIP and provide solid evidence to support their findings.

    2. Reviewer #1 (Public review):

      Summary:

      Using single-cell RNA sequencing and bioinformatics approaches, the authors aimed to discover if and how cells carrying mutations common to clonal haematopoiesis were more adherent to endothelial cells.

      Strengths:

      (1) The authors used matched blood and adipose tissue samples from the same patients (with the exception of the control participants) to conduct their analysis.

      (2) The use of bioinformatics and in-silico approaches helped to fast-track their aims to test specific inhibitors in their model cell adhesion system.

      Weaknesses:

      (1) The analysis was done on pooled cells; it would have been interesting to know if the same adhesion gene signatures were observed across the donors.

      (2) The adhesion assays were conducted under static conditions; shear flow adhesion experiments would have been better. Mixed cultures using cell trackers would have been even better.

      (3) In the intervention studies, the authors should have directly targeted the monocytes (not the endothelial cells) and should have also included DNMT3A mutant/KO cells to show specificity to TET2 CHIP.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe potential mechanisms underlying the changes in endothelial-monocyte interactions in patients with clonal hematopoiesis of indeterminate potential (CHIP), including reduced velocity and increased ligand interactions of CHIP-mutated monocytes. They use a combination of approaches to outline the changes that occur in blood monocytes derived from patients with CHIP: transcriptomics (some applied for the first time in these tissues in patients with CHIP), in silico analyses, and ex vivo experiments. These findings advance the current field, which has previously mostly used mice and/or has been focused on cancer outcomes. The authors identify distinct alterations in signaling downstream of DNMT3A or TET2 mutations, which further distinguish two major mutations that contribute to CHIP.

      Strengths:

      (1) Combinatorial transcriptomics was used to identify potential therapeutic targets, which is an important proof-of-concept for multiple fields.

      (2) The authors identify distinct ligand interactions downstream of TET2 and DNMT3A mutations.

      Weaknesses:

      (1) The authors extrapolate findings in adipose tissue from diabetic patients to vascular disease (ostensibly in the carotid or cardiac arteries), citing the difficulty of obtaining tissue-matched samples. Broad-reaching conclusions need to be backed up in the relevant systems, considering how differently endothelial cells react across vascular beds. Since n=3 patients were sufficient to identify these changes, it seems feasible to perform this analysis (perhaps in silico) in the correct tissue.

      (2) The selection and exclusion criteria for the diabetes samples are not reported, nor is the source of the adipose tissue stated. Therefore, the relevant conclusions cannot be fully evaluated.

      Appraisal:

      The authors describe how to integrate a number of transcriptomic techniques and show that doing so is technically feasible. However, they do not seem to use this integration to produce highly compelling data or targets within this manuscript. The potential is there to drill down to mechanisms; however, the data gathered herein do not highlight novel targets. For example, CXCL2 and CXCL3 have already been shown to be differentially expressed in the macrophages of mice combining TET2 loss with LDL treatment. The same study further showed that in humans, the prototypical CXC chemokine IL8 (which mice lack) is significantly higher in TET2-mutated patients (DOI: 10.1056/NEJMoa1701719). The authors should demonstrate the utility of their transcriptomics by identifying and testing novel targets and by focusing on the proper disease states. This could easily be a deep dive into CHIP in adipose tissue in diabetic patients.

    1. eLife Assessment

      This important study presents a thoughtful design and characterization of chimeric influenza hemagglutinin (HA) head domains combining elements of distinct receptor-binding sites. The results provide convincing evidence that polyclonal cross-group responses to influenza A virus can be elicited by a single immunization. While the mechanistic basis of heterotrimer formation and immunodominance differences remains unclear, the authors provide new insights for protein design, vaccinology, and computational vaccine design.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Castro et al. presents an interesting blueprint for designing influenza immunogens that can induce cross-group influenza-specific antibodies. The authors used a structure-based design to transplant receptor binding site (RBS) residues from H5 and H3 into an H1 scaffold. In addition, they assembled the transplanted structures as heterotrimers. They characterized the constructs structurally and used them to immunize mice to define ELISA binding and neutralizing antibodies (Abs) to different influenza strains.

      Strengths and Weaknesses:

      The authors succeeded in generating the different, correctly folded immunogens. The heterotrimers would benefit from more characterization. It remains unclear whether they form at all or whether the sample is a mixture of homotrimers, and whether some combinations are more likely than others. While some of these questions are complex to answer, the authors should at least confirm the presence of heterotrimers.

      While all constructs were able to elicit H1-specific Abs, the immunogens differed in their ability to induce a response to the transplanted epitope. The H3 transplant resulted in H3-specific Abs, but this was not the case for the H5 transplant or the heterotrimers. The importance of the finding is that the authors are able to elicit polyclonal Abs neutralizing group 1 and group 2 influenza viruses with a single immunogen. A more in-depth discussion of why the H3 transplant, but not the H5 transplant, elicited those specific Abs would be beneficial.

      Overall, the work is a proof of concept that H1-H3 chimeric proteins can be produced. It is an important first step towards computationally designed vaccines that induce Abs against multiple groups.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript from Castro et al. describes the engineering of influenza hemagglutinin H1-based head domains that display receptor-binding-site residues from H5 and H3 HAs. The initial head-only chimeras were able to bind to FluA20, which recognizes the trimer interface, but did not bind well to H5- or H3-specific antibodies. Furthermore, these constructs were not particularly stable in solution, as indicated by low melting temperatures. Crystal structures of each chimeric head in complex with FluA20 were obtained, demonstrating that the constructs could adopt the intended conformation upon stabilization with FluA20. The authors next placed the chimeric heads onto an H1 stalk to create homotrimeric HA ectodomains, as well as a heterotrimeric HA ectodomain. The homotrimeric chimeric HAs were better behaved in solution. H3- and H5-specific antibodies bound to these trimers with affinities only about 10-fold weaker than to their respective wild-type HAs. The heterotrimeric chimeric HA showed only transient stability in solution and bound the H3- and H5-specific antibodies more weakly. Immunization of mice with these trimers elicited cross-reactive binding antibodies, although the cross-neutralizing titers were less robust. The most positive result was that the H1H3 trimer elicited sera that neutralized both H1 and H3 viruses.

      Strengths:

      The manuscript is very well-written with clear figures. The biophysical and structural characterizations of the antigen were performed to a high standard. The engineering approach is novel, and the results should provide a basis for further iteration and improvement of RBS transplantation.

      Weaknesses:

      The main limitation of the study is that there are no statistical tests performed for the immunogenicity results shown in Figures 4 and 5. It is therefore unknown whether the differences observed are statistically significant. Additionally, fits of the BLI data in Figure 3 to the binding model used to determine the binding constants should be shown.

    1. eLife Assessment

      This fundamental work reveals that the accessibility of the unstructured C-terminal tails of α- and β-tubulins differs with the state of the microtubule lattice. Their accessibility increases with the expansion of the lattice induced by GTP and certain MAPs, which can then dictate the subsequent interactions between MAPs and microtubules, and post-translational modifications of tubulin tails. The evidence supporting the conclusion is compelling, although the characterisation of the probes does not answer whether they directly affect the lattice or expose the C-terminal tails of tubulin. This work will be of great interest to the cytoskeleton field.

    2. Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Weaknesses:

      There is no information on the status of the beta tubulin CTTs. The study is done with mixed isotype microtubules, both in cells and in vitro. It remains unclear whether all the alpha tubulins in a mixed isotype microtubule lattice behave equivalently, or whether the effect is tubulin isotype-dependent. It remains unclear whether local binding of effectors can locally expand the lattice and locally expose the alpha CTTs.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPs and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

    3. Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      Weaknesses:

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, in which lattice conformation is a determinant of accessibility for post-translational modification, is novel and potentially impactful for understanding microtubule regulation.

      Weaknesses:

      There are a number of weaknesses in the paper, many of which can be addressed textually. Some of the supporting evidence is preliminary and would benefit from additional experimental validation and clearer presentation before the conclusions can be considered fully supported.

      In particular, the authors should directly test in vitro whether Taxol addition can induce lattice exchange (see comments below).

    1. eLife Assessment

      This valuable study presents EM structures of new conformational states of the LONP1 AAA+ protease in conjunction with the mitochondrial protein substrates StAR and TFAM, along with biochemical functional assays. The EM structures revealed new conformational states in a closed configuration. The structures and associated functional results are solid. However, a notable weakness is that no substrate is resolved threaded through the ATPase pores.

    2. Reviewer #1 (Public review):

      The remodeling of macromolecular substrates by AAA+ proteins is an essential aspect of life at the molecular scale, and understanding conserved and divergent features of substrate recognition across the AAA+ protein family remains an ongoing area of research. AAA+ proteins are highly modular and typically combine N-terminal recognition domain(s) with ATPase domain(s) to recognize and unfold a macromolecular target, such as dsDNA or a protein substrate. This activity can be coupled to additional C-terminal domains that further modify the substrate, such as a protease domain that hydrolyzes the extended, unstructured protein chain that emerges from the ATPase domain during substrate processing.

      This work focuses on one such AAA+ protease, LONP1. LONP1 is an essential AAA+ protein involved in mitochondrial proteostasis, and disruption of its function in vivo has serious developmental consequences. This work explores the processing of two new mitochondrial protein substrates (StAR, TFAM) by LONP1 and presents new conformational states of LONP1 with closed configurations and no substrate threaded through the ATPase pores. The quality of the reconstructions and models is very good. Critically, one of these states (LONP1C3) has a completely occluded ATPase pore from the N-terminal side of the ATPase ring, where three of the six NTDs/CCDs interact tightly to form a C3-symmetric substructure preventing substrate ingress. The authors note several key interactions between amino acids forming these substructures, and perform ATPase assays on mutant LONP1 proteins to determine hydrolysis rates in the absence or presence of substrate. These patterns are recapitulated in casein disassembly assays as well. Based on these results, the authors note that the mutants have differential effects depending on the "foldedness" of the substrate, and surmise that disruption of the C3-symmetric substructure from the EM experiments is responsible for these effects - an intriguing idea. In addition to the C3 state, the authors observe additional intermediates which they place on the same conformational coordinate. One such structure is the LONP1C2 state with two splits, hinting at a conformational transition from LONP1C3 to the closed/active state.

      Taken together, these results form the basis of an interesting story. However, I feel that more experimentation and analysis are needed to address several key points, or that the conclusions should be toned down. First and foremost, I note that while the hypothesis that the LONP1C3 state is a critical step in recognizing substrate "foldedness" is an interesting one, the claim is made solely on the basis of biochemical experiments with mutant LONP1, and that there is no substrate density associated with LONP1C3. In the absence of substrate density and/or structural data for the mutants, this seems like a very strong claim. More generally, the manuscript invokes the conformational landscape of LONP1C3 in multiple instances, but no such landscape is presented to show how LONP1C3 and the other states are quantitatively linked. Finally, I note the prevalence of ADP-only active sites in these intermediates, and am concerned that this might be related to the depletion of ATP under the on-grid reaction conditions. The inclusion of an ATP regeneration system may be a useful way to ensure that ATP/ADP concentrations are more physiological and that excessive ADP will not bias the conformations of the ring systems.

      In summary, I believe this manuscript is exciting but would benefit from a paring back of claims, or the inclusion of some additional data to fill in some of the conceptual gaps outlined above.

    3. Reviewer #2 (Public review):

      This paper by Mindrebo et al. reveals multiple novel conformations of the human LONP1 protease. AAA+ proteases, like LONP1, are needed for maintaining proteostasis in cells and organelles. While structures of fully active (closed) and fully inactive (open) conformations of LONP1 are now established, the dynamics between these states and how changes in conformation may contribute to or be triggered by substrates and nucleotides are unclear. In this work, the authors characterize a novel C3-symmetric state of LONP1 bound to TFAM (a native substrate), suggest that this C3 state is an intermediate in the open-to-closed cycle, and make mutations to test this model biochemically. Deeper inspection of their TFAM-bound LONP1 dataset reveals additional conformations, including a C2-symmetric and two asymmetric intermediates. The authors synthesize all these conformations into a model for how LONP1 transitions from an inactive OFF state to an active ENZ state. There are clear, interesting structural aspects to this work, revealing alternate conformations that shed light on the dynamics of LONP1. However, some of the conclusions extend well beyond the scope of the experiments shown, as discussed below.

      Overall, there are two major comments on the work as written that, if addressed, would make the results more compelling. First, the order of events and the existence of intermediate states are inferred primarily from static structural snapshots and from fitting these structures to a possible mechanism. It would be ideal to have some biochemical or kinetic data supporting these steps and the existence of these intermediates. For example, the model is that the C3 state is an ADP-bound intermediate that blocks access and acts as a checkpoint for progression to the ENZ state of LONP1. The major evidence for this comes from a mutation (D449A) that degrades TFAM less efficiently than StAR or casein, which is taken as evidence that failure to form the C3 state reduces the ability to degrade more "folded" substrates. A prediction of this model would be that destabilizing TFAM through mutation should improve D449A degradation. Ideally, other measures of conformational change, such as FRET or HDX-MS, could be used to visualize this C3 state in unmutated LONP1 during substrate engagement and degradation. At a minimum, if ATP hydrolysis is used as a proxy for forming the ENZ state, and different substrates are assumed to promote formation of the C3 state to different extents, then measuring ATP hydrolysis by wt LONP1 with different substrates will be informative.

      The second major comment is that the primary evidence for the importance of the C3 state is a mutation (D449A) that, based on the cryoEM structure, is incompatible with this conformation but should not affect any other state. A concern that arises is whether this mutation is doing more than simply destabilizing the C3 state and affecting substrate recognition/enzymatic activity in some other manner. To address this point, the authors could perform cryoEM characterization of the D449A mutant, which should show reduced or no presence of the C3-state, but still an intact ability to form the closed ENZ state.

    4. Reviewer #3 (Public review):

      Summary:

      The AAA+ protease LONP1 is a central component of mitochondrial protein quality control and has crucial functions in diverse processes. Cryo-EM structures of LONP1 have defined inactive and substrate-processing active states. Here, the authors determined multiple new LONP1 structural states by cryo-EM in the presence of diverse substrates. The structures are defined as on-pathway intermediates of LONP1 activation. A C3-symmetric state is suggested to function as a checkpoint that scans for LONP1 substrates and links correct substrate selection to LONP1 activation.

      Strengths:

      The determination of multiple structures provides relevant information on substrate-triggered activation of LONP1. The authors support the structural data with biochemical analysis of structure-based mutants.

      Weaknesses:

      How substrate selection is achieved remains elusive, partly because substrates are not detectable in the various structures. It also remains partly unclear whether mutant phenotypes can be specifically linked to a single structural state (C3). Some mutant phenotypes appear complex and do not seem to be in line with the proposed model.

    1. eLife Assessment

      The manuscript concerns a fundamental and controversial question in Trypanosoma brucei biology and the parasite life cycle, providing further evidence that slender bloodstream forms can indeed infect Tsetse flies. The study is solid in design and execution, and addresses several criticisms made of the authors' earlier work. Nevertheless, some of the main conclusions are only partially supported: one issue is how, precisely, a "slender" bloodstream form is defined, and discrepancies with some results from other laboratories remain unexplained.

    2. Reviewer #1 (Public review):

      Summary:

      This work provides evidence that slender T. brucei can initiate and complete cyclical development in Glossina morsitans without GlcNAc supplementation, in both sexes, and importantly in non-teneral flies, including salivary-gland infections.

      Comparative transcriptomics show early divergence between slender- and stumpy-initiated differentiation (distinct GO enrichments), with convergence by ~72 h, supporting an alternative pathway into the procyclic differentiation program.

      The work addresses key methodological criticisms of earlier studies and supports the hypothesis that slender forms may contribute to transmission at low parasitaemia.

      Strengths:

      (1) Directly tackles prior concerns (no GlcNAc, both sexes, non-teneral flies) with positive infections through to the salivary glands.

      (2) Transcriptomic time course adds some mechanistic depth.

      (3) Clear relevance to the "transmission paradox"; advances an important debate in the field.

      Weaknesses:

      (1) Discrepancy with Ngoune et al. (2025) remains unresolved; no head-to-head control for colony/blood source or microbiome differences that could influence vector competence.

      (2) Lacks in vivo feeding validation (e.g., infecting flies directly on parasitaemic mice) to strengthen ecological relevance.

      (3) Mechanistic inferences are largely correlative (although not requested, there is no functional validation of genes or pathways emerging from the transcriptomics).

      (4) Reliance on a single parasite clone (AnTat 1.1) and one vector species limits external validity.

    3. Reviewer #2 (Public review):

      Summary:

      This paper is an exciting follow-up to two recent publications in eLife: one from the same lab, reporting that slender forms can successfully infect tsetse flies (Schuster, S et al., 2021), and another independent study claiming the opposite (Ngoune, TMJ et al., 2025). Here, the authors address four criticisms raised against their original work: the influence of N-acetyl-glucosamine (NAG), the use of teneral and male flies, and whether slender forms bypass the stumpy stage before becoming procyclic forms.

      Strengths:

      We applaud the authors' efforts in undertaking these experiments and contributing to a better understanding of the T. brucei life cycle. The paper is well-written and the figures are clear.

      Weaknesses:

      We identified several major points that deserve attention.

      (1) What is a slender form? Slender-to-stumpy differentiation is a multi-step process, and most of these steps unfortunately lack molecular markers (Larcombe et al, 2023). In this paper, it is essential that the authors explicitly define slender forms. Which parameters were used? It is implicit that slender forms are replicative and GFP::PAD1-negative. Isn't it possible that some GFP::PAD1-negative cells were already transitioning toward stumpy forms, but not yet expressing the reporter? Transcriptomically, these would be early transitional cells that, upon exposure to "tsetse conditions" (in vitro or in vivo), could differentiate into PCF through an alternative pathway, potentially bypassing the stumpy stage (as suggested in Figure 4). Given the limited knowledge of early molecular signatures of differentiation, we cannot exclude the possibility that the slender forms used here included early differentiating cells. We suggest:

      1.1 Testing the commitment of slender forms (e.g., using the plating assay in Larcombe et al., 2023), assessing cell-cycle profile, and other parameters that define slender forms.

      1.2 In the Discussion, acknowledging the uncertainty of "what is a slender?" and being explicit about the parameters and assumptions.

      1.3 Clarifying in the Materials and Methods how cultures were maintained in the 3-4 days prior to tsetse infections, including daily cell densities. Ideally, provide information on GFP expression, cell cycle, and morphology. While this will not fully resolve the concern, it will allow future reinterpretation of the data when early molecular events are better understood.

      (2) Figure 1: This analysis lacks a positive control to confirm that NAG is working as expected. It would strengthen the paper if the authors showed that NAG improves stumpy infection. Once confirmed, the authors could discuss possible differences in the tsetse immune response to slender vs. stumpy forms to explain the absence of an effect on slender infections.

      (3) Figure 2. To conclude that teneral flies are less infected than non-teneral flies, data from Figures 1 and 2 must be directly comparable. Were these experiments performed simultaneously? Please clarify in the figure legends. Moreover, the non-teneral flies here are still relatively young (6-7 days old), limiting comparisons with Ngoune, TMJ et al. 2025, where flies were 2-3 weeks old.

      (4) Figure 3. The PCA plot (A) appears to suggest the opposite of the authors' interpretation: slender differentiation seems to proceed through a transcriptome closer to stumpy profiles. Plotting DEG numbers (panel C) is informative, but how were paired conditions selected? In addition, the number of DEGs between consecutive time points, within and between parasite types, should be plotted. There may also be better computational tools to assess temporal relationships. Finally, how does PAD1 transcript abundance change over time in both populations? It would also be important to depict the upregulation of procyclic-specific genes.

      (5) Could methylcellulose in the medium sensitize parasites to the quorum-sensing (QS) signal, leading to more frequent and/or earlier differentiation despite low densities? If so, cultures with vs. without methylcellulose might yield different proportions of early-differentiating (yet GFP-negative) parasites. This could explain discrepancies between the Engstler and Rotureau labs despite their use of the same strain. The field would benefit from reciprocal testing of culture conditions. Alternatively, the authors could compare infectivity and transcriptomes of their slender forms under three conditions: (i) in vitro with methylcellulose, (ii) in vitro without methylcellulose, and (iii) directly from mouse blood.

    1. eLife Assessment

      The authors present a set of wrappers around previously developed software and machine-learning toolkits, and demonstrate their use in identifying endogenous sterols binding to a GPCR. The resulting pipeline is potentially useful for molecular pharmacology researchers due to its accessibility and ease of use. However, the evidence supporting the GPCR-related findings remains incomplete, as the machine-learning model shows indications of overfitting, and no direct ligand-binding assays are provided for validation.

    2. Reviewer #1 (Public review):

      This is a re-review following an author revision. I will go point-by-point in response to my original critiques and the authors' responses. I appreciate the authors taking the time to thoughtfully respond to the reviewer critiques.

      Query 1. Based on the authors' description of their contribution to the algorithm design, it sounds like a hyperparameter search wrapped around existing software tools. I think that the use of their own language to describe these modules is confusing to potential users as well as unintentionally hides the contributions of the original LigBuilder developers. The authors should just explain the protocol plainly using language that refers specifically to the established software tools. Whether they use LigBuilder or something else, at the end of the day the description is a protocol for a specific use of an existing software rather than the creation of a new toolkit.

      Query 2. I see. Correct me if I am mistaken, but it seems as though the authors are proposing using the Authenticator to identify the best distributions of compounds based on an in silico oracle (in this case, Vina score), and train to discriminate them. This is similar to training QSAR models to predict docking scores, such as in the manuscript I shared during the first round of review. In principle, one could perform this in successive rounds to create molecules that are increasingly composed of features that yield higher docking scores. This is an established idea that the authors demonstrate in a narrow context, but it also raises concern that one is just enriching for compounds with, e.g., an abundance of hydrogen bond donors and acceptors. Regarding points (4) and (5), it is unclear to me how the authors perform train/test splits on unlabeled data with supervised machine learning approaches in this setting. This seems akin to a Y-scramble sanity check. Finally, regarding the discussion on the use of experimental data or FEP calculations for the determination of HABs and LABs, I appreciate the authors' point; however, the concern here is that in the absence of any true oracle the models will just learn to identify and/or generate compounds that exploit limitations of docking scores. Again, please correct me if I am mistaken. It is unclear to me how this advances previous literature in CADD outside of the specific context of incorporating some ideas into a GPCR-G protein framework.

      Query 3. The authors mention that the hyperparameters for the ML models are just the package defaults in the absence of specification by the user. It would be helpful to know specifically what the hyperparameters were for the benchmarks in this study; however, I think a deeper concern is still that these models are almost certainly far overparameterized given the limited training data used for the models. It is unclear why the authors did not just build a random forest classifier to discriminate their HABs and LABs using ligand- or protein-ligand interaction fingerprints or related ideas.

      Query 4. It is good, and expected, that increasing the fraction of the training set size in a random split validation all the way to 100% would allow the model to perfectly discriminate HABs and LABs. This does not demonstrate that the model has significant enrichment in prospective screening, particularly compared to simpler methods. The concern remains that these models are overparameterized and insufficiently validated. The authors did not perform any scaffold splits or other out-of-distribution analysis.
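      To make the requested out-of-distribution check concrete, a scaffold split assigns whole scaffold groups to either the training or the test set so that no scaffold appears in both. The sketch below is a minimal illustration under stated assumptions: each compound is assumed to already carry a scaffold key (for example, a Bemis-Murcko scaffold computed with a cheminformatics toolkit, a step not shown), and the function name `scaffold_split` and the largest-groups-to-training rule are illustrative choices, not the authors' or any specific library's method.

```python
from collections import defaultdict


def scaffold_split(compound_ids, scaffolds, test_fraction=0.2):
    """Split compounds so that no scaffold occurs in both train and test.

    compound_ids: list of compound identifiers (hypothetical input).
    scaffolds: precomputed scaffold keys, aligned with compound_ids.
    """
    groups = defaultdict(list)
    for cid, scaf in zip(compound_ids, scaffolds):
        groups[scaf].append(cid)

    # Place the largest scaffold groups in training first, so that rarer
    # scaffolds tend to land in the test set (an out-of-distribution split).
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_cap = (1.0 - test_fraction) * len(compound_ids)
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        else:
            test.extend(group)
    return train, test
```

      Because entire groups move together, the realized test fraction can differ from `test_fraction`; that is the price of guaranteeing no scaffold leakage between the two sets.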

      Query 5. The authors contend that Gcoupler uniquely enables training models when data are scarce and ultra-large screening libraries are unavailable. Today, it is rather straightforward to dock at least thousands of compounds. Using tools such as QuickVina2-GPU (https://pubs.acs.org/doi/10.1021/acs.jcim.2c01504), it is possible to dock millions of compounds in a day on a single GPU and obtain the AutoDock Vina score. GPU-accelerated Vina has likely been combined with cavity detection tools multiple times, including here (https://arxiv.org/abs/2506.20043). There are multiple cavity detection tools, including the ones the authors use in their protocol.

      Query 6. The authors contend that the simulations are converged, but they elected not to demonstrate stability in the predicted MM/GBSA binding energies with block averaging across the trajectory. This could have been done with the existing trajectories without additional simulation.
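      The block-averaging check mentioned in Query 6 needs only per-frame energies from the existing trajectories. A minimal sketch, assuming the MM/GBSA energies are available as a list of floats in frame order; the function name `block_average` and the default of five blocks are illustrative assumptions. Block means that drift systematically, or a standard error that changes strongly as the number of blocks varies, would suggest the estimate is not yet converged.

```python
import statistics


def block_average(energies, n_blocks=5):
    """Split a per-frame energy series into contiguous blocks.

    Returns (block_means, overall_mean, standard_error), where the standard
    error is estimated from the spread of the block means.
    """
    if n_blocks < 2 or len(energies) < n_blocks:
        raise ValueError("need at least 2 blocks and one frame per block")
    block_size = len(energies) // n_blocks
    # Drop trailing frames that do not fill a complete block.
    usable = energies[: block_size * n_blocks]
    block_means = [
        statistics.fmean(usable[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    overall = statistics.fmean(block_means)
    sem = statistics.stdev(block_means) / n_blocks ** 0.5
    return block_means, overall, sem
```

      In practice one would repeat the call for several values of `n_blocks` and check that the standard error plateaus.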

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Query: In this manuscript, the authors introduce Gcoupler, a Python-based computational pipeline designed to identify endogenous intracellular metabolites that function as allosteric modulators at the G protein-coupled receptor (GPCR) - Gα protein interface. Gcoupler comprises four modules:

      I. Synthesizer - identifies protein cavities and generates synthetic ligands using LigBuilder3

      II. Authenticator - classifies ligands into high-affinity binders (HABs) and low-affinity binders (LABs) based on AutoDock Vina binding energies

      III. Generator - trains graph neural network (GNN) models (GCM, GCN, AFP, GAT) to predict binding affinity using synthetic ligands

      IV. BioRanker - prioritizes ligands based on statistical and bioactivity data

      The authors apply Gcoupler to study the Ste2p-Gpa1p interface in yeast, identifying sterols such as zymosterol (ZST) and lanosterol (LST) as modulators of GPCR signaling. Our review will focus on the computational aspects of the work. Overall, we found the Gcoupler approach interesting and potentially valuable, but we have several concerns with the methods and validation that need to be addressed prior to publication/dissemination.

      We express our gratitude to Reviewer #1 for their concise summary and commendation of our work. We sincerely apologize for the lack of sufficient detail in summarizing the underlying methods employed in Gcoupler, as well as its subsequent experimental validations using yeast, human cell lines, and primary rat cardiomyocyte-based assays.

      We wish to state that substantial improvements have been made in the revised manuscript; every section has been elaborated upon to enhance clarity. Please refer to the point-by-point response below and the revised manuscript.

      Query: (1) The exact algorithmic advancement of the Synthesizer beyond being some type of application wrapper around LigBuilder is unclear. Is the grow-link approach mentioned in the methods already a component of LigBuilder, or is it custom? If it is custom, what does it do? Is the API for custom optimization routines new with the Synthesizer, or is this a component of LigBuilder? Is the genetic algorithm novel or already an existing software implementation? Is the cavity detection tool a component of LigBuilder or novel in some way? Is the fragment library utilized in the Synthesizer the default fragment library in LigBuilder, or has it been customized? Are there rules that dictate how molecule growth can occur? The scientific contribution of the Synthesizer is unclear. If there has not been any new methodological development, then it may be more appropriate to just refer to this part of the algorithm as an application layer for LigBuilder.

      We appreciate Reviewer #1's constructive suggestion. We wish to emphasize that

      (1) The LigBuilder software comprises various modules designed for distinct functions. The Synthesizer in Gcoupler strategically utilizes two of these modules: "CAVITY" for binding site detection and "BUILD" for de novo ligand design.

      (2) While both modules are integral to LigBuilder, the Synthesizer plays a crucial role in enabling their targeted, automated, and context-aware application for GPCR drug discovery.

      (3) The CAVITY module is a structure-based protein binding site detection program, which the Synthesizer employs for identifying ligand binding sites on the protein surface.

      (4) The Synthesizer also leverages the BUILD module for constructing molecules tailored to the target protein, implementing a fragment-based design strategy using its integrated fragment library.

      (5) The GROW and LINK methods represent two independent approaches encompassed within the aforementioned BUILD module.

      Author response image 1.

      Schematic representation of the key strategy used in the Synthesizer module of Gcoupler.

      Our manuscript details the "grow-link" hybrid approach, which was implemented using a genetic algorithm through the following stages:

      (1) Initial population generation based on a seed structure via the GROW method.

      (2) Selection of "parent" molecules from the current population for inclusion in the mating pool using the LINK method.

      (3) Transfer of "elite" molecules from the current population to the new population.

      (4) Population expansion through structural manipulations (mutation, deletion, and crossover) applied to molecules within the mating pool.

      Please note, the outcome of this process is not fixed, as it is highly dependent on the target cavity topology and the constraint parameters employed for population evaluation. Synthesizer customizes generational cycles and optimization parameters based on cavity-specific constraints, with the objective of either generating a specified number of compounds or comprehensively exploring chemical diversity against a given cavity topology.
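      The four stages listed above follow the shape of a generic genetic algorithm, which can be illustrated with a toy loop. This is a sketch, not LigBuilder's or the Synthesizer's implementation: candidates are bit strings rather than molecules, `fitness` is a hypothetical stand-in for a cavity-specific score, tournament selection substitutes for the LINK-based mating-pool selection, and the deletion operator is omitted (mutation and crossover illustrate the structural-manipulation step).

```python
import random


def fitness(candidate):
    # Hypothetical stand-in for a cavity-specific score; higher is better.
    return sum(candidate)


def mutate(candidate, rate, rng):
    # Flip each position independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in candidate]


def crossover(a, b, rng):
    # Single-point crossover between two parents of equal length (>= 2).
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(seed, pop_size=20, n_elite=2, generations=30, rate=0.05, rng_seed=0):
    rng = random.Random(rng_seed)
    # (1) Initial population generated from a seed structure.
    population = [mutate(seed, 0.5, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # (2) Mating pool via tournament selection (stand-in for LINK).
        pool = [max(rng.sample(ranked, 3), key=fitness) for _ in range(pop_size)]
        # (3) Elite candidates pass unchanged into the new population.
        new_population = ranked[:n_elite]
        # (4) Expand the population by crossover and mutation of pool members.
        while len(new_population) < pop_size:
            a, b = rng.sample(pool, 2)
            new_population.append(mutate(crossover(a, b, rng), rate, rng))
        population = new_population
    return max(population, key=fitness)
```

      Elitism guarantees that the best score never decreases between generations; the fixed `rng_seed` makes runs reproducible, which matters when comparing parameter settings.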

      While these components are integral to LigBuilder, the Synthesizer's contribution lies in the following:

      (1) Its programmatic integration and dynamic adjustment of these modules.

      (2) Synthesizer distinguishes itself not by reinventing these algorithms, but by their automated coordination, fine-tuning, and integration within a cavity-specific framework.

      (3) It dynamically modifies generation parameters according to cavity topology and druggability constraints, a capability not inherently supported by LigBuilder.

      (4) This renders Synthesizer particularly valuable in practical scenarios where manual optimization is either inefficient or impractical.

      In summary, Synthesizer offers researchers a streamlined interface, abstracting the technical complexities of LigBuilder and thereby enabling more accessible and reproducible ligand generation pipelines, especially for individuals with limited experience in structural or cheminformatics tools.

      Query: (2) The use of AutoDock Vina binding energy scores to classify ligands into HABs and LABs is problematic. AutoDock Vina's energy function is primarily tuned for pose prediction and displays highly system-dependent affinity ranking capabilities. Moreover, the HAB/LAB thresholds of -7 kcal/mol or -8 kcal/mol lack justification. Were these arbitrarily selected cutoffs, or was benchmarking performed to identify appropriate cutoffs? It seems like these thresholds should be determined by calibrating the docking scores with experimental binding data (e.g., known binders with measured affinities) or through re-scoring molecules with a rigorous alchemical free energy approach.

      We again express our gratitude to Reviewer #1 for these inquiries, and we sincerely apologize for the lack of sufficient detail in the original version of the manuscript. In the revised manuscript, we have included a detailed rationale for every threshold used to prioritize high-affinity binders. Please refer to the explanation below, as well as the revised manuscript, for further details.

      We would like to clarify that:

      (1) The Authenticator module is not solely reliant on absolute binding energy values for classification. Instead, it calculates binding energies for all generated compounds and applies a statistical decision-making layer to define HAB and LAB classes.

      (2) Rather than using fixed thresholds, the module employs distribution-based methods, such as the Empirical Cumulative Distribution Function (ECDF), to assess the overall energy landscape of the compound set. We then applied multiple statistical tests to evaluate the HAB and LAB distributions and determine an optimal, data-specific cutoff that balances class sizes and minimizes overlap.

      (3) This adaptive approach avoids rigid thresholds and instead ensures context-sensitive classification, with safeguards in place to maintain adequate representation of both classes for downstream model training. In this way, the framework prioritizes robust statistical reasoning over arbitrary energy cutoffs and aims to reduce the risks associated with relying on Vina scores alone.

      (4) To assess the necessity and effectiveness of the Authenticator module, we conducted a benchmarking analysis where we deliberately omitted the HAB and LAB class labels, treating the compound pool as a heterogeneous, unlabeled dataset. We then performed random train-test splits using the Synthesizer-generated compounds and trained independent models.

      (5) The results from this approach demonstrated notably poorer model performance, indicating that arbitrary or unstructured data partitioning does not effectively capture the underlying affinity patterns. These experiments highlight the importance of using the statistical framework within the Authenticator module to establish meaningful, data-driven thresholds for distinguishing High- and Low-Affinity Binders. The cutoff values are thus not arbitrary but emerge from a systematic benchmarking and validation process tailored to each dataset.
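      As a rough illustration of the distribution-based thresholding described above, the sketch below builds an ECDF over docking scores and, from a list of candidate cutoffs, picks the one giving the most balanced HAB/LAB split while keeping a minimum fraction of compounds in each class. The function names, the `min_fraction` safeguard, and the balance criterion are assumptions made for illustration; this is not the Authenticator module's code, and the statistical tests it applies are not reproduced here.

```python
import bisect
from typing import Callable, Optional, Sequence


def ecdf(values: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: F(x) = fraction of values <= x."""
    xs = sorted(values)
    n = len(xs)

    def F(x: float) -> float:
        return bisect.bisect_right(xs, x) / n

    return F


def choose_cutoff(scores: Sequence[float], candidates: Sequence[float],
                  min_fraction: float = 0.2) -> Optional[float]:
    """Hypothetical cutoff selection (not a Gcoupler API).

    Docking energies are negative, so scores <= cutoff are treated as HAB and
    scores > cutoff as LAB. Candidates that leave either class with less than
    `min_fraction` of the compounds are rejected; among the remaining ones,
    the cutoff whose HAB fraction is closest to 0.5 is returned.
    """
    F = ecdf(scores)
    best_cutoff, best_gap = None, float("inf")
    for c in candidates:
        hab_fraction = F(c)
        if min(hab_fraction, 1.0 - hab_fraction) < min_fraction:
            continue  # safeguard: one class would be too small to train on
        gap = abs(hab_fraction - 0.5)
        if gap < best_gap:
            best_cutoff, best_gap = c, gap
    return best_cutoff
```

      For example, with hypothetical scores from -9.0 to -5.5 kcal/mol in steps of 0.5 and candidates -8.5, -7.5, and -6.0, the sketch returns -7.5 (an exact 50/50 split) and rejects -6.0 because only one of the eight compounds would fall into the LAB class.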

      Please note: While calibrating docking scores with experimental binding affinities or using rigorous methods like alchemical free energy calculations can improve precision, these approaches are often computationally intensive and reliant on the availability of high-quality experimental data, a major limitation in many real-world screening scenarios.

      In summary, the primary goal of Gcoupler is to enable fast, scalable, and broadly accessible screening, particularly for cases where experimental data is sparse or unavailable. Incorporating such resource-heavy methods would not only significantly increase computational overhead but also undermine the framework’s intended usability and efficiency for large-scale applications. Instead, our workflow relies on statistically robust, data-driven classification methods that balance speed, generalizability, and practical feasibility.

      Query: (3) Neither the Results nor Methods sections provide information on how the GNNs were trained in this study. Details such as node features, edge attributes, standardization, pooling, activation functions, layers, dropout, etc., should all be described in detail. The training protocol should also be described, including loss functions, independent monitoring and early stopping criteria, learning rate adjustments, etc.

      We again thank Reviewer #1 for this suggestion. We would like to mention that in the revised manuscript, we have added all the requested details. Please refer to the points below for more information.

      (1) The Generator module of Gcoupler is designed as a flexible and automated framework that leverages multiple Graph Neural Network architectures, including Graph Convolutional Model (GCM), Graph Convolutional Network (GCN), Attentive FP, and Graph Attention Network (GAT), to build classification models based on the synthetic ligand datasets produced earlier in the pipeline.

      (2) By default, Generator tests all four models using standard hyperparameters provided by the DeepChem framework (https://deepchem.io/), offering a baseline performance comparison across architectures. This includes pre-defined choices for node features, edge attributes, message-passing layers, pooling strategies, activation functions, and dropout values, ensuring reproducibility and consistency. All models are trained with binary cross-entropy loss and support default settings for early stopping, learning rate, and batch standardization where applicable.

      (3) In addition, Generator supports model refinement through hyperparameter tuning and k-fold cross-validation (default: 3 folds). Users can either customize the hyperparameter grid or rely on Generator’s recommended parameter ranges to optimize model performance. This allows for robust model selection and stability assessment of tuned parameters.

      (4) Finally, the trained models can be used to predict binding probabilities for user-supplied compounds, making it a comprehensive and user-adaptive tool for ligand screening.
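      The refinement step above combines a hyperparameter grid with k-fold cross-validation (default: 3 folds). A minimal, framework-agnostic sketch of that procedure follows. `k_fold_indices`, `grid_search`, and the `train_and_score` callback are hypothetical names, not Gcoupler or DeepChem APIs; in practice the callback would fit one of the GNN architectures on the training indices and return a validation metric such as ROC-AUC.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 3, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, validation) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((train, val))
    return splits


def grid_search(grid: Dict[str, Sequence], n: int,
                train_and_score: Callable[[Dict, List[int], List[int]], float],
                k: int = 3) -> Tuple[Dict, float]:
    """Return the parameter set with the highest mean validation score over k folds."""
    keys = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = [train_and_score(params, tr, va) for tr, va in k_fold_indices(n, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score
```

      Averaging over folds, rather than trusting a single split, is what gives the stability assessment mentioned above: a parameter set that wins on only one fold will not have the best mean.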

      Based on Reviewer #1's suggestion, we have now added a detailed description of the Generator module of Gcoupler and provided relevant citations for the DeepChem workflow.

      Query: (4) GNN model training seems to occur on at most 500 molecules per training run? This is unclear from the manuscript. That is a very small number of training samples if true. Please clarify. How was upsampling performed? What were the HAB/LAB class distributions? In addition, it seems as though only synthetically generated molecules are used for training, and the task is to discriminate synthetic molecules based on their docking scores. Synthetic ligands generated by LigBuilder may occupy distinct chemical space, making classification trivial, particularly in the setting of a random split k-folds validation approach. In the absence of a leave-class-out validation, it is unclear if the model learns generalizable features or exploits clear chemical differences. Historically, it was inappropriate to evaluate ligand-based QSAR models on synthetic decoys such as the DUD-E sets - synthetic ligands can be much more easily distinguished by heavily parameterized ligand-based machine learning models than by physically constrained single-point docking score functions.

      We thank Reviewer #1 for these detailed technical queries. We would like to clarify that:

      (1) The recommended minimum for the training set is 500 molecules, but users can add as many synthesized compounds as needed to thoroughly explore the chemical space related to the target cavity.

      (2) Our systematic evaluation demonstrated that expanding the training set size consistently enhanced model performance, especially when compared to AutoDock docking scores. This observation underscores the framework's scalability and its ability to improve predictive accuracy with more training compounds.

      (3) The Authenticator module initially categorizes all synthesized molecules into HAB and LAB classes. These labeled molecules are then utilized for training the Generator module. To tackle class imbalance, the class with fewer data points undergoes upsampling. This process aims to achieve an approximate 1:1 ratio between the two classes, thereby ensuring balanced learning during GNN model training.

      (4) The Authenticator module's affinity scores are the primary determinant of the HAB/LAB class distribution, with a higher cutoff for HABs ensuring statistically significant class separation. This distribution is also indirectly shaped by the target cavity's topology and druggability, as the Synthesizer tends to produce more potent candidates for cavities with favorable binding characteristics.

      (5) While synthetic ligands may indeed occupy distinct chemical space, our benchmarking across different sites on the same receptor still showed inter-cavity specificity as well as intra-cavity diversity among the synthesized molecules.

      (6) The utility of random k-fold validation shouldn't be dismissed outright; it provides a reasonable estimate of performance under practical settings where class boundaries are often unknown. Nonetheless, we agree that complementary validation strategies like leave-class-out could further strengthen the robustness assessment.

      (7) We agree that using synthetic decoys like those from the DUD-E dataset can introduce bias in ligand-based QSAR model evaluations if not handled carefully. In our workflow, the inclusion of DUD-E compounds is entirely optional and only considered as a fallback, specifically in scenarios where the number of low-affinity binders (LABs) synthesized by the Synthesizer module is insufficient to proceed with model training.

      (8) The primary approach relies on classifying generated compounds based on their derived affinity scores via the Authenticator module. However, in rare cases where this results in a heavily imbalanced dataset, DUD-E compounds are introduced not as part of the core benchmarking, but solely to maintain minimal class balance for initial model training. Even then, care is taken to interpret results with this limitation in mind. Ultimately, our framework is designed to prioritize data-driven generation of both HABs and LABs, minimizing reliance on synthetic decoys wherever possible.
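      Point (3) above describes upsampling the minority class to an approximately 1:1 ratio. The manuscript text here does not specify the exact technique, so the sketch below assumes simple random oversampling with replacement; `upsample_minority` is a hypothetical helper, and labels are assumed to be 1 for HAB and 0 for LAB.

```python
import random
from typing import List, Sequence, Tuple


def upsample_minority(samples: Sequence[str], labels: Sequence[int],
                      seed: int = 0) -> Tuple[List[str], List[int]]:
    """Randomly duplicate minority-class samples (with replacement) until
    both classes have the same count. Labels: 1 = HAB, 0 = LAB."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes need at least one sample")
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(samples) + extra, list(labels) + [minority_label] * len(extra)
```

      One design caveat: duplicated molecules should be added only after the train/validation split (to the training portion), otherwise copies of the same molecule can land in both sets and inflate validation scores.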

      Author response image 2.

      Scatter plots depicting the segregation of High- and Low-Affinity Metabolites (HAM/LAM; indicated in green and red) identified using the Gcoupler workflow with 100% of the training data. Notably, models trained on smaller training sets (25%, 50%, and 75% of HAB/LAB) failed to segregate HAM and LAM (along the Y-axis). The X-axis represents the binding affinity calculated by IC4-specific docking with AutoDock.

      Based on Reviewer #1's suggestion, we have now added all these technical details to the revised version of the manuscript.

      Query: (5) Training QSAR models on docking scores to accelerate virtual screening is not in itself novel (see here for a nice recent example: https://www.nature.com/articles/s43588-025-00777-x), but can be highly useful to focus structure-based analysis on the most promising areas of ligand chemical space; however, we are perplexed by the motivation here. If only a few hundred or a few thousand molecules are being sampled, why not just use AutoDock Vina? The models are trained to try to discriminate molecules by AutoDock Vina score rather than experimental affinity, so it seems like we would ideally just run Vina? Perhaps we are misunderstanding the scale of the screening that was done here. Please clarify the manuscript methods to help justify the approach.

      We acknowledge the effectiveness of training QSAR models on docking scores for prioritizing chemical space, as demonstrated by the referenced study (https://www.nature.com/articles/s43588-025-00777-x) on machine-learning-guided docking screen frameworks.

      We would like to mention that:

      (1) Such protocols often rely on extensive pre-docked datasets across numerous protein targets or use a highly skewed input distribution, training on as little as 1-10% of ligand-protein complexes and testing on the remainder in iterative cycles.

      (2) While powerful for ultra-large libraries, this approach can introduce bias towards the limited training set and incur significant overhead in data curation, pre-computation, and infrastructure.

      (3) In contrast, Gcoupler prioritizes flexibility and accessibility, especially when experimental data is scarce and large pre-docked libraries are unavailable. Instead of depending on fixed docking scores from external pipelines, Gcoupler integrates target-specific cavity detection, de novo compound generation, and model training into a self-contained, end-to-end framework. Its QSAR models are trained directly on contextually relevant compounds synthesized for a given binding site, employing a statistical classification strategy that avoids arbitrary thresholds or precomputed biases.

      (4) Furthermore, Gcoupler is open-source, lightweight, and user-friendly, making it easily deployable without the need for extensive infrastructure or prior docking expertise. While not a complete replacement for full-scale docking in all use cases, Gcoupler aims to provide a streamlined and interpretable screening framework that supports both focused chemical design and broader chemical space exploration, without the computational burden associated with deep learning docking workflows.

      (5) Practically, even with computational resources, manually running AutoDock Vina on millions of compounds presents challenges such as format conversion, binding site annotation, grid parameter tuning, and execution logistics, all typically requiring advanced structural bioinformatics expertise.

      (6) Gcoupler's Authenticator module, however, streamlines this process. Users only need to input a list of SMILES and a receptor PDB structure, and the module automatically handles compound preparation, cavity mapping, parameter optimization, and high-throughput scoring. This automation reduces time and effort while democratizing access to structure-based screening workflows for users without specialized expertise.

      Ultimately, Gcoupler's motivation is to make large-scale, structure-informed virtual screening both efficient and accessible. The model serves as a surrogate to filter and prioritize compounds before deeper docking or experimental validation, thereby accelerating targeted drug discovery.
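      The surrogate role described above amounts to ranking a library by predicted binding probability and forwarding only the best-ranked compounds to docking or experiments. A minimal sketch follows; `prioritize`, the `min_probability` threshold, and the docking `budget` are hypothetical illustrations, and the probabilities would come from a trained Generator model.

```python
from typing import Dict, List


def prioritize(predicted: Dict[str, float], budget: int,
               min_probability: float = 0.5) -> List[str]:
    """Keep compounds whose predicted HAB probability is >= min_probability,
    then return at most `budget` SMILES, highest probability first."""
    passing = [(p, smi) for smi, p in predicted.items() if p >= min_probability]
    passing.sort(key=lambda t: (-t[0], t[1]))  # ties broken by SMILES for determinism
    return [smi for _, smi in passing[:budget]]
```

      The budget caps the number of expensive downstream calculations, while the probability floor prevents low-confidence compounds from filling the budget when few candidates pass.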

      Query: (6) The brevity of the MD simulations raises some concerns that the results may be over-interpreted. RMSD plots do not reliably compare the affinity behavior in this context because of the short timescales coupled with the dramatic topological differences between the ligands being compared; CoQ6 is long and highly flexible compared to ZST and LST. Convergence metrics, such as block averaging and time-dependent MM/GBSA energies, should be included over much longer timescales. For CoQ6, the authors may need to run multiple simulations of several microseconds, identify the longest-lived metastable states of CoQ6, and perform MM/GBSA energies for each state weighted by each state's probability.

      We appreciate Reviewer #1's suggestion regarding simulation length, as it is indeed crucial for interpreting molecular dynamics (MD) outcomes. We would like to mention that:

      (1) Our simulation strategy varied based on the analysis objective, ranging from short (~5 ns) runs for preliminary or receptor-only evaluations to intermediate (~100 ns) and extended (~550 ns) runs for receptor-ligand complex validation and stability assessment.

      (2) Specifically, we conducted three independent 100 ns MD simulations for each receptor-metabolite complex in distinct cavities of interest. This allowed us to assess the reproducibility and persistence of binding interactions. To further support these observations, a longer 550 ns simulation was performed for the IC4 cavity, which reinforced the 100 ns findings by demonstrating sustained interaction stability over extended timescales.

      (3) While we acknowledge that even longer simulations (e.g., in the microsecond range) could provide deeper insights into metastable state transitions, especially for highly flexible molecules like CoQ6, our current design balances computational feasibility with the goal of screening multiple cavities and ligands.

      (4) In our current workflow, MM/GBSA binding free energies were calculated by extracting 1000 representative snapshots from the final 10 ns of each MD trajectory. These configurations were used to compute time-averaged binding energies, incorporating contributions from van der Waals, electrostatic, polar, and non-polar solvation terms. This approach offers a more reliable estimate of ligand binding affinity compared to single-point molecular docking, as it accounts for conformational flexibility and dynamic interactions within the binding cavity.

      (5) Although we did not explicitly perform state-specific MM/GBSA calculations weighted by metastable state probabilities, our use of ensemble-averaged energy estimates from a thermally equilibrated segment of the trajectory captures many of the same benefits. We acknowledge, however, that a more rigorous decomposition based on metastable state analysis could offer finer resolution of binding behavior, particularly for highly flexible ligands like CoQ6, and we consider this a valuable direction for future refinement of the framework.
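      For concreteness: 1000 snapshots drawn from the final 10 ns correspond to one snapshot every 10 ps. The sketch below shows the per-snapshot sum of the four energy terms listed above (each a complex minus receptor minus ligand difference, with no entropy term) and a block-averaging estimate of the standard error of the mean, which is the convergence check Reviewer #1 suggested rather than an analysis reported in the manuscript. Function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def frame_binding_energy(d_vdw: float, d_elec: float,
                         d_polar: float, d_nonpolar: float) -> float:
    """Per-snapshot MM/GBSA binding energy (kcal/mol): sum of the van der Waals,
    electrostatic, polar-solvation and non-polar-solvation differences.
    No entropy term is included."""
    return d_vdw + d_elec + d_polar + d_nonpolar


def block_average(series: Sequence[float], n_blocks: int) -> Tuple[float, float]:
    """Mean of the series and a standard error estimated from the means of
    n_blocks contiguous, equal-length blocks."""
    if n_blocks < 2:
        raise ValueError("need at least 2 blocks")
    block_len = len(series) // n_blocks
    if block_len == 0:
        raise ValueError("more blocks than data points")
    used = list(series[: block_len * n_blocks])  # drop frames that do not fill a whole block
    means = [statistics.fmean(used[i * block_len:(i + 1) * block_len])
             for i in range(n_blocks)]
    return statistics.fmean(used), statistics.stdev(means) / n_blocks ** 0.5
```

      With 1000 snapshots and 10 blocks, each block spans 100 snapshots (1 ns). Repeating the estimate with fewer, longer blocks shows whether the standard error levels off; if it keeps rising, consecutive snapshots are still correlated on the block timescale and the averaging window is likely too short.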

      Reviewer #2 (Public review):

      Summary:

      Query: Mohanty et al. present a new deep learning method to identify intracellular allosteric modulators of GPCRs. This is an interesting field for, e.g., the design of novel small-molecule inhibitors of GPCR signalling. A key limitation, as mentioned by the authors, is the limited availability of data. The method presented, Gcoupler, aims to overcome these limitations, as shown by experimental validation of sterols in the inhibition of Ste2p, which have been shown to be relevant molecules in human and rat cardiac hypertrophy models. They have made their code available for download and installation, and the instructions can easily be followed to set up the software on a local machine.

      Strengths:

      Clear GitHub repository

      Extensive data on yeast systems

      We sincerely thank Reviewer #2 for their thorough review, summary, and appreciation of our work. We highly value their comments and suggestions.

      Weaknesses:

      Query: No assay to directly determine the affinity of the compounds to the protein of interest.

      We thank Reviewer #2 for raising these insightful questions. During the experimental design phase, we carefully planned how to validate the effect of the metabolites in rescuing cells from the pheromone-induced response.

      We would like to mention that we performed an array of methods to validate our hypothesis and observed similar rescue effects. These assays include:

      a. Cell viability assay (FDA/PI fluorometry-based)

      b. Cell growth assay

      c. FUN1™-based microscopy assessment

      d. Shmoo formation assays

      e. Mating assays

      f. Site-directed mutagenesis-based loss of function

      g. Transgenic reporter-based assay

      h. MAPK signaling assessment using Western blot

      i. Computational analyses

      Concerning in vitro interaction studies of Ste2p and the metabolites, we made significant efforts to purify Ste2p by incorporating a His tag at the N-terminus. Despite dedicated attempts over the past year, we were unsuccessful in purifying the protein, primarily due to our limited expertise in protein purification for this specific system. As a result, we opted for genetic interventions (e.g., point mutants), which provide a more physiological and comprehensive approach to demonstrating the interaction between Ste2p and the metabolites.

      Author response image 3.

      (a) Affinity purification of Ste2p from Saccharomyces cerevisiae. Western blot analysis using anti-His antibody showing the distribution of Ste2p in various fractions during the affinity purification process. The fractions include pellet, supernatant, wash buffer, and sequential elution fractions (1–4). Wild-type and ste2Δ strains served as positive and negative controls, respectively. (b) Optimization of Ste2p extraction protocol. Ponceau staining (left) and Western blot analysis using anti-His antibody (right) showing Ste2p extraction efficiency. The conditions tested include lysis buffers containing different concentrations of CHAPS detergent (0.5%, 1%) and glycerol (10%, 20%).

      Furthermore, in addition to the clarification above, we have added the following statement in the discussion section to tone down our claims: “A critical limitation of our study is the absence of direct binding assays to validate the interaction between the metabolites and Ste2p. While our results from genetic interventions, molecular dynamics simulations, and docking studies strongly suggest that the metabolites interact with the Ste2p-Gpa1 interface, these findings remain indirect. Direct binding confirmation through techniques such as surface plasmon resonance, isothermal titration calorimetry, or co-crystallization would provide definitive evidence of this interaction. Addressing this limitation in future work would significantly strengthen our conclusions and provide deeper insights into the precise molecular mechanisms underlying the observed phenotypic effects.”

      We kindly ask Reviewer #2 to refer to the assays conducted on the point mutants created in this study, as these experiments provide robust evidence supporting our claims.

      Query: In conclusion, the authors present an interesting new method to identify allosteric inhibitors of GPCRs, which can easily be employed by research labs. Whilst the authors have characterized the compounds in yeast cells to confirm their findings, it would be beneficial if they showed that their compounds are active in a simple binding assay.

      We express our gratitude and sincere appreciation for the time and effort dedicated by Reviewer #2 in reviewing our manuscript. We are confident that our clarifications address the reviewer's concerns.

      Reviewer #3 (Public review):

      Summary:

      Query: In this paper, the authors introduce the Gcoupler software, an open-source deep learning-based platform for structure-guided discovery of ligands targeting GPCR interfaces. Overall, this manuscript represents a field-advancing contribution at the intersection of AI-based ligand discovery and GPCR signaling regulation.

      Strengths:

      The paper presents a comprehensive and well-structured workflow combining cavity identification, de novo ligand generation, statistical validation, and graph neural network-based classification. Notably, the authors use Gcoupler to identify endogenous intracellular sterols as allosteric modulators of the GPCR-Gα interface in yeast, with experimental validations extending to mammalian systems. The ability to systematically explore intracellular metabolite modulation of GPCR signaling represents a novel and impactful contribution. This study significantly advances the field of GPCR biology and computational ligand discovery.

      We thank Reviewer #3 for investing time and effort in reviewing our manuscript and for appreciating our work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We encourage the authors to address the points raised during revision to elevate the assessment from "incomplete" to "solid" or ideally "convincing." In particular, we ask the authors to improve the justification for their methodological choices and to provide greater detail and clarity regarding each computational layer of the pipeline.

      We are grateful for the editors' suggestions. We have made substantial revisions to the manuscript, adding comprehensive technical details to prevent misunderstandings, and we have explained each step of the computational workflow in detail.

      Reviewer #2 (Recommendations for the authors):

      Query: Would it be possible to make the package itself pip installable?

      Yes. The package was previously available on TestPyPI, and we have now migrated it to the main PyPI index. It is available here: https://pypi.org/project/gcoupler/

      Query: I am confused by the binding free energies reported in Supplementary Figure 8. Is the total DG reported that of the protein-ligand complex? If that is the case, the affinities of the ligands would be extremely high. They are also very far off from the reported -7 kcal/mol active/inactive cut-off.

      We thank Reviewer #2 for this query. We have provided a detailed explanation in the point-by-point response to Reviewer #2's original comment. Briefly, the -7 kcal/mol active/inactive cutoff mentioned in the manuscript refers specifically to docking-based binding free energies (ΔG) calculated with AutoDock or AutoDock Vina, which are used for compound classification or for validation against the Gcoupler framework.

      In contrast, the binding free energies reported in Supplementary Figure 8 were obtained with the MM-GBSA method, which provides a more detailed, physics-based estimate of binding affinity by incorporating solvation and enthalpic contributions. It is well documented in the literature that MM-GBSA tends to systematically overestimate the magnitude of absolute binding free energies (i.e., it yields much more negative values) compared with experimental values (10.2174/1568026616666161117112604; Table 1).

      Author response image 4.

      Scatter plot comparing the predicted binding affinity calculated by Docking and MM/GBSA methods, against experimental ΔG (10.1007/s10822-023-00499-0)

      Our use of MM-GBSA is not intended to match experimental ΔG directly, but rather to assess relative binding preferences among ligands. Despite its limitations in predicting absolute affinities, MM-GBSA is known to perform better than docking at ranking compounds by binding potential. In this context, a more negative MM-GBSA energy still indicates stronger predicted binding, even when the numerical values are far larger in magnitude than typical experimental or docking-derived cutoffs.

      Thus, the two energy values, docking-based and MM-GBSA, serve different purposes in our workflow. Docking scores are used for classification and thresholding, while MM-GBSA energies provide post hoc validation and a higher-resolution comparison of binding strength across compounds.
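      To illustrate why a systematic offset or scaling of MM-GBSA values does not by itself affect relative ranking, the sketch below computes a Spearman rank correlation between two sets of energies. It assumes tie-free data, and the function names are hypothetical; it is not part of the Gcoupler workflow.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank positions (0 = most negative, i.e. strongest predicted binder); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

      For instance, if hypothetical MM-GBSA values were exactly four times the docking scores minus 10 kcal/mol, every compound would keep its rank and the correlation would be 1.0, even though the absolute numbers differ greatly. Real data will, of course, not be an exact monotonic transform.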

      Query: To corroborate their findings, can the authors include direct binding affinity assays for yeast and human Ste2p? This will help in establishing whether the observed phenotypic effects are indeed driven by binding of the metabolites.

      We thank Reviewer #2 for raising these insightful questions. During the experimental design phase, we carefully planned how to validate the effect of the metabolites in rescuing cells from the pheromone-induced response.

      We would like to mention that we performed an array of methods to validate our hypothesis and observed similar rescue effects. These assays include:

      a. Cell viability assay (FDA/PI fluorometry-based)

      b. Cell growth assay

      c. FUN1™-based microscopy assessment

      d. Shmoo formation assays

      e. Mating assays

      f. Site-directed mutagenesis-based loss of function

      g. Transgenic reporter-based assay

      h. MAPK signaling assessment using Western blot

      i. Computational analyses

      Concerning in vitro interaction studies of Ste2p and the metabolites, we made significant efforts to purify Ste2p by incorporating a His tag at the N-terminus. Despite dedicated attempts over the past year, we were unsuccessful in purifying the protein, primarily due to our limited expertise in protein purification for this specific system. As a result, we opted for genetic interventions (e.g., point mutants), which provide a more physiological and comprehensive approach to demonstrating the interaction between Ste2p and the metabolites.

      Furthermore, in addition to the clarification above, we have added the following statement in the discussion section to tone down our claims: “A critical limitation of our study is the absence of direct binding assays to validate the interaction between the metabolites and Ste2p. While our results from genetic interventions, molecular dynamics simulations, and docking studies strongly suggest that the metabolites interact with the Ste2p-Gpa1 interface, these findings remain indirect. Direct binding confirmation through techniques such as surface plasmon resonance, isothermal titration calorimetry, or co-crystallization would provide definitive evidence of this interaction. Addressing this limitation in future work would significantly strengthen our conclusions and provide deeper insights into the precise molecular mechanisms underlying the observed phenotypic effects.”

      We kindly ask Reviewer #2 to refer to the assays conducted on the point mutants created in this study, as these experiments provide robust evidence supporting our claims.

      Query: Did the authors perform expression assays to make sure the mutant proteins were similarly expressed to wt?

      We thank Reviewer #2 for this comment. We would like to mention that:

      (1) In our assays with the mutants (S75A, T155D, L289K), all mutant constructs were integrated at the same chromosomal locus (TRP1) under the control of the GAL1 promoter and share the same CYC1 terminator sequence used for the reconstituted wild-type (rtWT) construct, reducing the likelihood of strain-specific expression differences.

      (2) Furthermore, all strains were grown under identical conditions using the same media, temperature, and shaking parameters. Each construct underwent the same GAL1 induction protocol in YPGR medium for identical durations, ensuring uniform transcriptional activation across all strains and minimizing culture-dependent variability in protein expression.

      (3) Importantly, the rtWT and two of the mutants (T155D, L289K) retained α-factor-induced cell death (PI- and FUN1-based fluorometry and microscopy; Figure 4c-d) and MAPK activation (Western blot; Figure 4e), indicating that these mutant proteins are expressed at levels sufficient to support signaling.

      Reviewer #3 (Recommendations for the authors):

      My comments that would enhance the impact of this method are:

      (1) While the authors have compared the accuracy and efficiency of Gcoupler to AutoDock Vina, one of the main points of Gcoupler is the neural network module. It would be beneficial to have it evaluated against other available deep learning ligand generative modules, such as the following: 10.1186/s13321-024-00829-w, 10.1039/D1SC04444C.

      Thank you for the observation. To clarify, our benchmarking of Gcoupler’s accuracy and efficiency was performed against AutoDock, not AutoDock Vina. This choice was intentional, as AutoDock is one of the most widely used classical techniques in computer-aided drug design (CADD) for obtaining high-resolution predictions of ligand binding energy, binding poses, and detailed atomic-level interactions with receptor residues. In contrast, AutoDock Vina is primarily optimized for large-scale virtual screening, offering faster results but typically with lower resolution and limited configurational detail.

      Since Gcoupler is designed to balance accuracy with computational efficiency in structure-based screening, AutoDock served as a more appropriate reference point for evaluating its predictions.

      We agree that benchmarking against other deep learning-based ligand generative tools is important for contextualizing Gcoupler's capabilities. However, it is worth noting that only a few existing methods focus specifically on cavity- or pocket-driven de novo drug design using generative AI, and most of them are either partially closed-source or limited in functionality.

      While PocketCrafter (10.1186/s13321-024-00829-w) offers a structure-based generative framework, it differs from Gcoupler in several key respects. PocketCrafter requires proprietary preprocessing tools, such as the MOE QuickPrep module, to prepare protein pocket structures, limiting its accessibility and reproducibility. In addition, PocketCrafter’s pipeline stops at the generation of cavity-linked compounds and does not support any further learning from the generated data.

      Similarly, DeepLigBuilder (10.1039/D1SC04444C) provides de novo ligand generation using deep learning, but the source code is not publicly available, preventing direct benchmarking or customization. Like PocketCrafter, it also lacks integrated learning modules, which limits its utility for screening large, user-defined libraries or compounds of interest.

      Additionally, tools like AutoDesigner from Schrödinger, while powerful, are not publicly accessible and hence fall outside the scope of open benchmarking.

      Author response table 1.

      Comparison of de novo drug design tools. SBDD refers to Structure-Based Drug Design, and LBDD refers to Ligand-Based Drug Design.

      In contrast, Gcoupler is a fully open-source, end-to-end platform that integrates both Ligand-Based and Structure-Based Drug Design. It spans from cavity detection and molecule generation to automated model training using GNNs, allowing users to evaluate and prioritize candidate ligands across large chemical spaces without the need for commercial software or advanced coding expertise.

      (2) In Figure 2, the authors mention that IC4 and IC5 potential binding sites are on the direct G protein coupling interface ("This led to the identification of 17 potential surface cavities on Ste2p, with two intracellular regions, IC4 and IC5, accounting for over 95% of the Ste2p-Gpa1p interface (Figure 2a-b, Supplementary Figure 4j-n)..."). Later, however, in Figure 4, when discussing which residues affect the binding of the metabolites the most, the authors didn't perform MD simulations of mutant STE2 and just Gpa1p (without metabolites present). It would be beneficial to compare the binding of G protein with and without metabolites present, as these interface mutations might be affecting the binding of G protein by itself.

      Thank you for this insightful suggestion. While we did not perform in silico MD simulations of the mutant Ste2-Gpa1 complex in the absence of metabolites, we conducted experimental validation to functionally assess the impact of interface mutations. Specifically, we generated site-directed mutants (S75A, L289K, T155D) and expressed them in a ste2Δ background to isolate their effects.

      As shown in the Supplementary Figure, these mutants failed to rescue cells from α-factor-induced programmed cell death (PCD) upon metabolite pre-treatment. This was confirmed through fluorometry-based viability assays and FUN1™ staining, while p-Fus3 signaling analysis was used to monitor MAPK pathway activation (Figure 4c–e).

      Importantly, the induction of PCD in response to α-factor in these mutants demonstrates that G protein coupling is still functionally intact, indicating that the mutations do not interfere with Gpa1 binding itself. However, the absence of rescue by metabolites strongly suggests that the mutated residues play a direct role in metabolite binding at the Ste2p–Gpa1p interface, thus modulating downstream signaling.

      While further MD simulations could provide structural insight into the isolated mutant receptor–G protein interaction, our experimental data support the functional relevance of metabolite binding at the identified interface.

      (3) While the experiments performed by the authors do support the hypothesis that metabolites regulate GPCR signaling, there are no experiments providing direct biophysical measurements (e.g., dissociation constants are estimated only in silico).

      We thank Reviewer #3 for raising these insightful comments. We would like to mention that we performed an array of methods to validate our hypothesis and observed similar rescue effects. These assays include:

      a. Cell viability assay (FDA/PI fluorometry-based)

      b. Cell growth assay

      c. FUN1™-based microscopy assessment

      d. Shmoo formation assays

      e. Mating assays

      f. Site-directed mutagenesis-based loss of function

      g. Transgenic reporter-based assay

      h. MAPK signaling assessment using Western blot.

      i. Computational techniques.

      Concerning the direct biophysical measurements of Ste2p and metabolites, we made significant efforts to purify Ste2p by incorporating a His tag at the N-terminus, with the goal of performing Microscale Thermophoresis (MST) and Isothermal Titration Calorimetry (ITC) measurements. Despite dedicated attempts over the past year, we were unsuccessful in purifying the protein, primarily due to our limited expertise in protein purification for this specific system. As a result, we opted for genetic interventions (e.g., point mutants), which provide a more physiological and comprehensive approach to demonstrating the interaction between Ste2p and the metabolites.

      Furthermore, in addition to the clarification above, we have added the following statement in the discussion section to tone down our claims: “A critical limitation of our study is the absence of direct binding assays to validate the interaction between the metabolites and Ste2p. While our results from genetic interventions, molecular dynamics simulations, and docking studies strongly suggest that the metabolites interact with the Ste2p-Gpa1 interface, these findings remain indirect. Direct binding confirmation through techniques such as surface plasmon resonance, isothermal titration calorimetry, or co-crystallization would provide definitive evidence of this interaction. Addressing this limitation in future work would significantly strengthen our conclusions and provide deeper insights into the precise molecular mechanisms underlying the observed phenotypic effects.”

      (4) The authors do not discuss the effects of the metabolites at their physiological concentrations. Overall, this manuscript represents a field-advancing contribution at the intersection of AI-based ligand discovery and GPCR signaling regulation.

      We thank reviewer #3 for this comment and for recognising the value of our work. Although direct quantification of intracellular free metabolite levels is challenging, several lines of evidence support the physiological relevance of our test concentrations.

      - Genetic validation supports endogenous relevance: Our genetic screen of 53 metabolic knockout mutants showed that deletions in biosynthetic pathways for these metabolites consistently disrupted the α-factor-induced cell death, with the vast majority of strains (94.4%) resisting the α-factor-induced cell death, and notably, a subset even displayed accelerated growth in the presence of α‑factor. This suggests that endogenous levels of these metabolites normally provide some degree of protection, supporting their physiological role in GPCR regulation.

      - Metabolomics confirms in vivo accumulation: Our untargeted metabolomics analysis revealed that α-factor-treated survivors consistently showed enrichment of CoQ6 and zymosterol compared to sensitive cells. This demonstrates that these metabolites naturally accumulate to protective levels during stress responses, validating their biological relevance.

    1. eLife Assessment

      This study provides valuable insights into the evolutionary conservation of sex determination mechanisms in ants by identifying a candidate sex-determining region in a parthenogenetic species. It uses solid, well-executed genomic analyses based on differences in heterozygosity between females and diploid males. While the candidate locus awaits functional validation in this species, the study provides convincing support for the ancient origin of a non-coding locus implicated in sex determination.

    2. Reviewer #1 (Public review):

      The authors have implemented several clarifications in the text and improved the connection between their findings and previous work. As stated in my initial review, I had no major criticisms of the previous version of the manuscript, and I continue to consider this a solid and well-written study. However, the revised manuscript still largely reiterates existing findings and does not offer novel conceptual or experimental advances. It supports previous conclusions suggesting a likely conserved sex determination locus in aculeate hymenopterans, but does so without functional validation (i.e., via experimental manipulation) of the candidate locus in O. biroi. I also wish to clarify that I did not intend to imply that functional assessments in the Pan et al. study were conducted in more than one focal species; my previous review explicitly states that the locus's functional role was validated in the Argentine ant.

    3. Reviewer #3 (Public review):

      The authors have made considerable efforts to conduct functional analyses to the fullest extent possible in this study; however, it is understandable that meaningful results have not yet been obtained. In the revised version, they have appropriately framed their claims within the limits of the current data and have adjusted their statements as needed in response to the reviewers' comments.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This study investigates the sex determination mechanism in the clonal ant Ooceraea biroi, focusing on a candidate complementary sex determination (CSD) locus, one of the key mechanisms supporting haplodiploid sex determination in hymenopteran insects. Using whole genome sequencing, the authors analyze diploid females and the rarely occurring diploid males of O. biroi, identifying a 46 kb candidate region that is consistently heterozygous in females and predominantly homozygous in diploid males. This region shows elevated genetic diversity, as expected under balancing selection. The study also reports the presence of an lncRNA near this heterozygous region, which, though only distantly related in sequence, resembles the ANTSR lncRNA involved in female development in the Argentine ant, Linepithema humile (Pan et al. 2024). Together, these findings suggest a potentially conserved sex determination mechanism across ant species. However, while the analyses are well conducted and the paper is clearly written, the insights are largely incremental. The central conclusion, that the sex determination locus is conserved in ants, was already proposed and experimentally supported by Pan et al. (2024), who included O. biroi among the studied species and validated the locus's functional role in the Argentine ant. The present study thus largely reiterates existing findings without providing novel conceptual or experimental advances.

      Although it is true that Pan et al., 2024 demonstrated (in Figure 4 of their paper) that the synteny of the region flanking ANTSR is conserved across aculeate Hymenoptera (including O. biroi), Reviewer 1’s claim that that paper provides experimental support for the hypothesis that the sex determination locus is conserved in ants is inaccurate. Pan et al., 2024 only performed experimental work in a single ant species (Linepithema humile) and merely compared reference genomes of multiple species to show synteny of the region, rather than functionally mapping or characterizing these regions.

      Other comments:

      The mapping is based on a very small sample size: 19 females and 16 diploid males, and these all derive from a single clonal line. This implies a rather high probability for false-positive inference. In combination with the fact that only 11 out of the 16 genotyped males are actually homozygous at the candidate locus, I think a more careful interpretation regarding the role of the mapped region in sex determination would be appropriate. The main argument supporting the role of the candidate region in sex determination is based on the putative homology with the lncRNA involved in sex determination in the Argentine ant, but this argument was made in a previous study (as mentioned above).

      Our main argument supporting the role of the candidate region in sex determination is not based on putative homology with the lncRNA in L. humile. Instead, our main argument comes from our genetic mapping (in Fig. 2), and the elevated nucleotide diversity within the identified region (Fig. 4). Additionally, we highlight that multiple genes within our mapped region are homologous to those in mapped sex determining regions in both L. humile and Vollenhovia emeryi, possibly including the lncRNA.

      In response to the Reviewer’s assertion that the mapping is based on a small sample size from a single clonal line, we want to highlight that we used all diploid males available to us. Although the primary shortcoming of a small sample size is to increase the probability of a false negative, small sample sizes can also produce false positives. We used two approaches to explore the statistical robustness of our conclusions. First, we generated a null distribution by randomly shuffling sex labels within colonies and calculating the probability of observing our CSD index values by chance (shown in Fig. 2). Second, we directly tested the association between homozygosity and sex using Fisher’s Exact Test (shown in Supplementary Fig. S2). In both cases, the association of the candidate locus with sex was statistically significant after multiple-testing correction using the Benjamini-Hochberg False Discovery Rate. These approaches are clearly described in the “CSD Index Mapping” section of the Methods.
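      The two significance checks described above, together with the Benjamini-Hochberg correction applied across genomic windows, can be sketched in Python. This is an illustrative sketch only, not the authors' pipeline: the permutation statistic (the number of homozygous individuals labelled male) is a hypothetical stand-in for the CSD index defined in the Methods, the colony labels in the usage are hypothetical, and the 2×2 table assumes that all 19 sequenced females are heterozygous at the candidate window while 11 of the 16 diploid males are homozygous, as described in this exchange.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: males, females. Columns: homozygous, heterozygous at one window.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def stratified_permutation_p(is_male, is_homozygous, colony, n_perm=10_000, seed=0):
    """One-sided permutation p-value with sex labels shuffled within colonies.

    The statistic (number of homozygous individuals labelled male) is a
    hypothetical stand-in for the CSD index, not the authors' definition.
    """
    rng = random.Random(seed)

    def stat(labels):
        return sum(1 for m, h in zip(labels, is_homozygous) if m and h)

    observed = stat(is_male)
    groups = {}
    for idx, col in enumerate(colony):
        groups.setdefault(col, []).append(idx)

    hits = 0
    for _ in range(n_perm):
        labels = list(is_male)
        for idxs in groups.values():
            # Labels only move among individuals from the same colony.
            shuffled = [is_male[i] for i in idxs]
            rng.shuffle(shuffled)
            for i, lab in zip(idxs, shuffled):
                labels[i] = lab
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # 11 of 16 diploid males homozygous; all 19 females assumed heterozygous.
    print(fisher_exact_two_sided(11, 5, 0, 19))
    is_male = [True] * 16 + [False] * 19
    is_hom = [True] * 11 + [False] * 5 + [False] * 19
    colony = ["c1"] * 8 + ["c2"] * 8 + ["c1"] * 10 + ["c2"] * 9  # hypothetical
    print(stratified_permutation_p(is_male, is_hom, colony, n_perm=2000))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # made-up p-values
```

      In a genome scan, one would compute a p-value per window first and then pass the whole list to the correction step; the shuffle is restricted to colonies so that colony-level structure is preserved under the null, mirroring the "within colonies" design described in the response.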

      We also note that, because complementary sex determination loci are expected to evolve under balancing selection, our finding that the mapped region exhibits a peak of nucleotide diversity lends orthogonal support to the notion that the mapped locus is indeed a complementary sex determination locus.

      The fourth paragraph of the results and the sixth paragraph of the discussion are devoted to explaining the possible reasons why only 11/16 genotyped males are homozygous in the mapped region. The revised manuscript will include an additional sentence (in what will be lines 384-388) in this paragraph that includes the possible explanation that this locus is, in fact, a false positive, while also emphasizing that we find this possibility to be unlikely given our multiple lines of evidence.

      In response to Reviewer 1’s suggestion that we carefully interpret the role of the mapped region in sex determination, we highlight our careful wording choices, nearly always referring to the mapped locus as a “candidate sex determination locus” in the title and throughout the manuscript. For consistency, the revised manuscript version will change the second results subheading from “The O. biroi CSD locus is homologous to another ant sex determination locus but not to honeybee csd” to “O. biroi’s candidate CSD locus is homologous to another ant sex determination locus but not to honeybee csd,” and will add the word “candidate” in what will be line 320 at the beginning of the Discussion, and will change “putative” to “candidate” in what will be line 426 at the end of the Discussion.

      In the abstract, it is stated that CSD loci have been mapped in honeybees and two ant species, but we know little about their evolutionary history. But CSD candidate loci were also mapped in a wasp with multi-locus CSD (study cited in the introduction). This wasp is also parthenogenetic via central fusion automixis and produces diploid males. This is a very similar situation to the present study and should be referenced and discussed accordingly, particularly since the authors make the interesting suggestion that their ant also has multi-locus CSD and neither the wasp nor the ant has tra homologs in the CSD candidate regions. Also, is there any homology to the CSD candidate regions in the wasp species and the studied ant?

      In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of diploid males being produced via losses of heterozygosity during asexual reproduction, the revised manuscript will include (in what will be lines 123-126) the highlighted portion of the following sentence: “Therefore, if O. biroi uses CSD, diploid males might result from losses of heterozygosity at sex determination loci (Fig. 1C), similar to what is thought to occur in other asexual Hymenoptera that produce diploid males (Rabeling and Kronauer 2012; Matthey-Doret et al. 2019).”

      We note, however, that in their 2019 study, Matthey-Doret et al. did not directly test the hypothesis that diploid males result from losses of heterozygosity at CSD loci during asexual reproduction, because the diploid males they used for their mapping study came from inbred crosses in a sexual population of that species.

      We address this further below, but we want to emphasize that we do not intend to argue that O. biroi has multiple CSD loci. Instead, we suggest that additional, undetected CSD loci are one possible explanation for the absence of diploid males from any clonal line other than clonal line A. In response to Reviewer 1’s suggestion that we reference the (Matthey-Doret et al. 2019) study in the context of multilocus CSD, the revised manuscript version will include the following additional sentence in the fifth paragraph of the discussion (in what will be lines 372-374): “Multi-locus CSD has been suggested to limit the extent of diploid male production in asexual species under some circumstances (Vorburger 2013; Matthey-Doret et al. 2019).”

      Regarding Reviewer 1’s question about homology between the putative CSD loci from the (Matthey-Doret et al. 2019) study and O. biroi, we note that there is no homology. The revised manuscript version will have an additional Supplementary Table (which will be the new Supplementary Table S3) that will report the results of this homology search. The revised manuscript will also include the following additional sentence in the Results, in what will be lines 172-174: “We found no homology between the genes within the O. biroi CSD index peak and any of the genes within the putative L. fabarum CSD loci (Supplementary Table S3).”

      The authors used different clonal lines of O. biroi to investigate whether heterozygosity at the mapped CSD locus is required for female development in all clonal lines of O. biroi (L187-196). However, given the described parthenogenesis mechanism in this species conserves heterozygosity, additional females that are heterozygous are not very informative here. Indeed, one would need diploid males in these other clonal lines as well (but such males have not yet been found) to make any inference regarding this locus in other lines.

      We agree that a full mapping study including diploid males from all clonal lines would be preferable, but as stated earlier in that same paragraph, we have only found diploid males from clonal line A. We stand behind our modest claim that “Females from all six clonal lines were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.” In the revised manuscript version, this sentence (in what will be lines 199-201) will be changed slightly in response to a reviewer comment below: “All females from all six clonal lines (including 26 diploid females from clonal line B) were heterozygous at the CSD index peak, consistent with its putative role as a CSD locus in all O. biroi.”

      Reviewer #2 (Public review):

      The manuscript by Lacy et al. is well written, with a clear and compelling introduction that effectively conveys the significance of the study. The methods are appropriate and well-executed, and the results, both in the main text and supplementary materials, are presented in a clear and detailed manner. The authors interpret their findings with appropriate caution.

      This work makes a valuable contribution to our understanding of the evolution of complementary sex determination (CSD) in ants. In particular, it provides important evidence for the ancient origin of a non-coding locus implicated in sex determination, and shows that, remarkably, this sex locus is conserved even in an ant species with a non-canonical reproductive system that typically does not produce males. I found this to be an excellent and well-rounded study, carefully analyzed and well contextualized.

      That said, I do have a few minor comments, primarily concerning the discussion of the potential 'ghost' CSD locus. While the authors acknowledge (line 367) that they currently have no data to distinguish among the alternative hypotheses, I found the evidence for an additional CSD locus presented in the results (lines 261-302) somewhat limited and at times a bit difficult to follow. I wonder whether further clarification or supporting evidence could already be extracted from the existing data. Specifically:

      We agree with Reviewer 2 that the evidence for a second CSD locus is limited. In fact, we do not intend to advocate for there being a second locus, but we suggest that a second CSD locus is one possible explanation for the absence of diploid males outside of clonal line A. In our initial version, we intentionally conveyed this ambiguity by titling this section “O. biroi may have one or multiple sex determination loci.” However, we now see that this leads to undue emphasis on the possibility of a second locus. In the revised manuscript, we will split this into two separate sections: “Diploid male production differs across O. biroi clonal lines” and “O. biroi lacks a tra-containing CSD locus.”

      (1) Line 268: I doubt the relevance of comparing the proportion of diploid males among all males between lines A and B to infer the presence of additional CSD loci. Since the mechanisms producing these two types of males differ, it might be more appropriate to compare the proportion of diploid males among all diploid offspring. This ratio has been used in previous studies on CSD in Hymenoptera to estimate the number of sex loci (see, for example, Cook 1993, de Boer et al. 2008, 2012, Ma et al. 2013, and Chen et al., 2021). The exact method might not be applicable to clonal raider ants, but I think comparing the percentage of diploid males among the total number of (diploid) offspring produced between the two lineages might be a better argument for a difference in CSD loci number.

      We want to re-emphasize here that we do not wish to advocate for there being two CSD loci in O. biroi. Rather, we want to explain that this is one possible explanation for the apparent absence of diploid males outside of clonal line A. We hope that the modifications to the manuscript described in the previous response help to clarify this.

      Reviewer 2 is correct that comparing the number of diploid males to diploid females does not apply to clonal raider ants. This is because males are vanishingly rare among the vast numbers of females produced. We do not count how many females are produced in laboratory stock colonies, and males are sampled opportunistically. Therefore, we cannot report exact numbers. However, we will add the highlighted portion of the following sentence (in what will be lines 268-270) to the revised manuscript: “Despite the fact that we maintain more colonies of clonal line B than of clonal line A in the lab, all the diploid males we detected came from clonal line A.”

      (2) If line B indeed carries an additional CSD locus, one would expect that some females could be homozygous at the ANTSR locus but still viable, being heterozygous only at the other locus. Do the authors detect any females in line B that are homozygous at the ANTSR locus? If so, this would support the existence of an additional, functionally independent CSD locus.

      We thank the reviewer for this suggestion, and again we emphasize that we do not want to argue in favor of multiple CSD loci. We just want to introduce it as one possible explanation for the absence of diploid males outside of clonal line A.

      The 26 sequenced diploid females from clonal line B are all heterozygous at the mapped locus, and the revised manuscript will clarify this in what will be lines 199-201. Previously, only six of those diploid females were included in Supplementary Table S2, and that will be modified accordingly.

      (3) Line 281: The description of the two tra-containing CSD loci as "conserved" between Vollenhovia and the honey bee may be misleading. It suggests shared ancestry, whereas the honey bee csd gene is known to have arisen via a relatively recent gene duplication from fem/tra (10.1038/nature07052). It would be more accurate to refer to this similarity as a case of convergent evolution rather than conservation.

      In the sentence that Reviewer 2 refers to, we are representing the assertion made in the (Miyakawa and Mikheyev 2015) paper in which, regarding their mapping of a candidate CSD locus that contains two linked tra homologs, they write in the abstract: “these data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years.” In that same paper, Miyakawa and Mikheyev write in the discussion section: “As ants and bees diverged more than 100 million years ago, sex determination in honey bees and V. emeryi is probably homologous and has been conserved for at least this long.”

      As noted by Reviewer 2, this appears to conflict with a previously advanced hypothesis: that the csd gene evolved after the honeybee (Apis) lineage diverged from other bees, because fem and csd were found in Apis mellifera, Apis cerana, and Apis dorsata, but only fem was found in Melipona compressipes, Bombus terrestris, and Nasonia vitripennis (Hasselmann et al. 2008). However, it remains possible that the csd gene evolved after the lineage leading to ants and bees diverged from that leading to N. vitripennis, but before ants and bees diverged from each other, and was then subsequently lost in B. terrestris and M. compressipes. This view was previously put forward based on bioinformatic identification of putative orthologs of csd and fem in bumblebees and in ants [(Schmieder et al. 2012), see also (Privman et al. 2013)]. However, subsequent work disagreed and argued that the duplications of tra found in ants and in bumblebees represented convergent evolution rather than homology (Koch et al. 2014). Distinguishing between these possibilities will be aided by additional sex determination locus mapping studies and functional dissection of the underlying molecular mechanisms in diverse Aculeata.

      Distinguishing between these competing hypotheses is beyond the scope of our paper, but the revised manuscript will include additional text to incorporate some of this nuance. We will include these modified lines below (in what will be lines 287-295), with the additions highlighted:

      “A second QTL region identified in V. emeryi (V.emeryiCsdQTL1) contains two closely linked tra homologs, similar to the closely linked honeybee tra homologs, csd and fem (Miyakawa and Mikheyev 2015). This, along with the discovery of duplicated tra homologs that undergo concerted evolution in bumblebees and ants (Schmieder et al. 2012; Privman et al. 2013) has led to the hypothesis that the function of tra homologs as CSD loci is conserved with the csd-containing region of honeybees (Schmieder et al. 2012; Miyakawa and Mikheyev 2015). However, other work has suggested that tra duplications occurred independently in honeybees, bumblebees, and ants (Hasselmann et al. 2008; Koch et al. 2014), and it remains to be demonstrated that either of these tra homologs acts as a primary CSD signal in V. emeryi.”

      (4) Finally, since the authors successfully identified multiple alleles of the first CSD locus using previously sequenced haploid males, I wonder whether they also observed comparable allelic diversity at the candidate second CSD locus. This would provide useful supporting evidence for its functional relevance.

      As is already addressed in the final paragraph of the results and in Supplementary Fig. S4, there is no peak of nucleotide diversity in any of the regions homologous to V.emeryiCsdQTL1, which is the tra-containing candidate sex determination locus (Miyakawa and Mikheyev 2015). In the revised manuscript, the relevant lines will be 307-310. We want to restate that we do not propose that there is a second candidate CSD locus in O. biroi, but we simply raise the possibility that multi-locus CSD might explain the absence of diploid males from clonal lines other than clonal line A (as one of several alternative possibilities).

      Overall, these are relatively minor points in the context of a strong manuscript, but I believe addressing them would improve the clarity and robustness of the authors' conclusions.

      Reviewer #3 (Public review):

      Summary:

      The sex determination mechanism governed by the complementary sex determination (CSD) locus is one of the mechanisms that support the haplodiploid sex determination system evolved in hymenopteran insects. While many ant species are believed to possess a CSD locus, it has only been specifically identified in two species. The authors analyzed diploid females and the rarely occurring diploid males of the clonal ant Ooceraea biroi and identified a 46 kb CSD candidate region that is consistently heterozygous in females and predominantly homozygous in males. This region was found to be homologous to the CSD locus reported in distantly related ants. In the Argentine ant, Linepithema humile, the CSD locus overlaps with an lncRNA (ANTSR) that is essential for female development and is associated with the heterozygous region (Pan et al. 2024). Similarly, an lncRNA is encoded near the heterozygous region within the CSD candidate region of O. biroi. Although this lncRNA shares low sequence similarity with ANTSR, its potential functional involvement in sex determination is suggested. Based on these findings, the authors propose that the heterozygous region and the adjacent lncRNA in O. biroi may trigger female development via a mechanism similar to that of L. humile. They further suggest that the molecular mechanisms of sex determination involving the CSD locus in ants have been highly conserved for approximately 112 million years. This study is one of the few to identify a CSD candidate region in ants and is particularly noteworthy as the first to do so in a parthenogenetic species.

      Strengths:

      (1) The CSD candidate region was found to be homologous to the CSD locus reported in distantly related ant species, enhancing the significance of the findings.

      (2) Identifying the CSD candidate region in a parthenogenetic species like O. biroi is a notable achievement and adds novelty to the research.

      Weaknesses:

      (1) Functional validation of the lncRNA's role is lacking, and further investigation through knockout or knockdown experiments is necessary to confirm its involvement in sex determination.

      See response below.

      (2) The claim that the lncRNA is essential for female development appears to reiterate findings already proposed by Pan et al. (2024), which may reduce the novelty of the study.

      We do not claim that the lncRNA is essential for female development in O. biroi, but simply mention the possibility that, as in L. humile, it is somehow involved in sex determination. We do not have any functional evidence for this, so this is purely based on its genomic position immediately adjacent to our mapped candidate region. We agree with the reviewer that the study by Pan et al. (2024) decreases the novelty of our findings. Another way of looking at this is that our study supports and bolsters previous findings by partially replicating the results in a different species.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      L307-308 should state homozygous for either allele in THE MAJORITY of diploid males.

      This will be fixed in the revised manuscript, in what will be line 321.

      Reviewer #3 (Recommendations for the authors):

      The association between heterozygosity in the CSD candidate region and female development in O. biroi, along with the high sequence homology of this region to CSD loci identified in two distantly related ant species, is not sufficient to fully address the evolution of the CSD locus and the mechanisms of sex determination.

      Given that functional genetic tools, such as genome editing, have already been established in O. biroi, I strongly recommend that the authors investigate the role of the lncRNA through knockout or knockdown experiments and assess its impact on the sex-specific splicing pattern of the downstream tra gene.

      Although knockout experiments of the lncRNA would be illuminating, the primary signal of complementary sex determination is heterozygosity. As is clearly stated in our manuscript and in that of Pan et al. (2024), it does not appear to be heterozygosity within the lncRNA that induces female development, but rather heterozygosity in non-transcribed regions linked to the lncRNA. Therefore, future mechanistic studies of sex determination in O. biroi, L. humile, and other ants should explore how homozygosity or heterozygosity of this region impacts the sex determination cascade, rather than focusing (exclusively) on the lncRNA.

      With this in mind, we developed three sets of guide RNAs that cut only one allele within the mapped CSD locus, with the goal of producing deletions in the highly variable region of the mapped locus. This would lead to functional hemizygosity or homozygosity within this region, depending on how the cuts were repaired. We also developed several sets of PCR primers to assess the heterozygosity of the resultant animals. After injecting 1,162 eggs over several weeks and genotyping the hundreds of resultant animals with PCR, we confirmed that we could induce hemizygosity or homozygosity within this region, at least in ~1/20 of the injected embryos. Although it is possible to assess the sex-specificity of the splice isoform of tra as a proxy for sex determination phenotypes (as done by Pan et al. 2024), the ideal experiment would assess male phenotypic development at the pupal stage. Therefore, over several more weeks, we injected hundreds more eggs with these reagents and reared the injected embryos to the pupal stage. However, substantial mortality was observed, with only 12 injected eggs developing to the pupal stage. All of these were female, and none of them had been successfully mutated.

      In conclusion, we agree with the reviewer that functional experiments would be useful, and we made extensive attempts to conduct such experiments. However, these experiments turned out to be extremely challenging with the currently available protocols. Ultimately, we therefore decided to abandon these attempts.  

      We opted not to include these experiments in the paper itself because we cannot meaningfully interpret their results. However, we are pleased that, in this response letter, we can include a brief description for readers interested in attempting similar experiments.

      Since O. biroi reproduces parthenogenetically and most offspring develop into females, observing a shift from female- to male-specific splicing of tra upon early embryonic knockout of the lncRNA would provide much stronger evidence that this lncRNA is essential for female development. Without such functional validation, the authors' claim (lines 36-38) seems to reiterate findings already proposed by Pan et al. (2024) and, as such, lacks sufficient novelty.

      We have responded to the issue of “lack of novelty” above. But again, the actual CSD locus in both O. biroi and L. humile appears to be distinct from (but genetically linked to) the lncRNA, and we have no experimental evidence that the putative lncRNA in O. biroi is involved in sex determination at all. Because of this, and given the experimental challenges described above, we do not currently intend to pursue functional studies of the lncRNA.

      References

      Hasselmann M, Gempe T, Schiøtt M, Nunes-Silva CG, Otte M, Beye M. 2008. Evidence for the evolutionary nascence of a novel sex determination pathway in honeybees. Nature 454:519–522.

      Koch V, Nissen I, Schmitt BD, Beye M. 2014. Independent Evolutionary Origin of fem Paralogous Genes and Complementary Sex Determination in Hymenopteran Insects. PLOS ONE 9:e91883.

      Matthey-Doret C, van der Kooi CJ, Jeffries DL, Bast J, Dennis AB, Vorburger C, Schwander T. 2019. Mapping of multiple complementary sex determination loci in a parasitoid wasp. Genome Biology and Evolution 11:2954–2962.

      Miyakawa MO, Mikheyev AS. 2015. QTL mapping of sex determination loci supports an ancient pathway in ants and honey bees. PLOS Genetics 11:e1005656.

      Pan Q, Darras H, Keller L. 2024. LncRNA gene ANTSR coordinates complementary sex determination in the Argentine ant. Science Advances 10:eadp1532.

      Privman E, Wurm Y, Keller L. 2013. Duplication and concerted evolution in a master sex determiner under balancing selection. Proceedings of the Royal Society B: Biological Sciences 280:20122968.

      Rabeling C, Kronauer DJC. 2012. Thelytokous parthenogenesis in eusocial Hymenoptera. Annual Review of Entomology 58:273–292.

      Schmieder S, Colinet D, Poirié M. 2012. Tracing back the nascence of a new sex-determination pathway to the ancestor of bees and ants. Nature Communications 3:1–7.

      Vorburger C. 2013. Thelytoky and Sex Determination in the Hymenoptera: Mutual Constraints. Sexual Development 8:50–58.

    1. eLife Assessment

      The manuscript presents important findings that advance our understanding of how microglia adapt their surveillance strategies during chronic neurodegeneration. The evidence presented is convincing, with appropriate and validated methodology broadly supporting the claims given by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Subhramanian et al. carefully examined how microglia adapt their surveillance strategies during chronic neurodegeneration, specifically in prion-infected mice. The authors used ex vivo time-lapse imaging and in vitro strategies, finding that reactive microglia exhibit a highly mobile, "kiss-and-ride" behavior, which contrasts with the more static surveillance typically observed in homeostatic microglia. The manuscript provides fundamental mechanistic insights into the dynamics of microglia-neuron interactions, implicates P2Y6 signaling in regulating mobility, and suggests that intrinsic reprogramming of microglia might underlie this behavior. The conclusions are therefore compelling.

      Strengths:

      (1) The novelty of the study is high, in particular, the demonstration that microglia lose territorial confinement and dynamically migrate from neuron to neuron under chronic neurodegeneration.

      (2) The possible implications of a stimulus-independent high mobility in reactive microglia are particularly striking, although this is not fully explored (see comments below).

      (3) The use of time-lapse imaging in organotypic slices rather than overexpression models provided a more physiological approach.

      (4) Microglia-neuron interactions in neurodegeneration have broad implications for understanding the progression of other diseases that are associated with chronic inflammation, such as Alzheimer's and Parkinson's.

      Weaknesses:

      (1) The Cx3cr1/EGFP line labels all myeloid cells, which makes it difficult to conclude that all observed behaviors are attributable to microglia rather than infiltrating macrophages. The authors refer to this and include it as a limitation. Nonetheless, complementary confirmation by additional microglia markers would strengthen their claims.

      (2) Although the authors elegantly describe the dynamic surveillance and envelopment hypothesis, it is unclear what role this phenotype plays in disease progression, i.e., what its functional consequences are. For example, are the neurons that undergo sustained envelopment more likely to degenerate?

      (3) Moreover, although the increase in mobility is a relevant finding, it would be interesting for the authors to further comment on what the molecular trigger(s) is/are that might promote this increase. These adaptations, which are at least long-lasting, confer apparent mobility in the absence of external stimuli.

      (4) The authors performed, as far as I could understand, the experiments in cortical brain regions. There is no clear rationale for this in the manuscript, nor is it clear whether the mobility is specific to a particular brain region. This is particularly important, as microglia reactivity varies greatly depending on the brain region.

      (5) It would be informative to have an analysis of the percentage of cells that became mobile at the normal, sub-clinical, early clinical, and advanced stages. Without this information, the speed/distance alone can be interpreted in different ways.

    3. Reviewer #2 (Public review):

      This is a nice paper focused on the response of microglia to different clinical stages of prion infection in acute brain slices. The key here is the use of time-lapse imaging, which captures the dynamics of microglial surveillance, including morphology, migration, and intercellular neuron-microglial contacts. The authors use a myeloid GFP-labeled transgenic mouse to track microglia in SSLOW-infected brain slices, quantifying differences in motility and microglial-neuron interactions via live fluorescence imaging. Interesting findings include the elaborate patterns of motility among microglia, the distinct types and duration of intercellular contacts, the potential role of calcium signaling in facilitating hypermobility, and the fact that this motion-promoting status is intrinsic to microglia, persisting even after the cells have been isolated from infected brains. Although the paper is largely descriptive, there are mechanistic insights, including the role of calcium in supporting movement of microglia, where bursts of signaling are identified even within the time-lapse format, and inhibition studies that implicate the purinergic receptor and calcium transient regulator P2Y6 in migratory capacity.

      Strengths:

      (1) The focus on microglia activation and activity in the context of prion disease is interesting.

      (2) Two different prions produce largely the same response.

      (3) Use of time-lapse provides insight into the dynamics of microglia, distinguishing between types of contact - mobility vs motility - and providing insight into the duration/transience and reversibility of extensive somatic contacts that include brief and focused connections in addition to soma envelopment.

      (4) The imaging window (3 hours) was selected based on prior publications documenting preserved morphology, activity, and gene expression regulation for up to 4 hours.

      (5) The distinction between high-mobility and low-mobility microglia is interesting, especially given that hypermobility seems to be an innate property of the cells.

      (6) The live-imaging approach is validated by fixed tissue confocal imaging.

      (7) The variance in duration of neuron/microglia contacts is interesting, although there is no insight into what might dictate which status of interaction predominates.

      (8) The reversibility of the enveloping action, which does not appear to be a commitment to engulfment, is interesting, as is the fact that only neurons are selected for this activity.

      (9) The calcium studies use the fluorescent dye Calbryte-590 to pick up neuronal and microglial bursts; prolonged bursts are detected in enveloped neurons and in the hypermobile microglia. The microglial lead is followed up using the P2Y6 inhibitor MRS-2578, which blunts the mobility of the microglia.

      Weaknesses:

      (1) The number of individual cells tracked has been provided, but not the number of individual mice. The sex of the mice is not provided.

      (2) The statistical approach is not clear; was each cell treated as a single observation?

      (3) The potential for heterogeneity among animals has not been addressed.

      (4) Validation of prion accumulation at each clinical stage of the disease is not provided.

      (5) How were the numerous captures of cells handled to derive morphological quantitative values? Based on the videos, there is a lot of movement and shape-shifting.

      (6) While it is recognized that there are limits to what can be measured simultaneously with live imaging, the authors appear to have fixed tissues from each time point too. It would be very interesting to know whether the extent of prion accumulation influences microglial surveillance, i.e., do the enveloped neurons have greater pathology?

    1. eLife Assessment

      This study introduces a valuable new metric, phenological lag, to help partition the drivers of observed versus expected shifts in spring phenology under climate warming. The conceptual framework is clearly presented and supported by an extensive dataset, and the revisions have improved the manuscript, though some concerns, particularly regarding uncertainty quantification, spatial analysis, and modeling assumptions, remain only partially addressed. The strength of evidence is generally solid, but further analysis would help to validate the study's conclusions.

    2. Reviewer #3 (Public review):

      Summary:

      The authors developed a new phenological lag metric and applied this analytical framework to a global dataset to synthesize shifts in spring phenology and assess how abiotic constraints influence spring phenology.

      Strengths:

      The dataset developed in this study is extensive, and the phenological lag metric is valuable.

      Weaknesses:

      The stability of the method used to calculate forcing requirements needs improvement, for example by including different base temperature thresholds. In addition, the visualization of the results should be improved.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Jiang et al. present a measure of phenological lag by quantifying the effects of abiotic constraints on the differences between observed and expected phenological changes, using a combination of previously published phenology change data for 980 species, and associated climate data for study sites. They found that, across all samples, observed phenological responses to climate warming were smaller than expected responses for both leafing and flowering spring events. They also show that data from experimental studies included in their analysis exhibited increased phenological lag compared to observational studies, possibly as a result of reduced sensitivity to climatic changes. Furthermore, the authors present evidence that spatial trends in phenological responses to warming may differ from what would be expected from phenological sensitivity, due to the seasonal timing of when warming occurs. Thus, climate change may not result in geographic convergences of phenological responses. This study presents an interesting way to separate the individual effects of climate change and other abiotic changes on the phenological responses across sites and species. 

      Strengths: 

      A straightforward mathematical definition of phenological lag allows for this method to potentially be applied in different geographic contexts. Where data exists, other researchers can partition the effects of various abiotic forcings on phenological responses that differ from those expected from warming sensitivity alone. 

      Identifying phenological lag, and associated contributing factors, provides a method by which more nuanced predictions of phenological responses to climate change can be made. Thus, this study could improve ecological forecasting models. 

      Weaknesses: 

      The analysis here could be more robust. A more thorough examination of phenological lag would provide stronger evidence that the framework presented has utility. The differences in phenological lag by study approach, species origin, region, and growth form are interesting, and could be expanded. For example, the authors have the data to explore the relationships between phenological lag and the quantitative variables included in the final model (altitude, latitude, mean annual temperature) and other spatial or temporal variables. This would also provide stronger evidence for the authors' claims about potential mechanisms that contribute to phenological lag. 

      We did examine the relationships of phenological lag with geographic or climatic variables in our analyses. Other than the weak correlations with latitude and altitude cited in the discussion section (lines 292-293), phenological lag was not related to mean annual temperature or long-term precipitation for both leafing and flowering.  

      The authors include very few data visualizations and instead report results and model statistics in tables. This is difficult to interpret and may obscure underlying patterns in the data. Including visual representations of variable distributions and between-variable relationships, in addition to model statistics, provides stronger evidence than model statistics alone. 

      Table 2 shows the influences of geographic or climatic variables, particularly those related to drivers of spring phenology, i.e., budburst temperature, forcing change, and phenological lag, on phenological changes. As quantitative contributions of these drivers have been extracted, the influences of remaining variables are either minor or insignificant. Thus, examination of variable distributions, which has been done in previous syntheses, is probably not necessary.         

      Some of the independent variables were apparently correlated (r < 0.6), e.g., MAT with altitude and latitude, budburst temperature with forcing change and spring warming, and forcing change with spring warming.

      Reviewer #3 (Public review): 

      Summary: 

      The authors developed a new phenological lag metric and applied this analytical framework to a global dataset to synthesize shifts in spring phenology and assess how abiotic constraints influence spring phenology. 

      Strengths: 

      The dataset developed in this study is extensive, and the phenological lag metric is valuable. 

      Weaknesses: 

      The stability of the method used in this study needs improvement, particularly in the calculation of forcing requirements. In addition, the visualization of the results (such as Table 1) should be enhanced. 

      It is not clear how the calculation of forcing accumulation could be improved.

      Recommendations for the authors: 

      Editor (Recommendations for the authors): 

      To improve the robustness of the metric and the conclusions drawn, we recommend that the authors: 

      Test the sensitivity of their results to different base temperature thresholds and to nonlinear forcing response models, even for a subset of species. The proposed framework relies on an accurate understanding of species-specific thermal responses, which remain poorly resolved.

      Different above-zero base temperatures have been used previously, although the justifications mostly follow previous work. As we indicated in our first responses, the use of above-zero base temperatures underestimates forcing from low temperatures, which has a greater impact on species with early spring phenology or in areas of cold climate due to greater proportions of forcing accumulations from low temperatures. The use of high base temperatures can lead to an interpretation that early season species require little or no forcing to break buds, which is biologically incorrect. Thus, the use of above-zero base temperatures would be more appropriate for particular locations or species than for meta-analysis across different spring phenology and climatic conditions. 

      Research using multiple warming levels is limited in the number of warming levels applied (mostly one and occasionally two) for assessing non-linear forcing responses. This can be the focus of future work.  

      Our framework is based on drivers of spring phenology and not dependent on “accurate understanding of species-specific thermal responses”. However, a good understanding of species- and site-specific responses to phenological constraints (e.g., insufficient winter chilling, photoperiod, and environmental stresses) does help determine the nature of phenological lag. All these are explained in our paper.     

      Analyze relationships between phenological lag and additional geographic or climatic gradients already available in the dataset (e.g., latitude, mean annual temperature, interannual variability). 

      We did examine the relationships of phenological lag with geographic or climatic variables in our analyses. Other than the weak correlations with latitude and altitude cited in the discussion section (lines 292-293), phenological lag was not related to mean annual temperature or long-term precipitation for both leafing and flowering.  

      Our objective is to understand changes in spring phenology and differences in plant phenological responses across different functional groups or climatic regions, although our approach can be used to study interannual variability of spring phenology. Our metadata are compiled for comparing warmer vs control treatments (often multiyear averages), not for assessing interannual variability.      

      Improve data visualization to better convey how phenological lag varies across environmental and biological contexts. 

      See responses above.

      Consider incorporating explicit uncertainty estimates around phenological lag calculations.  These steps would improve the interpretability and generalizability of the framework, helping it move from a useful heuristic to a more robust and empirically grounded analytical tool. 

      The calculation of phenological lag is based on drivers of spring phenology with uncertainty depending primarily on uncertainty in phenological observations. Previous uncertainty assessments can be used here (see a few selected studies below).   

      Alles, G.R., Comba, J.L., Vincent, J.M., Nagai, S. and Schnorr, L.M., 2020. Measuring phenology uncertainty with large scale image processing. Ecological Informatics, 59, p.101109.

      Liu G, Chuine I, Denéchère R, Jean F, Dufrêne E, Vincent G, Berveiller D, Delpierre N. Higher sample sizes and observer intercalibration are needed for reliable scoring of leaf phenology in trees. Journal of Ecology. 2021 Jun;109(6):2461-74.

      Tang, J., Körner, C., Muraoka, H., Piao, S., Shen, M., Thackeray, S.J. and Yang, X., 2016. Emerging opportunities and challenges in phenology: a review. Ecosphere, 7(8), p.e01436. 

      Nagai, S., Inoue, T., Ohtsuka, T., Yoshitake, S., Nasahara, K.N. and Saitoh, T.M., 2015. Uncertainties involved in leaf fall phenology detected by digital camera. Ecological Informatics, 30, pp.124-132.

    1. eLife Assessment

      This study provides novel and convincing evidence that both dopamine D1 and D2 expressing neurons in the nucleus accumbens shell are crucial for the expression of cue-guided action selection, a core component of decision-making. The research is systematic and rigorous in using optogenetic inhibition of either D1- or D2-expressing medium spiny neurons in the NAc shell to reveal attenuation of sensory-specific Pavlovian-Instrumental transfer, while largely sparing value-based decision on an instrumental task. The important findings in this report build on prior research and resolve some conflicts in the literature regarding decision-making.

    2. Reviewer #1 (Public review):

      In the current article, Octavia Soegyono and colleagues study "The influence of nucleus accumbens shell D1 and D2 neurons on outcome-specific Pavlovian instrumental transfer", building on extensive findings from the same lab. While there is a consensus about the specific involvement of the Shell part of the Nucleus Accumbens (NAc) in specific stimulus-based actions in choice settings (and not in General Pavlovian instrumental transfer - gPIT, as opposed to the Core part of the NAc), mechanisms at the cellular and circuitry levels remain to be explored. In the present work, using sophisticated methods (rat Cre-transgenic lines from both sexes, optogenetics and the well-established behavioral paradigm outcome-specific PIT - sPIT), Octavia Soegyono and colleagues decipher the differential contribution of dopamine receptors D1 and D2 expressing-spiny projection neurons (SPNs).

      After validating the viral strategy and the specificity of the targeting (immunochemistry and electrophysiology), the authors demonstrate that while both NAc Shell D1- and D2-SPNs participate in mediating sPIT, NAc Shell D1-SPN projections to the Ventral Pallidum (VP, previously demonstrated as crucial for sPIT), but not D2-SPN projections, mediate sPIT. They also show that these effects were specific to stimulus-based actions, as value-based choices were left intact in all manipulations.

      This is a well-designed study and the results are well supported by the experimental evidence. The paper is extremely pleasant to read and adds to the current literature.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Soegyono et al. describes a series of experiments designed to probe the involvement of dopamine D1 and D2 neurons within the nucleus accumbens shell in outcome-specific Pavlovian-instrumental transfer (osPIT), a well-controlled assay of cue-guided action selection based on congruent outcome associations. They used an optogenetic approach to phasically silence NAc shell D1 (D1-Cre mice) or D2 (A2a-Cre mice) neurons during a subset of osPIT trials. Both manipulations disrupted cue-guided action selection but had no effects on negative control measures/tasks (concomitant approach behavior, separate value-guided choice task), nor were any osPIT impairments found in reporter only control groups. Separate experiments revealed that selective inhibition of NAc shell D1 but not D2 inputs to ventral pallidum were required for osPIT expression, thereby advancing understanding of the basal ganglia circuitry underpinning this important aspect of decision making.

      Strengths:

      The combinatorial viral and optogenetic approaches used here were convincingly validated through anatomical tract-tracing and ex vivo electrophysiology. The behavioral assays are sophisticated and well-controlled to parse cue and value guided action selection. The inclusion of reporter only control groups is rigorous and rules out nonspecific effects of the light manipulation. The findings are novel and address a critical question in the literature. Prior work using less decisive methods had implicated NAc shell D1 neurons in osPIT but suggested that D2 neurons may not be involved. The optogenetic manipulations used in the current study provide a more direct test of their involvement and convincingly demonstrate that both populations play an important role. Prior work had also implicated NAc shell connections to ventral pallidum in osPIT, but the current study reveals the selective involvement of D1 but not D2 neurons in this circuit. The authors do a good job of discussing their findings, including their nuanced interpretation that NAc shell D2 neurons may contribute to osPIT through their local regulation of NAc shell microcircuitry.

      Weaknesses:

      The current study exclusively used an optogenetic approach to probe the function of D1 and D2 NAc shell neurons. Providing a complementary assessment with chemogenetics or other appropriate methods would strengthen conclusions, particularly the novel demonstration for D2 NAc shell involvement. Likewise, the null result of optically inhibiting D2 inputs to ventral pallidum leaves open the possibility that a more complete or sustained disruption of this pathway may have impaired osPIT.

      Conclusions:

      The research described here was successful in providing critical new insights into the contributions of NAc D1 and D2 neurons in cue-guided action selection. The authors' data interpretation and conclusions are well reasoned and appropriate. They also provide a thoughtful discussion of study limitations and implications for future research. This research is therefore likely to have a significant impact on the field.

      Comments on the previous version:

      I have reviewed the rebuttal and revised manuscript and have no remaining concerns.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer#1 (Public Review):

      In the current article, Octavia Soegyono and colleagues study "The influence of nucleus accumbens shell D1 and D2 neurons on outcome-specific Pavlovian instrumental transfer", building on extensive findings from the same lab. While there is a consensus about the specific involvement of the Shell part of the Nucleus Accumbens (NAc) in specific stimulus-based actions in choice settings (and not in General Pavlovian instrumental transfer - gPIT, as opposed to the Core part of the NAc), mechanisms at the cellular and circuitry levels remain to be explored. In the present work, using sophisticated methods (rat Cre-transgenic lines from both sexes, optogenetics and the well-established behavioral paradigm outcome-specific PIT - sPIT), Octavia Soegyono and colleagues decipher the differential contribution of dopamine receptors D1 and D2 expressing-spiny projection neurons (SPNs).

      After validating the viral strategy and the specificity of the targeting (immunochemistry and electrophysiology), the authors demonstrate that while both NAc Shell D1- and D2-SPNs participate in mediating sPIT, NAc Shell D1-SPN projections to the Ventral Pallidum (VP, previously demonstrated as crucial for sPIT), but not D2-SPN projections, mediate sPIT. They also show that these effects were specific to stimulus-based actions, as value-based choices were left intact in all manipulations.

      This is a well-designed study and the results are well supported by the experimental evidence. The paper is extremely pleasant to read and adds to the current literature.

      We thank the Reviewer for their positive assessment.

      Comments on revisions:  

      We thank the authors for their detailed responses and for addressing our comments and concerns.

      To further improve consistency and transparency, we kindly request that the authors provide, for Supplemental Figures S1-S4, panels E (raw data for lever presses during the PIT test), the individual data points together with the corresponding statistical analyses in the figure legends.

      Panel E of Figures S1-S4 now includes the individual data points. The outcome-specific data have already been analysed, and we report these analyses in the main manuscript. These analyses are more informative than those requested by the Reviewer since they report the net effects of the stimuli on choice between actions while controlling for potential individual baseline instrumental performance. All data remain fully transparent and are publicly available on an online repository in accordance with eLife policies (see relevant section in Materials and Methods).  

      In addition, regarding Supplemental Figure S3, panel E, we note the absence of a PIT effect in the eYFP group under the ON condition, which appears to differ from the net response reported in the main Figure 5, panel B. Could the authors clarify this apparent discrepancy?

      We apologize for the error, which has now been corrected. 

      We also note a discrepancy between the authors' statement in their response ("40 rats excluded based on post-mortem analyses") and the number of excluded animals reported in the Materials and Methods section, which adds up to 47. We kindly ask the authors to clarify this point for consistency.

      We thank the Reviewer for identifying the error reported in our initial response. The total number of animals excluded was 47, as reported in the manuscript. 

      Finally, as a minor point, we suggest indicating the total number of animals used in the study in the Materials and Methods section.

      The total number of animals has been included in the Materials and Methods section.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Soegyono et al. describes a series of experiments designed to probe the involvement of dopamine D1 and D2 neurons within the nucleus accumbens shell in outcome-specific Pavlovian-instrumental transfer (osPIT), a well-controlled assay of cue-guided action selection based on congruent outcome associations. They used an optogenetic approach to phasically silence NAc shell D1 (D1-Cre mice) or D2 (A2a-Cre mice) neurons during a subset of osPIT trials. Both manipulations disrupted cue-guided action selection but had no effects on negative control measures/tasks (concomitant approach behavior, separate value-guided choice task), nor were any osPIT impairments found in reporter-only control groups. Separate experiments using selective inhibition revealed that NAc shell D1 but not D2 inputs to ventral pallidum were required for osPIT expression, thereby advancing understanding of the basal ganglia circuitry underpinning this important aspect of decision making.

      Strengths:

      The combinatorial viral and optogenetic approaches used here were convincingly validated through anatomical tract-tracing and ex vivo electrophysiology. The behavioral assays are sophisticated and well-controlled to parse cue- and value-guided action selection. The inclusion of reporter-only control groups is rigorous and rules out nonspecific effects of the light manipulation. The findings are novel and address a critical question in the literature. Prior work using less decisive methods had implicated NAc shell D1 neurons in osPIT but suggested that D2 neurons may not be involved. The optogenetic manipulations used in the current study provide a more direct test of their involvement and convincingly demonstrate that both populations play an important role. Prior work had also implicated NAc shell connections to ventral pallidum in osPIT, but the current study reveals the selective involvement of D1 but not D2 neurons in this circuit. The authors do a good job of discussing their findings, including their nuanced interpretation that NAc shell D2 neurons may contribute to osPIT through their local regulation of NAc shell microcircuitry.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      The current study exclusively used an optogenetic approach to probe the function of D1 and D2 NAc shell neurons. Providing a complementary assessment with chemogenetics or other appropriate methods would strengthen conclusions, particularly the novel demonstration for D2 NAc shell involvement. Likewise, the null result of optically inhibiting D2 inputs to ventral pallidum leaves open the possibility that a more complete or sustained disruption of this pathway may have impaired osPIT.

      We acknowledge the reviewer's valuable suggestion that demonstrating NAc-S D1- and D2-SPN engagement in outcome-specific PIT through another technique would strengthen our optogenetic findings. Several approaches could provide this validation. Chemogenetic manipulation, as the reviewer suggested, represents one compelling option. Alternatively, immunohistochemical assessment of phosphorylated histone H3 at serine 10 (P-H3) offers another promising avenue, given its established utility in reporting striatal SPN plasticity in the dorsal striatum (Matamales et al., 2020). We hope to complete such an assessment in future work since it would address the limitations of previous work that relied solely on ERK1/2 phosphorylation measures in NAc-S SPNs (Laurent et al., 2014). The manuscript was modified to report these future avenues of research (page 12).

      Regarding the null result from optical silencing of D2 terminals in the ventral pallidum, we agree with the reviewer's assessment. While we acknowledge this limitation in the current manuscript (page 13), we aim to address this gap in future studies to provide a more complete mechanistic understanding of the circuit.

      Conclusions:

      The research described here was successful in providing critical new insights into the contributions of NAc D1 and D2 neurons in cue-guided action selection. The authors' data interpretation and conclusions are well reasoned and appropriate. They also provide a thoughtful discussion of study limitations and implications for future research. This research is therefore likely to have a significant impact on the field.

      We thank the Reviewer for their positive assessment.

      Comments on revisions:

      I have reviewed the rebuttal and revised manuscript and have no remaining concerns.

      We are pleased to have addressed the Reviewer’s query.

      References

      Laurent, V., Bertran-Gonzalez, J., Chieng, B. C., & Balleine, B. W. (2014). δ-Opioid and Dopaminergic Processes in Accumbens Shell Modulate the Cholinergic Control of Predictive Learning and Choice. J Neurosci, 34(4), 1358-1369. https://doi.org/10.1523/JNEUROSCI.4592-13.2014

      Matamales, M., McGovern, A. E., Mi, J. D., Mazzone, S. B., Balleine, B. W., & Bertran-Gonzalez, J. (2020). Local D2- to D1-neuron transmodulation updates goal-directed learning in the striatum. Science, 367(6477), 549-555. https://doi.org/10.1126/science.aaz5751

    1. eLife Assessment

      This paper addresses the significant question of quantifying epistasis patterns, which affect the predictability of evolution, by reanalyzing a recently published combinatorial deep mutational scan experiment. The findings are useful, showing that epistasis is fluid, i.e., strongly background-dependent, but that fitness effects of mutations are statistically predictable based on the background fitness. While the general approach appears solid, some claims remain incompletely supported by the analysis, as arbitrary cutoffs are used and the description of methods lacks specifics. This analysis should be of interest to the community working on fitness landscapes.

    2. Reviewer #1 (Public review):

      The paper reports some interesting patterns in epistasis in a recently published large fitness landscape dataset. The results may have implications for our understanding of fitness landscapes and protein evolution. However, this version of the paper remains fairly descriptive and has significant deficiencies in clarity and rigor.

      The authors have addressed some of my criticisms (e.g., I appreciate the additional analysis of synonymous mutations, and a more rigorous approach to calling fitness peaks), but many of the issues raised in my first round of review remain in the current version. Frankly, I am quite disappointed that the authors did not address my comments point by point, which is the norm. The remaining (and some new) issues are below.

      (1a) (Modified from first round) I previously suggested dissecting what appear to be three different patterns of epistasis: "strong" and "weak" global epistasis and what one could call "purely idiosyncratic", i.e., not dependent on background fitness. The authors attempted to address this, but I don't think what they have done is sufficient. They make the statement "The lethal mutations have a slope smaller than -0.7 and average slope of -0.98. The remaining mutations all have a slope greater than -0.56" (LL 274-276), but there is no evidence provided to support this claim. This is a strong and, I think, interesting statement (btw, how is "lethal" defined?) and warrants a dedicated figure. This statement suggests that the mixed patterns shown in Figure 5 can actually be meaningfully separated. Why don't the authors show this? Instead, they still claim "overall, global epistasis is not very strong on the folA landscape" (LL. 273-274). I maintain that this claim does not quite capture the observations.

      Later in the text there is a whole section called "Only a small fraction of mutations exhibit strong global epistasis", which also seems related to this issue. First, I don't follow the logic here. Why is this section separate from this initial discussion? Second, here the authors claim "only a small subset of mutations exhibits strong global epistasis (R^2 > 0.5)" and then "This sharp contrast suggests a binary behavior of mutations: they either exhibit strong global epistasis (R^2 > 0.5), or not (R^2 < 0.5)." But this R^2 threshold seems arbitrary, and I don't see any statistical support for this binary nature.

      (1b) (Verbatim from first round) Another rather remarkable feature of this plot is that the slopes of the strong global epistasis patterns seem to be very similar across mutations. Is this the case? Is there anything special about this slope? For example, does this slope simply reflect the fact that a given mutation becomes essentially lethal (i.e., produces the same minimal fitness) in a certain set of background genotypes?

      (1c) (Verbatim from first round) Finally, how consistent are these patterns with some null expectations? Specifically, would one expect the same distribution of global epistasis slopes on an uncorrelated landscape? Are the pivot points unusually clustered relative to an expectation on an uncorrelated landscape?

      (1d) (Verbatim from first round) The shapes of the DFE shown in Figure 7 are also quite interesting, particularly the bimodal nature of the DFE in high-fitness (HF) backgrounds. I think this bimodality must be a reflection of the clustering of mutation-background combinations mentioned above. I think the authors ought to draw this connection explicitly. Do all HF backgrounds have a bimodal DFE? What mutations occupy the "moving" peak?

      (1e) (Modified from first round) I still don't understand why there are qualitative differences in the shape of the DFE between functional and non-functional backgrounds (Figure 8B,C). Why is the transition between the bimodal DFE in Figure 8B and the unimodal DFE in Figure 8C so abrupt? Perhaps the authors can plot the DFEs for all backgrounds on the same plot and just draw a line that separates functional and non-functional backgrounds so that the reader can better see whether the DFE shape changes gradually or abruptly.

      (1f) (Modified from first round) I am now more convinced that synonymous mutations alter epistasis and behave differently than non-synonymous mutations, but I still have some questions. (i) I would have liked a side-by-side comparison of synonymous and non-synonymous mutations, both in terms of their effects on fitness and on epistasis. (ii) The authors claim (LL 278-286) that "synonymous substitutions tend to follow two recurring behaviors" but this is not shown. To demonstrate this, the authors ought to plot (for example) the distribution of slopes of regression lines. Is this distribution actually bimodal? (iii) Later in the same paragraph the authors say "synonymous changes do not exhibit very strong background fitness-dependence". I don't see how this follows from the previous discussion.

      (2) The authors claim to have improved statistical rigor of their analysis, but the Methods section is really thin and inadequate for understanding how the statistical analyses were done.

      (3) In general, I notice a regrettable lack of attention to detail in the text, which makes me worried about a similar problem in the actual data analysis. Here are a few examples. (i) Throughout the text, the authors now refer to functional and non-functional genotypes, but several figures and captions retained the old HF and LF designations. (ii) Figure 7 is called Figure 8. (iii) Figure 3B is not discussed, though it logically precedes Figure 3A and 3C. (iv) Many of my comments, especially minor, were not addressed at all.

    3. Reviewer #3 (Public review):

      Summary:

      The authors have studied a previously published large dataset on the fitness landscape of a 9 base-pair region of the folA gene. The objective of the paper is to understand various aspects of epistasis in this system, which the authors have achieved through detailed and computationally expensive exploration of the landscape. The authors describe epistasis in this system as "fluid", meaning that it depends sensitively on the genetic background, thereby reducing the predictability of evolution at the genetic level. However, the study also finds some robust patterns. The first is the existence of a "pivot point" for a majority of mutations, which is a fixed growth rate at which the effect of mutations switches from beneficial to deleterious (consistent with a previous study on the topic). The second is the observation that the distribution of fitness effects (DFE) of mutations is predicted quite well by the fitness of the genotype, especially for high-fitness genotypes. While the work does not offer a synthesis of the multitude of reported results, the information provided here raises interesting questions for future studies in this field.

      Strengths:

      A major strength of the study is its multifaceted approach, which has helped the authors tease out a number of interesting epistatic properties. The study makes a timely contribution by focusing on topical issues like global epistasis, the existence of pivot points, and the dependence of DFE on the background genotype and its fitness.

      The authors have classified pairwise epistasis into six types, and found that the type of epistasis changes depending on background mutations. Switches happen more frequently for mutations at functionally important sites. Interestingly, the authors find that even synonymous mutations can alter the epistatic interaction between mutations in other codons, and this effect is uncorrelated with the direct fitness effects of the synonymous mutations. Alongside the observations of "fluidity", the study reports limited instances of global epistasis (which predicts a simple linear relationship between the size of a mutational effect and the fitness of the genetic background in which it occurs). Overall, the work presents strong evidence for the genetic context-dependent nature of epistasis in this system.

      Weaknesses:

      Despite the wealth of information provided by the study, there are a few points of concern.

      The authors find that in non-functional genotypic backgrounds, most pairs of mutations display no epistasis. However, we do not know if this is simply because a significant epistatic signal is hard to detect since all the fitness values involved in calculating epistasis are small (and therefore noise-prone). A control can be done by determining whether statistically significant differences exist among the fitness values themselves. In the absence of such information, it is hard to understand whether the classification of epistasis for non-functional backgrounds into discrete categories, such as in Fig 1C, is meaningful.

      The authors have looked for global epistasis (i.e., a negative dependence of mutational fitness effect on background fitness) in all 108 (9 × 12) mutations in the landscape. They report that the majority of the mutations (77/108, or about 71 per cent) display a weak correlation between fitness effect and background fitness (R^2 < 0.2), and a relatively small proportion show a particularly strong correlation (R^2 > 0.5). They therefore conclude that global epistasis in this system is 'binary', meaning that strong global epistasis is restricted to a few sites, whereas weak global epistasis occurs in the rest (Figure 5). Precise definitions of 'strong' and 'weak' are not given in the text, but the authors do mention that they are interested here primarily in detecting whether a correlation with background fitness exists or not. This again raises the question of the extent to which the low (and possibly noisy) fitness values of non-functional backgrounds can confound the results. For example, would the results be much the same if the analysis was repeated with only high-fitness backgrounds or only those sets of genotypes where the fitness differences between backgrounds and mutants were significant?

      Apart from this, I am also a bit conceptually perplexed by the term 'binary behavior', which suggests that the R^2 values should belong to two distinct classes; but, even assuming that the reported results are robust, Figure S12 shows that most values are 0.2 or less whereas higher values are more or less evenly distributed in the range 0.2-1.0, rather than showing an overall bimodal pattern. An especially confusing remark by the authors in this regard is the following: "This sharp contrast suggests a binary behavior of mutations: they either exhibit strong global epistasis (R^2 > 0.5), or not (R^2 < 0.5)".

      Conclusions: As large datasets on empirical fitness landscapes become increasingly available, more computational studies are needed to extract as much information from them as possible. The authors have made a timely effort in this direction. It is particularly instructive to learn from the work that higher-order epistasis is pervasive in the studied intragenic landscape, at least in functional genotypic backgrounds. Some of the analysis and interpretations in the paper require careful scrutiny, and the lack of a synthesis of the multitude of reported results leaves something to be desired. But the paper contains intriguing observations that can fuel further research into the factors shaping the topography of complex landscapes.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      This paper describes a number of patterns of epistasis in a large fitness landscape dataset recently published by Papkou et al. The paper is motivated by an important goal in the field of evolutionary biology to understand the statistical structure of epistasis in protein fitness landscapes, and it capitalizes on the unique opportunities presented by this new dataset to address this problem. 

      The paper reports some interesting previously unobserved patterns that may have implications for our understanding of fitness landscapes and protein evolution. In particular, Figure 5 is very intriguing. However, I have two major concerns detailed below. First, I found the paper rather descriptive (it makes little attempt to gain deeper insights into the origins of the observed patterns) and unfocused (it reports what appears to be a disjointed collection of various statistics without a clear narrative). Second, I have concerns with the statistical rigor of the work.

      (1) I think Figures 5 and 7 are the main, most interesting, and novel results of the paper. However, I don't think that the statement "Only a small fraction of mutations exhibit global epistasis" accurately describes what we see in Figure 5. To me, the most striking feature of this figure is that the effects of most mutations at all sites appear to be a mixture of three patterns. The most interesting pattern noted by the authors is of course the "strong" global epistasis, i.e., when the effect of a mutation is highly negatively correlated with the fitness of the background genotype. The second pattern is a "weak" global epistasis, where the correlation with background fitness is much weaker or non-existent. The third pattern is the vertically spread-out cluster at low-fitness backgrounds, i.e., a mutation has a wide range of mostly positive effects that are clearly not correlated with fitness. What is very interesting to me is that all background genotypes fall into these three groups with respect to almost every mutation, but the proportions of the three groups are different for different mutations. In contrast to the authors' statement, it seems to me that almost all mutations display strong global epistasis in at least a subset of backgrounds. A clear example is C>A mutation at site 3. 

      (1a) I think the authors ought to try to dissect these patterns and investigate them separately rather than lumping them all together and declaring that global epistasis is rare. For example, I would like to know whether those backgrounds in which mutations exhibit strong global epistasis are the same for all mutations or whether they are mutation- or perhaps position-specific. Both answers could be potentially very interesting, either pointing to some specific site-site interactions or, alternatively, suggesting that the statistical patterns are conserved despite variation in the underlying interactions.

      (1b) Another rather remarkable feature of this plot is that the slopes of the strong global epistasis patterns seem to be very similar across mutations. Is this the case? Is there anything special about this slope? For example, does this slope simply reflect the fact that a given mutation becomes essentially lethal (i.e., produces the same minimal fitness) in a certain set of background genotypes? 

      (1c) Finally, how consistent are these patterns with some null expectations? Specifically, would one expect the same distribution of global epistasis slopes on an uncorrelated landscape? Are the pivot points unusually clustered relative to an expectation on an uncorrelated landscape? 

      (1d) The shapes of the DFE shown in Figure 7 are also quite interesting, particularly the bimodal nature of the DFE in high-fitness (HF) backgrounds. I think this bimodality must be a reflection of the clustering of mutation-background combinations mentioned above. I think the authors ought to draw this connection explicitly. Do all HF backgrounds have a bimodal DFE? What mutations occupy the "moving" peak? 

      (1e) In several figures, the authors compare the patterns for HF and low-fitness (LF) genotypes. In some cases, there are some stark differences between these two groups, most notably in the shape of the DFE (Figure 7B, C). But there is no discussion about what could underlie these differences. Why are the statistics of epistasis different for HF and LF genotypes? Can the authors at least speculate about possible reasons? Why do HF and LF genotypes have qualitatively different DFEs? I actually don't quite understand why the transition between bimodal DFE in Figure 7B and unimodal DFE in Figure 7C is so abrupt. Is there something biologically special about the threshold that separates LF and HF genotypes? My understanding was that this was just a statistical cutoff. Perhaps the authors can plot the DFEs for all backgrounds on the same plot and just draw a line that separates HF and LF backgrounds so that the reader can better see whether the DFE shape changes gradually or abruptly.

      (1f) The analysis of the synonymous mutations is also interesting. However, I think a few additional analyses are necessary to clarify what is happening here. I would like to know the extent to which synonymous mutations are more often neutral compared to non-synonymous ones. Then, do synonymous pairs interact in the same way as non-synonymous pairs (i.e., plot Figure 1 for synonymous pairs)? Do synonymous or non-synonymous mutations that are neutral exhibit less epistasis than non-neutral ones? Finally, do non-synonymous mutations alter epistasis among other mutations more often than synonymous mutations do? What about synonymous-neutral versus synonymous-non-neutral mutations? Basically, I'd like to understand the extent to which a mutation that is neutral in a given background is more or less likely to alter epistasis between other mutations than a non-neutral mutation in the same background.

      (2) I have two related methodological concerns. First, in several analyses, the authors employ thresholds that appear to be arbitrary. And second, I did not see any account of measurement errors. For example, the authors chose the 0.05 threshold to distinguish between epistasis and no epistasis, but why this particular threshold was chosen is not justified. Another example: whether the product s12 × (s1 + s2) is greater or smaller than zero for any given mutation is uncertain due to measurement errors. Presumably, how to classify each pair of mutations should depend on the precision with which the fitness of mutants is measured. These thresholds could well be different across mutants. We know, for example, that low-fitness mutants typically have noisier fitness estimates than high-fitness mutants. I think the authors should use a statistically rigorous procedure to categorize mutations and their epistatic interactions. I think it is very important to address this issue. I got very concerned about it when I saw on LL 383-388 that synonymous stop codon mutations appear to modulate epistasis among other mutations. This seems very strange to me and makes me quite worried that this is a result of noise in LF genotypes.

      Thank you for your review of the manuscript. In the revised version, we have addressed both major criticisms, as detailed below.

      When carefully examining the plots in Figure 5 independently, we indeed observe that the fitness effect of a mutation on different genetic backgrounds can be classified into three characteristic patterns. Our reasoning for these patterns is as follows:

      Strong correlation: Typically observed when the mutation is lethal across backgrounds. Linear regression of mutations exhibiting strong global epistasis shows slopes close to −1 and pivot points near −0.7 (Table S4). Since the reported fitness threshold is −0.508, these mutations push otherwise functional backgrounds into the non-functional range, consistent with lethal effects.
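      The lethal-mutation explanation can be checked with a short calculation, which also bears on the reviewer's question about why the slopes look alike. If a mutation drives every background to the same floor fitness m, its effect in a background of fitness f is m - f, so an ordinary least-squares fit of effect against background fitness has slope -1, R^2 = 1, and a pivot point (the background fitness at which the fitted effect is zero) equal to m. The sketch below assumes plain least squares and uses hypothetical values with m = -0.7; the function name is illustrative and this is not the authors' regression pipeline.

```python
def global_epistasis_fit(background_fitness, effects):
    """Ordinary least-squares fit of mutational effect vs. background fitness.

    Returns (slope, intercept, r_squared, pivot), where the pivot is the
    background fitness at which the fitted effect crosses zero.
    """
    n = len(background_fitness)
    mean_x = sum(background_fitness) / n
    mean_y = sum(effects) / n
    sxx = sum((x - mean_x) ** 2 for x in background_fitness)
    syy = sum((y - mean_y) ** 2 for y in effects)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(background_fitness, effects))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    pivot = -intercept / slope  # x at which intercept + slope * x == 0
    return slope, intercept, r_squared, pivot


# Hypothetical "lethal" mutation: every background ends at the same floor m.
floor_m = -0.7
backgrounds = [0.2, 0.0, -0.2, -0.4, -0.6]
effects = [floor_m - f for f in backgrounds]  # effect = mutant - background

slope, intercept, r2, pivot = global_epistasis_fit(backgrounds, effects)
print(f"slope={slope:.2f} R^2={r2:.2f} pivot={pivot:.2f}")
# slope=-1.00 R^2=1.00 pivot=-0.70
```

      A slope near -1 with a common pivot is therefore what a purely lethal effect predicts; real data would add noise and any backgrounds that escape the floor.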

      Weak correlation: Observed when a mutation has no significant effect on fitness across backgrounds, consistent with neutrality.

      No correlation: Out of the 261,333 reported variants, 243,303 (93%) lie below the fitness threshold of −0.508, indicating that the low-fitness region is densely populated by nonfunctional variants. The “strong correlation” and “weak correlation” lines intersect in this zone. Most mutations in this region have little effect (neutral), but occasional abrupt fitness increases correspond to “resurrecting” mutations, the converse of lethal changes. For example, mutations such as X→G at locus 4 or X→A at locus 5 restore function, while the reverse changes (e.g. C→A at locus 3) are lethal.

      Thus, the “no-correlation” pattern is largely explained by mutations that reverse the effect of lethal changes, effectively resurrecting non-functional variants. In the revised manuscript, we highlight these nuances within the broader classification of fitness effect versus background fitness (pp. 10–13).

      Additional analyses included in the revision:

      Synonymous vs. non-synonymous pairs: We repeated the Figure 1 analysis for synonymous–synonymous pairs. As expected, synonymous pairs exhibit lower overall frequencies of epistasis, consistent with their greater neutrality. However, the qualitative spectrum remains similar: positive and negative epistasis dominate, while sign epistasis is rare (Supplementary Figs. S6–S7, S9–S10).

      Fitness effect vs. epistasis change: We tested whether the mean fitness effect of a mutation correlates with the percent of cases in which it changes the nature of epistasis. No correlation was found (R² ≈ 0.11), and this analysis is now included in the revised manuscript.

      Epistasis-modulating ability: Non-synonymous mutations more frequently alter the interactions between other mutations than synonymous substitutions. Within synonymous substitutions, the subset with measurable fitness effects disproportionately contributes to epistasis modulation. Thus, the ability of synonymous substitutions to modulate epistasis arises primarily from the non-neutral subset.

      These analyses clarify the role of synonymous mutations in reshaping epistasis on the folA landscape.

      Revision of statistical treatment of epistasis:

      In our original submission, we used an arbitrary threshold of 0.05 to classify the presence or absence of epistasis, following Papkou et al., who based conclusions on a single experimental replicate. However, as the reviewer correctly noted, this does not adequately account for measurement variability across different genotypes.

      In the revised manuscript, we adopt a statistically rigorous framework that incorporates replicate-based error directly. Specifically, we now use the mean fitness across six independent replicates, together with the corresponding standard deviation, to classify fitness peaks and epistasis. This eliminates arbitrary thresholds and ensures that epistatic classifications reflect the precision of measurements for each genotype.
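      The response does not specify the exact test, so the following is only one plausible way replicate-based error could enter the classification. It assumes that fitness is additive on the reported scale, that the four genotypes in a double-mutant cycle are measured independently, that the error of each genotype mean is its standard error across replicates, and that an interaction is called only when a normal-approximation interval (a hypothetical z_crit = 1.96) excludes zero. Function names and replicate values are hypothetical.

```python
import math
import statistics


def epistasis_with_error(reps_wt, reps_1, reps_2, reps_12):
    """Double-mutant-cycle epistasis and its standard error from replicates.

    Assumes additive fitness and independent errors for the four genotypes.
    """
    groups = [reps_wt, reps_1, reps_2, reps_12]
    means = [statistics.mean(g) for g in groups]
    # Standard error of each genotype mean: sample SD / sqrt(n replicates)
    ses = [statistics.stdev(g) / math.sqrt(len(g)) for g in groups]
    m_wt, m_1, m_2, m_12 = means
    e = m_12 - m_1 - m_2 + m_wt
    se = math.sqrt(sum(s ** 2 for s in ses))  # variances add for a +/- sum
    return e, se


def classify_with_error(e, se, z_crit=1.96):
    """Call epistasis only if the interval e +/- z_crit * se excludes zero."""
    if e - z_crit * se > 0:
        return "positive"
    if e + z_crit * se < 0:
        return "negative"
    return "none"


# Six hypothetical replicates per genotype (not data from the study)
wt = [0.00, 0.01, -0.01, 0.02, -0.02, 0.00]
m1 = [-0.20, -0.22, -0.18, -0.21, -0.19, -0.20]
m2 = [-0.10, -0.11, -0.09, -0.12, -0.08, -0.10]
m12 = [-0.50, -0.52, -0.48, -0.51, -0.49, -0.50]

e, se = epistasis_with_error(wt, m1, m2, m12)
print(round(e, 3), round(se, 4), classify_with_error(e, se))
```

      Because the interval width scales with each genotype's own replicate spread, noisier low-fitness genotypes need larger deviations before an interaction is called.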

      This revision led to both quantitative and qualitative changes:

      For high-fitness genotypes, the core patterns of higher-order (“fluid”) epistasis remain robust (Figures 2–3).

      For low-fitness genotypes, incorporating replicate-based error removed spurious fluidity effects, yielding a more accurate characterization of epistasis (Figures 2–3; Supplementary Figs. S6–S7, S9–S10).

      We describe these methodological changes in detail in the revised Methods section and provide updated code.

      Together, these revisions directly address the reviewer’s concerns. They improve the statistical rigor of our analysis, strengthen the robustness of our conclusions, and underscore the importance of accounting for measurement error in large-scale fitness landscape studies—a point we now emphasize in the manuscript.

      Reviewer #2 (Public review): 

      Significance: 

      This paper reanalyzes an experimental fitness landscape generated by Papkou et al., who assayed the fitness of all possible combinations of 4 nucleotide states at 9 sites in the E. coli DHFR gene, which confers antibiotic resistance. The 9 nucleotide sites make up 3 amino acid sites in the protein, of which one was shown to be the primary determinant of fitness by Papkou et al. This paper sought to assess whether pairwise epistatic interactions differ among genetic backgrounds at other sites and whether there are major patterns in any such differences. They use a "double mutant cycle" approach to quantify pairwise epistasis, where the epistatic interaction between two mutations is the difference between the measured fitness of the double mutant and its predicted fitness in the absence of epistasis (which equals the sum of individual effects of each mutation observed in the single mutants relative to the reference genotype). The paper claims that epistasis is "fluid," because pairwise epistatic effects often differ depending on the genetic state at the other site. It also claims that this fluidity is "binary," because pairwise effects depend strongly on the state at nucleotide positions 5 and 6 but weakly on those at other sites. Finally, they compare the distribution of fitness effects (DFE) of single mutations for starting genotypes with similar fitness and find that, despite the apparent "fluidity" of interactions, this distribution is well-predicted by the fitness of the starting genotype.
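      The double-mutant-cycle calculation described above can be written out explicitly. This is a minimal sketch assuming fitness is on an additive scale (for example, log growth rate): the epistasis between mutations 1 and 2 is the double mutant's effect minus the sum of the single-mutant effects, all measured relative to the same reference genotype. Evaluating the same pair in two backgrounds shows what "fluidity" means operationally. The function name and fitness values are hypothetical.

```python
def pairwise_epistasis(f_ref, f_1, f_2, f_12):
    """Double-mutant-cycle epistasis on an additive fitness scale.

    f_ref: reference (background) genotype
    f_1, f_2: single mutants of the reference
    f_12: double mutant carrying both mutations
    """
    s1 = f_1 - f_ref    # effect of mutation 1 alone
    s2 = f_2 - f_ref    # effect of mutation 2 alone
    s12 = f_12 - f_ref  # joint effect of both mutations
    return s12 - (s1 + s2)  # deviation from the additive (no-epistasis) prediction


# Hypothetical fitness values for the same pair of mutations in two backgrounds
e_background_a = pairwise_epistasis(0.0, -0.2, -0.1, -0.5)     # about -0.2: negative
e_background_b = pairwise_epistasis(-0.3, -0.4, -0.35, -0.45)  # about 0.0: additive
print(round(e_background_a, 3), round(e_background_b, 3))
```

      A change in sign or category between backgrounds, as here, is what the paper calls fluid epistasis; whether such a change exceeds measurement error is a separate question.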

      The paper addresses an important question for genetics and evolution: how complex and unpredictable are the effects and interactions among mutations in a protein? Epistasis can make the phenotype hard to predict from the genotype and also affect the evolutionary navigability of a genotype landscape. Whether pairwise epistatic interactions depend on genetic background (that is, whether there are important high-order interactions) is important because interactions of order greater than pairwise would make phenotypes especially idiosyncratic and difficult to predict from the genotype (or by extrapolating from experimentally measured phenotypes of genotypes randomly sampled from the huge space of possible genotypes). Another interesting question is the sparsity of such high-order interactions: if they exist but mostly depend on a small number of identifiable sequence sites in the background, then this would drastically reduce the complexity and idiosyncrasy relative to a landscape on which "fluidity" involves interactions among groups of all sites in the protein. A number of papers in the recent literature have addressed the topics of high-order epistasis and sparsity and have come to conflicting conclusions. This paper contributes to that body of literature with a case study of one published experimental dataset of high quality. The findings are therefore potentially significant if convincingly supported.

      Validity: 

      In my judgment, the major conclusions of this paper are not well supported by the data. There are three major problems with the analysis. 

      (1) Lack of statistical tests. The authors conclude that pairwise interactions differ among backgrounds, but no statistical analysis is provided to establish that the observed differences are statistically significant, rather than being attributable to error and noise in the assay measurements. It has been established previously that the methods the authors use to estimate high-order interactions can result in inflated inferences of epistasis because of the propagation of measurement noise (see PMID 31527666 and 39261454). Error propagation can be extreme because first-order mutation effects are calculated as the difference between the measured phenotype of a single-mutant variant and the reference genotype; pairwise effects are then calculated as the difference between the measured phenotype of a double mutant and the sum of the differences described above for the single mutants. This paper claims fluidity when this latter difference itself differs when assessed in two different backgrounds. At each step of these calculations, measurement noise propagates. Because no statistical analysis is provided to evaluate whether these observed differences are greater than expected because of propagated error, the paper has not convincingly established or quantified "fluidity" in epistatic effects. 

      (2) Arbitrary cutoffs. Many of the analyses involve assigning pairwise interactions to discrete categories, based on the magnitude and direction of the difference between the predicted and observed phenotypes for a double mutant. For example, the authors classify a pairwise interaction as positive if the deviation of the observed phenotype from the prediction is >0.05, negative if the deviation is <-0.05, and absent if the deviation lies between these cutoffs. Fluidity is diagnosed when the category for a pairwise interaction differs among backgrounds. These cutoffs are essentially arbitrary, and the effects are assigned to categories without assessing statistical significance. For example, an interaction of 0.06 in one background and 0.04 in another would be classified as fluid, but it is very plausible that such a difference would arise from error alone. The frequency of epistatic interactions in each category as claimed in the paper, as well as the extent of fluidity across backgrounds, could therefore be systematically overestimated or underestimated, affecting the major conclusions of the study.

      (3) Global nonlinearities. The analyses do not consider the fact that apparent fluidity could be attributable to the fact that fitness measurements are bounded by a minimum (the fitness of cells carrying proteins in which DHFR is essentially nonfunctional) and a maximum (the fitness of cells in which some biological factor other than DHFR function is limiting for fitness). The data are clearly bounded; the original Papkou et al. paper states that 93% of genotypes are at the low-fitness limit at which deleterious effects no longer influence fitness. Because of this bounding, mutations that are strongly deleterious to DHFR function will therefore have an apparently smaller effect when introduced in combination with other deleterious mutations, leading to apparent epistatic interactions; moreover, these apparent interactions will have different magnitudes if they are introduced into backgrounds that themselves differ in DHFR function/fitness, leading to apparent "fluidity" of these interactions. This is a well-established issue in the literature (see PMIDs 30037990, 28100592, 39261454). It is therefore important to adjust for these global nonlinearities before assessing interactions, but the authors have not done this. 
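      The bounding argument in point (3) can be illustrated with a minimal sketch. It assumes a purely additive latent phenotype that is clipped at a hypothetical floor and ceiling (the values 0.2 and 1.2 are arbitrary). Because the latent effects add exactly, every nonzero epistasis value it produces comes from the bounds alone.

```python
FLOOR = 0.2    # hypothetical fitness of non-functional DHFR (assay lower bound)
CEILING = 1.2  # hypothetical upper bound set by factors other than DHFR


def observed_fitness(latent: float) -> float:
    """Map an additive latent phenotype onto the bounded fitness scale."""
    return min(CEILING, max(FLOOR, latent))


def apparent_epistasis(background: float, d_a: float, d_b: float) -> float:
    """Double-mutant-cycle epistasis measured on the bounded scale.

    The latent effects d_a and d_b are strictly additive, so any nonzero
    result is produced by the bounds alone.
    """
    f_ref = observed_fitness(background)
    f_a = observed_fitness(background + d_a)
    f_b = observed_fitness(background + d_b)
    f_ab = observed_fitness(background + d_a + d_b)
    return f_ab - f_a - f_b + f_ref


high_bg = apparent_epistasis(1.0, -0.6, -0.6)  # two strongly deleterious mutations
low_bg = apparent_epistasis(0.5, -0.6, -0.6)   # same mutations, weaker background
mild = apparent_epistasis(1.0, -0.1, -0.1)     # mild effects stay clear of the floor
```

      Two strongly deleterious mutations appear positively epistatic (0.4), and the apparent interaction drops to 0.3 in a weaker background. That drop would read as background-dependent epistasis even though nothing non-additive happens in the latent phenotype.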

      This global nonlinearity could explain much of the fluidity claimed in this paper. It could explain the observation that epistasis does not seem to depend as much on genetic background for low-fitness backgrounds, and the latter is constant (Figure 2B and 2C): these patterns would arise simply because the effects of deleterious mutations are all epistatically masked in backgrounds that are already near the fitness minimum. It would also explain the observations in Figure 7. For background genotypes with relatively high fitness, there are two distinct peaks of fitness effects, which likely correspond to neutral mutations and deleterious mutations that bring fitness to the lower bound of measurement; as the fitness of the background declines, the deleterious mutations have a smaller effect, so the two peaks draw closer to each other, and in the lowest-fitness backgrounds, they collapse into a single unimodal distribution in which all mutations are approximately neutral (with the distribution reflecting only noise). Global nonlinearity could also explain the apparent "binary" nature of epistasis. Sites 4 and 5 change the second amino acid, and the Papkou paper shows that only 3 amino acid states (C, D, and E) are compatible with function; all others abolish function and yield lower-bound fitness, while mutations at other sites have much weaker effects. The apparent binary nature of epistasis in Figure 5 corresponds to these effects given the nonlinearity of the fitness assay. Most mutations are close to neutral irrespective of the fitness of the background into which they are introduced: these are the "non-epistatic" mutations in the binary scheme. For the mutations at sites 4 and 5 that abolish one of the beneficial mutations, however, these have a strong background-dependence: they are very deleterious when introduced into a high-fitness background but their impact shrinks as they are introduced into backgrounds with progressively lower fitness. 
The apparent "binary" nature of global epistasis is likely to be a simple artifact of bounding and the bimodal distribution of functional effects: neutral mutations are insensitive to background, while the magnitude of the fitness effect of deleterious mutations declines with background fitness because they are masked by the lower bound. The authors' statement is that "global epistasis often does not hold." This is not established. A more plausible conclusion is that global epistasis imposed by the phenotype limits affects all mutations, but it does so in a nonlinear fashion. 

      In conclusion, most of the major claims in the paper could be artifactual. Much of the claimed pairwise epistasis could be caused by measurement noise, the use of arbitrary cutoffs, and the lack of adjustment for global nonlinearity. Much of the fluidity or higher-order epistasis could be attributable to the same issues. And the apparently binary nature of global epistasis is also the expected result of this nonlinearity. 

      We thank the reviewer for raising this important concern. We fully agree that the use of arbitrary thresholds in the earlier version of the manuscript, together with the lack of an explicit treatment of measurement error, could compromise the rigor of our conclusions. To address this, we have undertaken a thorough re-analysis of the folA landscape.

      (1) Incorporating measurement error and avoiding noise-driven artifacts

      In the original version, we followed Papkou et al. in using a single experimental replicate and applying fixed thresholds to classify epistasis. As the reviewer correctly notes, this approach allows noise to propagate from single-mutant measurements to double-mutant effects, and ultimately to higher-order epistasis.
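      The noise propagation described here can be quantified with a short sketch. It assumes independent, normally distributed measurement errors with the same hypothetical SD (0.02) for every genotype. Each measured phenotype enters the double-mutant-cycle epistasis with a coefficient of +1 or -1, so the errors add in quadrature. A change in epistasis between two backgrounds involves eight distinct genotypes, so its null SD is 2√2 times the per-genotype SD. The simulation checks that analytic value.

```python
import math
import random
import statistics


def epistasis_se(se_ref, se_a, se_b, se_ab):
    """Standard error of eps = f_ab - f_a - f_b + f_ref, assuming independent errors."""
    return math.sqrt(se_ref**2 + se_a**2 + se_b**2 + se_ab**2)


def difference_se(se_eps_1, se_eps_2):
    """Standard error of the change in epistasis between two backgrounds."""
    return math.sqrt(se_eps_1**2 + se_eps_2**2)


def simulated_null_spread(sigma, n_sim=20000, seed=1):
    """SD of apparent epistasis changes when true epistasis is zero everywhere.

    Because the true landscape is additive, the true fitness values cancel in eps,
    leaving only measurement noise (SD sigma per genotype).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, sigma) for _ in range(8)]  # 4 genotypes x 2 backgrounds
        eps_1 = e[3] - e[1] - e[2] + e[0]
        eps_2 = e[7] - e[5] - e[6] + e[4]
        diffs.append(eps_1 - eps_2)
    return statistics.stdev(diffs)


sigma = 0.02  # hypothetical per-genotype measurement SD
se_eps = epistasis_se(sigma, sigma, sigma, sigma)
analytic = difference_se(se_eps, se_eps)
simulated = simulated_null_spread(sigma)
```

      With an SD of 0.02 per genotype, the analytic and simulated null spreads are both about 0.057. That already exceeds the fixed ±0.05 cutoff discussed in the reviews.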

      In the revised analysis, we now:

      Use the mean fitness across all six independent replicates for each genotype.

      Incorporate the corresponding standard deviation as a measure of experimental error.

      Classify epistatic interactions only when differences between a genotype and its neighbors exceed combined error margins, rather than using a fixed cutoff.

      This ensures that observed changes in epistasis are statistically distinguishable from noise. Details are provided in the revised Methods section and updated code.
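      One way to implement such an error-based rule is sketched below. It assumes six replicates per genotype, independent errors, and a normal-approximation margin of 1.96 standard errors. The multiplier, the function names, and the replicate values are hypothetical. The revised Methods, not this sketch, define the authors' actual criterion.

```python
import math
import statistics


def mean_and_se(replicates):
    """Replicate mean and standard error of the mean."""
    return statistics.fmean(replicates), statistics.stdev(replicates) / math.sqrt(len(replicates))


def classify_pair(ref, a, b, ab, z=1.96):
    """Classify pairwise epistasis against its propagated error.

    Each argument is a list of replicate fitness values for one genotype.
    The z multiplier is an illustrative choice, not a documented criterion.
    """
    (m_ref, s_ref), (m_a, s_a), (m_b, s_b), (m_ab, s_ab) = (
        mean_and_se(r) for r in (ref, a, b, ab)
    )
    eps = m_ab - m_a - m_b + m_ref
    se = math.sqrt(s_ref**2 + s_a**2 + s_b**2 + s_ab**2)
    if eps > z * se:
        label = "positive"
    elif eps < -z * se:
        label = "negative"
    else:
        label = "none"
    return eps, se, label


def six_reps(mean):
    """Hypothetical six replicates scattered symmetrically around a mean."""
    return [mean + d for d in (-0.02, -0.01, 0.0, 0.0, 0.01, 0.02)]


epistatic = classify_pair(six_reps(1.0), six_reps(0.8), six_reps(0.9), six_reps(0.5))
additive = classify_pair(six_reps(1.0), six_reps(0.8), six_reps(0.9), six_reps(0.7))
```

      Because the margin is built from each genotype's own replicate SD, the rule automatically becomes stricter for genotypes that are measured less precisely.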

      (2) Replacing arbitrary thresholds with error-based criteria

      Previously, we used an arbitrary ±0.05 cutoff to define the presence/absence of epistasis. As the reviewer notes, this could misclassify interactions (e.g. labeling an effect as “fluid” when the difference lies within error). In the revised framework, these thresholds have been eliminated. Instead, interactions are classified based on whether their distributions overlap within replicate variance.

      This approach scales naturally with measurement precision, which differs between high-fitness and low-fitness genotypes, and removes the need for a universal cutoff.
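      The contrast between the two approaches can be sketched using the reviewer's example of an interaction of 0.06 in one background and 0.04 in another. The fixed cutoff assigns these two values to different categories. A two-sided z-test on their difference does not distinguish them, whereas a large change does pass. The test uses a normal approximation, and the standard errors (0.03 and 0.02) are hypothetical.

```python
import math
from statistics import NormalDist


def classify_fixed(eps, cutoff=0.05):
    """Original-style classification with a fixed, error-blind cutoff."""
    if eps > cutoff:
        return "positive"
    if eps < -cutoff:
        return "negative"
    return "none"


def epistasis_differs(eps_1, se_1, eps_2, se_2, alpha=0.05):
    """Two-sided z-test (normal approximation) for a change in epistasis between backgrounds."""
    z = (eps_1 - eps_2) / math.sqrt(se_1**2 + se_2**2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value, p_value < alpha


# The reviewer's example: 0.06 in one background and 0.04 in another.
fixed_labels = (classify_fixed(0.06), classify_fixed(0.04))  # ('positive', 'none')
p_small_change, differs_small = epistasis_differs(0.06, 0.03, 0.04, 0.03)
p_large_change, differs_large = epistasis_differs(0.15, 0.02, -0.05, 0.02)
```

      A full landscape involves many such comparisons. Some multiple-testing control, such as a false-discovery-rate procedure, would therefore likely be needed on top of a per-comparison test like this one.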

      (3) Consequences of re-analysis

      Implementing this revised framework produced several important updates:

      High-fitness backgrounds: The qualitative picture of higher-order (“fluid”) epistasis remains robust. The patterns reported originally are preserved.

      Low-fitness backgrounds: Accounting for replicate variance revealed that part of the previously inferred “fluidity” arose from noise. These spurious effects are now removed, giving a more conservative but more accurate view of epistasis in non-functional regions.

      Fitness peaks: Our replicate-aware analysis identifies 127 peaks, compared to 514 in Papkou et al. Importantly, all 127 peaks occur in functional regions of the landscape. This difference highlights the importance of replicate-based error treatment: relying on a single run without demonstrating repeatability can yield artifacts.

      (4) Addressing bounding effects and terminology

      We also agree with the reviewer that bounding effects, arising from the biological limits of fitness, can create apparent nonlinearities in the genotype–phenotype map. To clarify this, we made the following changes:

      Terminology: We now use the term higher-order epistasis instead of fluid epistasis, emphasizing that the observed background-dependence involves more than two mutations and cannot be explained by global nonlinearities alone.

      We also clarify the definitions of sign-epistasis used in this work.

      By replacing arbitrary cutoffs with replicate-based error estimates and by explicitly considering bounding effects, we have substantially increased the rigor of our analysis. While this reanalysis led to both quantitative and qualitative changes in some regions, the central conclusion remains unchanged: higher-order epistasis is pervasive in the folA landscape, especially in functional backgrounds.
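      One standard way to quantify higher-order epistasis, independent of how pairwise interactions are categorized, is a Walsh-Hadamard decomposition of a biallelic landscape into terms of each interaction order. The sketch is illustrative only. It assumes two states per site, so it would apply to biallelic sub-landscapes of folA rather than to the full four-state landscape. The 3-site fitness values are hypothetical.

```python
def walsh_coefficients(fitness):
    """Walsh-Hadamard coefficients of a biallelic landscape.

    fitness[g] is the fitness of the genotype whose bit i is 1 when site i
    carries the alternative state; len(fitness) must be 2**L.
    """
    n = len(fitness)
    return [
        sum(f * (-1) ** bin(g & s).count("1") for g, f in enumerate(fitness)) / n
        for s in range(n)
    ]


def variance_by_order(coeffs):
    """Sum of squared coefficients grouped by interaction order (number of sites)."""
    totals = {}
    for s, w in enumerate(coeffs[1:], start=1):
        order = bin(s).count("1")
        totals[order] = totals.get(order, 0.0) + w * w
    return totals


def landscape(three_way=0.0):
    """Hypothetical 3-site landscape: additive effects plus an optional 3-way term."""
    values = []
    for g in range(8):
        x = [(g >> i) & 1 for i in range(3)]
        values.append(1.0 + 0.1 * x[0] - 0.2 * x[1] + 0.3 * x[2]
                      + three_way * x[0] * x[1] * x[2])
    return values


additive_orders = variance_by_order(walsh_coefficients(landscape()))
higher_orders = variance_by_order(walsh_coefficients(landscape(three_way=0.4)))
```

      For the additive landscape, the second- and third-order totals are zero up to rounding. Adding a single three-way term produces nonzero third-order and second-order contributions, because a product of 0/1 indicators also projects onto lower orders.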

      All analysis scripts and code are provided as Supplementary Material.

      Reviewer #3 (Public review): 

      Summary: 

      The authors have studied a previously published large dataset on the fitness landscape of a 9 base-pair region of the folA gene. The objective of the paper is to understand various aspects of epistasis in this system, which the authors have achieved through detailed and computationally expensive exploration of the landscape. The authors describe epistasis in this system as "fluid", meaning that it depends sensitively on the genetic background, thereby reducing the predictability of evolution at the genetic level. However, the study also finds two robust patterns. The first is the existence of a "pivot point" for a majority of mutations, which is a fixed growth rate at which the effect of mutations switches from beneficial to deleterious (consistent with a previous study on the topic). The second is the observation that the distribution of fitness effects (DFE) of mutations is predicted quite well by the fitness of the genotype, especially for high-fitness genotypes. While the work does not offer a synthesis of the multitude of reported results, the information provided here raises interesting questions for future studies in this field. 

      Strengths: 

      A major strength of the study is its detailed and multifaceted approach, which has helped the authors tease out a number of interesting epistatic properties. The study makes a timely contribution by focusing on topical issues like the prevalence of global epistasis, the existence of pivot points, and the dependence of DFE on the background genotype and its fitness. The methodology is presented in a largely transparent manner, which makes it easy to interpret and evaluate the results. 

      The authors have classified pairwise epistasis into six types and found that the type of epistasis changes depending on background mutations. Switches happen more frequently for mutations at functionally important sites. Interestingly, the authors find that even synonymous mutations in stop codons can alter the epistatic interaction between mutations in other codons. Consistent with these observations of "fluidity", the study reports limited instances of global epistasis (which predicts a simple linear relationship between the size of a mutational effect and the fitness of the genetic background in which it occurs). Overall, the work presents some evidence for the genetic context-dependent nature of epistasis in this system. 

      Weaknesses: 

      Despite the wealth of information provided by the study, there are some shortcomings of the paper which must be mentioned. 

      (1) In the Significance Statement, the authors say that the "fluid" nature of epistasis is a previously unknown property. This is not accurate. What the authors describe as "fluidity" is essentially the prevalence of certain forms of higher-order epistasis (i.e., epistasis beyond pairwise mutational interactions). The existence of higher-order epistasis is a well-known feature of many landscapes. For example, an early work (Szendro et al., J. Stat. Mech., 2013) reported a significant degree of higher-order epistasis in a number of empirical fitness landscapes. Likewise, Weinreich et al. (Curr. Opin. Genet. Dev., 2013) analysed several fitness landscapes and found that higher-order epistatic terms were on average larger than the pairwise term in nearly all cases. They further showed that ignoring higher-order epistasis leads to a significant overestimate of accessible evolutionary paths. The literature on higher-order epistasis has grown substantially since these early works. Any future versions of the present preprint will benefit from a more thorough contextual discussion of the literature on higher-order epistasis.

      (2) In the paper, the term 'sign epistasis' is used in a way that is different from its well-established meaning. (Pairwise) sign epistasis, in its standard usage, is said to occur when the effect of a mutation switches from beneficial to deleterious (or vice versa) when a mutation occurs at a different locus. The authors require a stronger condition, namely that the sum of the individual effects of two mutations should have the opposite sign from their joint effect. This is a sufficient condition for sign epistasis, but not a necessary one. The property studied by the authors is important in its own right, but it is not equivalent to sign epistasis.

      (3) The authors have looked for global epistasis in all 108 (9x12) mutations, out of which only 16 showed a correlation of R^2 > 0.4. 14 out of these 16 mutations were in the functionally important nucleotide positions. Based on this, the authors conclude that global epistasis is rare in this landscape, and further, that mutations in this landscape can be classified into one of two binary states - those that exhibit global epistasis (a small minority) and those that do not (the majority). I suspect, however, that a biologically significant binary classification based on these data may be premature. Unsurprisingly, mutational effects are stronger at the functional sites as seen in Figure 5 and Figure 2, which means that even if global epistasis is present for all mutations, a statistical signal will be more easily detected for the functionally important sites. Indeed, the authors show that the means of DFEs decrease linearly with background fitness, which hints at the possibility that a weak global epistatic effect may be present (though hard to detect) in the individual mutations. Given the high importance of the phenomenon of global epistasis, it pays to be cautious in interpreting these results. 

      (4) The study reports that synonymous mutations frequently change the nature of epistasis between mutations in other codons. However, it is unclear whether this should be surprising, because, as the authors have already noted, synonymous mutations can have an impact on cellular functions. The reader may wonder if the synonymous mutations that cause changes in epistatic interactions in a certain background also tend to be non-neutral in that background. Unfortunately, the fitness effect of synonymous mutations has not been reported in the paper. 

      (5) The authors find that DFEs of high-fitness genotypes tend to depend only on fitness and not on genetic composition. This is an intriguing observation, but unfortunately, the authors do not provide any possible explanation or connect it to theoretical literature. I am reminded of work by (Agarwala and Fisher, Theor. Popul. Biol., 2019) as well as (Reddy and Desai, eLife, 2023) where conditions under which the DFE depends only on the fitness have been derived. Any discussion of possible connections to these works could be a useful addition.  

      We thank the reviewer for the summary of our work and for highlighting both its strengths and areas for improvement. We have carefully considered the points raised and revised the manuscript accordingly. The revised version:

      (1) Clarifies the conceptual framework. We emphasize the distinction between background-dependent, higher-order epistasis and global nonlinearities. To avoid ambiguity, we have replaced the term “fluid” epistasis with higher-order epistasis throughout, in line with prior literature (e.g. Szendro et al., 2013; Weinreich et al., 2013). We now explicitly situate our results in the context of these studies and clarify our definitions of epistasis, correcting the earlier error where “strong sign epistasis” was used in place of “sign epistasis.”

      (2) Improves statistical rigor. We now incorporate replicate variance and statistical error criteria in place of arbitrary thresholds. This ensures that classification of epistasis reflects experimental precision rather than fixed, arbitrary cutoffs.

      (3) Expands treatment of synonymous mutations. We now explicitly analyze synonymous mutations, separating those that are neutral from those that are non-neutral. Our results show that non-neutral synonymous mutations are disproportionately responsible for altering epistatic interactions, while neutral synonymous mutations rarely do so. We also report the fitness effects of synonymous mutations directly and include new analyses showing that there is no correlation between the mean fitness effect of a synonymous mutation and the frequency with which it alters epistasis (Supplementary Fig. S11).

      These revisions strengthen both the rigor and the clarity of the manuscript. We hope they address the reviewer’s concerns and make the significance of our findings, particularly the site-resolved quantification of higher-order epistasis in the folA landscape, including in synonymous mutations, more apparent.

      Reviewing Editor Comments: 

      Key revision suggestions: 

      (1) Please quantify the impact of measurement noise on your conclusions, and perform statistical analysis to determine whether the observed differences of epistasis due to different backgrounds are statistically significant. 

      (2) Please investigate how your conclusions depend on the cutoffs, and consider choosing them based on statistical criteria. 

      (3) Please reconsider the possible role of global epistasis. In particular, the effect of bounds on fitness values. All reviewers are concerned that all claims, including about global epistasis, may be consistent with a simple null model where most low fitness genotypes are non-functional and variation in their fitness is simply driven by measurement noise. Please provide a convincing argument rejecting this model. 

      More generally, we recommend that you consider all suggestions by reviewers, including those about results, but also those about terminology and citing relevant works. 

      Thank you for your guidance. We have substantially revised the manuscript to incorporate the reviewers’ suggestions. In addition to addressing the three central issues raised, we have refined terminology, expanded the discussion of prior work, and clarified the presentation of our main results. We believe these changes significantly strengthen both the rigor and the impact of the study. We are grateful to the Reviewing Editor and reviewers for their constructive feedback.

      In the revised manuscript, we address the three major points as follows:

      (1) Quantifying measurement noise and statistical significance. We now use the average of six independent experimental runs for each genotype, together with the corresponding standard deviations, to explicitly quantify measurement uncertainty. Pairwise and higher-order epistasis are assessed relative to these error estimates, rather than against fixed thresholds. This ensures that differences across genetic backgrounds are statistically distinguishable from noise.

      (2) Replacing arbitrary cutoffs with statistical criteria. We have eliminated the use of arbitrary thresholds. Instead, classification of interactions (positive, negative, or neutral epistasis) is based on whether fitness differences exceed replicate variance. This approach scales naturally with measurement precision. While some results change quantitatively for high-fitness backgrounds and qualitatively for low-fitness backgrounds, our central conclusions remain robust.

      (3) Analysis of synonymous mutations. We now separately analyze synonymous mutations to test their role in altering epistasis. Our results show that there is no correlation between the average fitness effect of a synonymous mutation and the frequency with which it changes epistatic interactions.

      We have revised terminology for clarity (replacing “fluid” with higher-order epistasis) and updated the Discussion to place our work in the broader context of the literature on higher-order epistasis.

      Finally, we have rewritten the entire manuscript to improve clarity, refine the narrative flow, and ensure that the presentation more crisply reflects the subject of the study.

      Reviewer #1 (Recommendations for the authors): 

      MINOR COMMENTS 

      (1) Lines 102-107. Papkou's definition of non-functional genotypes makes sense since it is based on the fact that some genotypes are statistically indistinguishable in terms of fitness from mutants with premature stop codons in folA. It doesn't really matter whether to call them low fitness or non-functional, but it would be helpful to explain the basis for this distinction. 

      Thank you for raising this point. To maintain consistency with the original dataset and analysis, we retain Papkou et al.’s nomenclature and refer to these genotypes as “functional” or “non-functional.” 

      (2) Lines 111-112. I think the authors need to briefly explain here how they define the absence of epistasis. They do so in the Methods, but this information is essential and needs to be conveyed to the reader in the Results as well. 

      Thank you for the suggestion. We agree that this definition is essential for readers to follow the Results. In the revised manuscript, we have added a brief explanation at the start of the Results section clarifying how we define the absence of epistasis. Specifically, we now state that two mutations are considered non-epistatic when the observed fitness of the double mutant is statistically indistinguishable (within the error of six replicates) from the additive expectation based on the single mutants. This ensures that the Results section is self-contained, while full details remain in the Methods.

      (3) Lines 142 and elsewhere. The authors introduce the qualifier "fluid" to describe the fact that the value or sign of pairwise epistasis changes across genetic backgrounds. I don't see a need for this new terminology, since it is already captured adequately by the term "higher-order epistasis". The epistasis field is already rife with jargon, and I would prefer if new terms were introduced only when absolutely necessary. 

      Thank you for this helpful suggestion. We agree that introducing new terminology is unnecessary here. In the revised manuscript, we have replaced the term “fluid” epistasis with “higher-order epistasis” throughout, to align with established usage and avoid adding jargon.

      (4) Figure 6. I don't think this is the best way of showing that the pivot points are clustered. A histogram would be more appropriate and would take less space. However it would allow the authors to display a null distribution to demonstrate that this clustering is indeed surprising. 

      (5) Lines 320-321. Mann-Whitney U tests whether one distribution is systematically shifted up or down relative to the other. Please change the language here. It looks like the authors also performed the Kolmogorov-Smirnov test, which is appropriate, but the results do not appear to be reported anywhere. Please report them.

      (6) Lines 330-334. The fact that HF genotypes seem to have more similar DFEs than LF genotypes is somewhat counterintuitive. Could this be an artifact of the fact that any two random HF genotypes are more similar to each other than any two randomly sampled LF genotypes? 

      (7) Lines 427. The sentence "The set of these selected variants are assigned their one hamming distance neighbours to construct a new 𝑛-base sequence space" is confusing. I think it is pretty clear how to construct a n-base sequence space, and this sentence adds more confusion than it removes. 
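      The distinction drawn in comment (5) can be illustrated by computing the two-sample Kolmogorov-Smirnov statistic directly. The statistic is the largest vertical gap between the two empirical CDFs, so it responds to differences in spread or shape, not only to a shift. In the hypothetical DFEs below, both samples are symmetric and share a median, so a location-shift test such as Mann-Whitney U has little to detect, yet the KS statistic is 0.4. For p-values, SciPy provides `scipy.stats.ks_2samp` and `scipy.stats.mannwhitneyu`. The hand-written function is only a sketch of the statistic.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_x - ECDF_y|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both ECDFs are step functions, so the supremum occurs at a pooled sample value.
    for v in set(xs) | set(ys):
        cdf_x = bisect_right(xs, v) / len(xs)
        cdf_y = bisect_right(ys, v) / len(ys)
        d = max(d, abs(cdf_x - cdf_y))
    return d


# Hypothetical DFEs with the same median but different spread.
wide = [-2.0, -1.0, 0.0, 1.0, 2.0]
narrow = [-0.2, -0.1, 0.0, 0.1, 0.2]
d_spread = ks_statistic(wide, narrow)
```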

      Thank you for raising this point. To maintain consistency with the original dataset and analysis, we retain Papkou et al.’s nomenclature and refer to these genotypes as “functional” or “non-functional.” 

      We now start the results section of the manuscript with a brief description of how each type of epistasis is defined. Specifically, we now state that two mutations are considered non-epistatic when the observed fitness of the double mutant is statistically indistinguishable (within the error of six replicates) from the additive expectation based on the single mutants. This ensures that the Results section is self-contained, while full details remain in the Methods.

      We also agree that introducing new terminology is unnecessary. In the revised manuscript, we have replaced the term “fluid” epistasis with “higher-order epistasis” throughout, to align with established usage and avoid adding jargon. Finally, we concur that the identified sentence was unnecessary and potentially confusing; it has been removed from the revised manuscript to improve clarity. In fact, we have rewritten the entire manuscript for better flow and readability. 

      Reviewer #2 (Recommendations for the authors): 

      (1) Supplementary Figure S2A and S3 seem to be the same. 

      (3) The classification scheme for reciprocal sign/single sign/other sign epistasis differs from convention and should be made more explicit or renamed. 

      (4) Re the claim that high and low fitness backgrounds have different frequencies of the various types of epistasis: 

      Are the differences in the frequency distributions of the different types of epistasis between high- and low-fitness backgrounds statistically significant? They seem to follow similar general patterns, and the sample size is much smaller for high-fitness backgrounds, so more variance in their distributions is expected.

      Does bounding of fitness measurements play a role in generating the differences in types of epistasis seen in high- vs. low-fitness backgrounds? If many variants are at the lower bound of the fitness assay, then positive epistasis might simply be less detectable for these backgrounds (which seems to be the biggest difference between high- and low-fitness backgrounds).

      (5) In Figure 4B, points are not independent, because the mutation effects are calculated for all mutations in all backgrounds, rather than with reference to a single background or fluorescence value. The same mutations are therefore counted many times. 

      (6) It is not clear how the "pivot growth rate" was calculated or what the importance of this metric is. 

      (7) In the introduction, the justification for reanalyzing the Papkou et al dataset in particular is not clear. 

      (8) Epistasis at the nucleotide level is expected because of the genetic code: fitness and function are primarily affected by amino acid changes, and nucleotide mutations will affect amino acids depending on the state at other nucleotide sites in the same codon. For the most part, this is not explicitly taken account of in the paper. I recommend separating apparent epistasis due to the genetic code from that attributable to dependence among codons. 
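      For comment (4), one assumption-light check is a permutation test on the chi-square statistic of the category counts, which copes with small and unequal group sizes. The sketch uses hypothetical labels and treats labelled pairs as exchangeable. Comment (5) notes that the same mutations recur across backgrounds, so in practice permuting whole backgrounds rather than individual pairs may be more appropriate.

```python
import random
from collections import Counter


def chi_square(labels_1, labels_2):
    """Pearson chi-square statistic for a 2 x k table of category counts."""
    categories = sorted(set(labels_1) | set(labels_2))
    rows = [Counter(labels_1), Counter(labels_2)]
    n_rows = [len(labels_1), len(labels_2)]
    total = sum(n_rows)
    stat = 0.0
    for c in categories:
        col = rows[0][c] + rows[1][c]
        for counts, n in zip(rows, n_rows):
            expected = n * col / total
            stat += (counts[c] - expected) ** 2 / expected
    return stat


def permutation_p_value(labels_1, labels_2, n_perm=2000, seed=0):
    """Permutation p-value for a difference in category frequencies between groups.

    Shuffling group membership keeps the unequal group sizes intact.
    """
    rng = random.Random(seed)
    observed = chi_square(labels_1, labels_2)
    pooled = list(labels_1) + list(labels_2)
    n1 = len(labels_1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if chi_square(pooled[:n1], pooled[n1:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical epistasis-type labels for high- and low-fitness backgrounds.
high_fitness_labels = ["positive"] * 30 + ["negative"] * 10
low_fitness_labels = ["positive"] * 10 + ["negative"] * 30
p_value = permutation_p_value(high_fitness_labels, low_fitness_labels)
```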

      Thank you for noting this. Figure S2A shows results for high-fitness peaks only, whereas Figure S3 shows results for all peaks across the landscape. We have now made this distinction explicit in the figure legends and main text of the revised manuscript. 

      In the revised analysis, peaks are defined using the average fitness across six experimental replicates along with the corresponding standard deviation. Each genotype is compared with all single-step neighbors, and it is classified as a peak only if its mean fitness is significantly higher than all neighbors (p < 0.05). This procedure explicitly accounts for measurement error and replaces the arbitrary thresholding used previously. Full details are now described in the Methods.
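      The peak criterion described here can be sketched as follows. Each genotype is summarized by its replicate mean, SD, and number of replicates. The one-sided p < 0.05 comparison is approximated with a normal z-margin of about 1.645 standard errors of the difference. The names, the toy two-site landscape, and the choice of a normal rather than a t-based test are all assumptions made for illustration.

```python
import math
from itertools import product
from statistics import NormalDist

ALPHABET = "ACGT"
Z_ONE_SIDED = NormalDist().inv_cdf(0.95)  # normal approximation to one-sided p < 0.05


def neighbours(genotype):
    """All genotypes one nucleotide substitution away."""
    for i, base in enumerate(genotype):
        for alt in ALPHABET:
            if alt != base:
                yield genotype[:i] + alt + genotype[i + 1:]


def is_peak(genotype, landscape):
    """Peak if mean fitness exceeds every neighbour by more than the error margin.

    landscape maps genotype -> (mean, sd, n_replicates).
    """
    mean, sd, n = landscape[genotype]
    se = sd / math.sqrt(n)
    for nb in neighbours(genotype):
        if nb not in landscape:
            continue  # unmeasured neighbours are skipped in this sketch
        nb_mean, nb_sd, nb_n = landscape[nb]
        margin = Z_ONE_SIDED * math.sqrt(se**2 + (nb_sd / math.sqrt(nb_n)) ** 2)
        if mean - nb_mean <= margin:
            return False
    return True


def find_peaks(landscape):
    return sorted(g for g in landscape if is_peak(g, landscape))


# Hypothetical 2-site landscape: flat low fitness with one clearly higher genotype.
toy = {"".join(g): (0.2, 0.05, 6) for g in product(ALPHABET, repeat=2)}
toy["AC"] = (1.0, 0.05, 6)
toy["AG"] = (0.21, 0.05, 6)
peaks = find_peaks(toy)
```

      For a 9-nucleotide genotype, `neighbours` yields 27 sequences. Each candidate is compared with every neighbour, so the per-comparison threshold is a design choice that directly affects how many peaks are called.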

      To avoid confusion, we now state our definitions explicitly at the start of the analysis and have corrected our definition in the text. We define sign epistasis as a case in which the effect of at least one mutation switches from beneficial to deleterious depending on the presence of the other mutation.
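      The corrected definition can be checked mechanically. The sketch classifies a pair as single or reciprocal sign epistasis from the four fitness values of a double-mutant cycle. For contrast, it also implements the stricter joint-versus-sum criterion that Reviewer #3 described for the earlier version. The two hypothetical examples show that the two criteria can disagree.

```python
def effect_sign_changes(effect_alone, effect_in_other_background):
    """True if a mutation's effect flips between beneficial and deleterious."""
    return effect_alone * effect_in_other_background < 0  # zero effects never count as a flip


def classify_sign_epistasis(f_ref, f_a, f_b, f_ab):
    """Standard pairwise classification from a double-mutant cycle."""
    a_flips = effect_sign_changes(f_a - f_ref, f_ab - f_b)
    b_flips = effect_sign_changes(f_b - f_ref, f_ab - f_a)
    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "single sign"
    return "no sign epistasis"


def joint_opposes_sum(f_ref, f_a, f_b, f_ab):
    """Stricter criterion: the joint effect has the opposite sign from the
    sum of the two single-mutant effects."""
    summed = (f_a - f_ref) + (f_b - f_ref)
    return summed * (f_ab - f_ref) < 0


example_1 = (1.0, 0.9, 0.9, 1.2)   # each mutation harmful alone, beneficial together
example_2 = (1.0, 1.2, 0.9, 1.25)  # only mutation B flips sign
```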

      We have clarified our motivation in the Introduction. The Papkou et al. dataset is the most comprehensive experimental map of a complete 9-bp region of folA and provides six independent replicates, making it uniquely suited for testing hypotheses about background-dependent epistasis. Importantly, Papkou et al. based their conclusions on a single run, whereas our reanalysis incorporates replicate means and variances, leading to substantive differences; for example, the number of reported peaks falls from 514 to 127. By recalibrating the analysis, we provide a more rigorous account of this landscape and highlight how methodological choices affect conclusions.

      We also agree that some nucleotide-level epistasis reflects the structure of the genetic code (i.e., codon degeneracy and context-dependence of amino acid substitutions). In the revised manuscript, we explicitly separate epistasis attributable to codon structure from epistasis arising among codons. For example, synonymous mutations that alter epistasis within codons are treated separately from those affecting interactions across codons, and this distinction is now clearly indicated in the Results.
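      Separating within-codon from between-codon effects needs only the standard genetic code and the codon index of each position. The sketch assumes the 9-bp region is read as three complete, in-frame codons, which is consistent with the 9 nucleotides making up 3 amino acid sites. The helper names and test sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate an in-frame nucleotide sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous(seq, pos, new_base):
    """True if substituting new_base at pos leaves the protein sequence unchanged."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return translate(mutant) == translate(seq)


def same_codon(pos_1, pos_2):
    """True if two 0-based, in-frame nucleotide positions fall in the same codon."""
    return pos_1 // 3 == pos_2 // 3
```

      A pair of mutated positions can then be labelled within-codon or between-codon with `same_codon`. Each mutation can be labelled synonymous or not relative to the specific background into which it is introduced, because whether a substitution is synonymous depends on the other two bases of its codon.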

      Reviewer #3 (Recommendations for the authors): 

      (1) The analysis of peak density and accessibility in the paragraph starting on line 96 seems a bit out of context. Its connection with the various forms of epistasis treated in the rest of the paper is unclear. 

      (2) As mentioned in the Public Review, the term 'sign epistasis' has been used in a non-standard way. My suggestion would be to use a different term. Even a slightly modified term, such as "strong sign epistasis", should help to avoid any confusion. 

      (3) I mentioned in the Public Review that it is not clear whether the synonymous mutations that change the type of epistasis also tend to be non-neutral. This issue could be addressed by computing, for example, the fitness effects of all synonymous mutations for backgrounds and mutation pairs where a switch in epistasis occurs, and comparing them with fitness effects where no such switch occurs.

      (4) Do the authors have any proposal for why synonymous mutations seem to cause more frequent changes in epistasis in low-fitness backgrounds? Related to this, is there any systematic difference between the types of switch caused by synonymous mutations in the low- versus high-fitness backgrounds? 

      (5) It is unclear exactly how the pivot points were determined, especially since the data for many mutations is noisy. The protocol should be provided in the Methods section. 

      (6) Line 303: possible typo, "accurate" --> "inaccurate". 

      (7) The value of Delta used for the "phenotypic DFE" has not been mentioned in the main text (including Methods).

      We agree that the connection needed to be clearer. In the revised manuscript, we (i) relocate and retitle this material as a brief “Landscape overview” preceding the epistasis analyses, (ii) explicitly link multi-peakedness and path accessibility to epistasis (e.g., multi-peak structure implies the presence of sign/reciprocal-sign epistasis; accessibility is shaped by background-dependent effects), and (iii) move derivations to the Supplement. We also recomputed peak density and accessibility using replicate-averaged fitness with replicate SDs, so the overview and downstream epistasis sections now use a single, error-aware landscape (updated in Figs. 1–3, with cross-references in the text).

      We have aligned our terminology and now state definitions upfront. 

      After replacing fixed cutoffs with replicate-based error criteria, switches are more frequent in high-fitness backgrounds (Fig. 3). Mechanistically, near the lower fitness bound, deleterious effects are masked (global nonlinearity), reducing apparent switching. Functional/high-fitness backgrounds allow both beneficial and deleterious outcomes, so background-dependent (higher-order) interactions manifest more readily. Switch types also vary by background fitness: high-fitness backgrounds show more sign/strong-sign switches, whereas low-fitness backgrounds show mostly magnitude reclassifications (Fig. 3C; Supplement Fig. Sx).
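      The masking argument can be illustrated with a toy measurement model. This sketch assumes a hard lower bound on observable fitness (observed = max(floor, latent)) and additive latent effects. The actual shape of the global nonlinearity in the data is not specified here, and the function names are hypothetical.

```python
def observed(latent, floor=0.0):
    """Toy global nonlinearity: fitness below `floor` is reported as `floor`."""
    return max(floor, latent)

def apparent_effect(background, effect, floor=0.0):
    """Observed fitness change caused by a mutation with additive latent `effect`."""
    return observed(background + effect, floor) - observed(background, floor)

# The same deleterious latent effect (-0.3) on progressively lower backgrounds
for bg in (1.0, 0.1, 0.0):
    print(bg, round(apparent_effect(bg, -0.3), 3))
```

The apparent effect shrinks from -0.3 to -0.1 and then to 0 as the background approaches the floor. On such backgrounds, a mutation therefore cannot be observed to switch from beneficial to deleterious, which fits the lower switching rate reported for low-fitness backgrounds.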

      Finally, we corrected a typo by replacing “accurate” with “inaccurate” and now define Δ (equal to 0.05) in the main text (in Results and Figure 8 caption).

    1. eLife Assessment

      Computational simulation of neuron function depends on a collection of morphological properties and ion channel biophysics. This manuscript introduces DendroTweaks, a valuable web application and Python library that eases interactive exploration, development, and validation of single-neuron models in an easily installable and well-documented package. The authors provide a convincing demonstration that their software aids with building intuition and rapid prototyping of biophysical models of neurons, which improves the accessibility of dendritic simulation.

    2. Reviewer #1 (Public review):

      Summary:

      DendroTweaks provides its users with a solid tool to implement, visualize, tune, validate, understand, and reduce single-neuron models that incorporate complex dendritic arbors with differential distribution of biophysical mechanisms. The visualization of dendritic segments and biophysical mechanisms therein provides users with an intuitive way to understand and appreciate dendritic physiology.

    3. Reviewer #2 (Public review):

      The paper by Makarov et al. describes the software tool called DendroTweaks, intended for examination of multi-compartmental biophysically detailed neuron models. It offers extensive capabilities for working with very complex distributed biophysical neuronal models and should be a useful addition to the growing ecosystem of tools for neuronal modeling.

      Strengths

      • This Python-based tool allows for visualization of a neuronal model's compartments.

      • The tool works with morphology reconstructions in the widely used .swc and .asc formats.

      • It can support many neuronal models using the NMODL language, which is widely used for neuronal modeling.

      • It permits one to plot the properties of linear and non-linear conductances in every compartment of a neuronal model, facilitating examination of the model's details.

      • DendroTweaks supports manipulation of the model parameters and morphological details, which is important for exploration of the relations of the model composition and parameters with its electrophysiological activity.

      • The paper is very well written - everything is clear, and the capabilities of the tool are described and illustrated with great attention to detail.

      Weaknesses

      • Not a really big weakness, but it would be really helpful if the authors showed how the performance of their tool scales. This can be done for an increasing number of compartments - how long does it take to carry out typical procedures in DendroTweaks, on given hardware, for a cell model with 100 compartments, 200, 300, and so on? This information will be quite useful to understand the applicability of the software.

      Let me also add here a few suggestions (not weaknesses, but something that can be useful, and if the authors can easily add some of these for publication, that would strongly increase the value of the paper).

      • It would be very helpful to add functionality to read major formats in the field, such as NeuroML and SONATA.

      • Visualization is available as a static 2D projection of the cell's morphology. It would be nice to implement 3D interactive visualization.

      • It is nice that DendroTweaks can modify the models, such as revising the radii of the morphological segments or ionic conductances. It would be really useful then to have the functionality for writing the resulting models into files for subsequent reuse.

      • If I didn't miss something, it seems that DendroTweaks supports allocation of groups of synapses, where all synapses in a group receive the same type of Poisson spike train. It would be very useful to provide more flexibility. One option is to leverage the SONATA format, which has ample functionality for specifying such diverse inputs.

      • "Each session can be saved as a .json file and reuploaded when needed" - do these files contain the whole history of the session or the exact snapshot of what is visualized when the file is saved? If the latter, which variables are saved, and which are not? Please clarify.

      Comments on revisions:

      In this revised version of the paper, the authors addressed all my comments. While many of the suggestions were addressed by textual changes in the manuscript or an explanation in the response to the reviewers (rather than adding substantial new functionality to the tool), DendroTweaks in its current updated state does represent an advanced and useful tool. Further extensions can be added as the development of the tool continues, in interaction with the community.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      DendroTweaks provides its users with a solid tool to implement, visualize, tune, validate, understand, and reduce single-neuron models that incorporate complex dendritic arbors with differential distribution of biophysical mechanisms. The visualization of dendritic segments and biophysical mechanisms therein provides users with an intuitive way to understand and appreciate dendritic physiology.

      Strengths:

      (1) The visualization tools are simplified, elegant, and intuitive.

      (2) The ability to build single-neuron models using simple and intuitive interfaces.

      (3) The ability to validate models with different measurements.

      (4) The ability to systematically and progressively reduce morphologically-realistic neuronal models.

      Weaknesses:

      (1) Inability to account for neuron-to-neuron variability in structural, biophysical, and physiological properties in the model-building and validation processes.

      We agree with the reviewer that it is important to account for neuron-to-neuron variability. The core approach of DendroTweaks, and its strongest aspect, is the interactive exploration of how morpho-electric parameters affect neuronal activity. In light of this, variability can be achieved through the interactive updating of the model parameters with widgets. In a sense, by adjusting a widget (e.g., channel distribution or kinetics), a user ends up with a new instance of a cell in the parameter space and receives almost real-time feedback on how this change affected neuronal activity. This approach is much simpler than implementing complex optimization protocols for different parameter sets, which would detract from the interactivity aspect of the GUI. In its revised version, DendroTweaks also accounts for neuron-to-neuron morphological variability, as channel distributions are now based on morphological domains (rather than the previous segment-specific approach). This makes it possible to apply the same biophysical configuration across various morphologies. Overall, both biophysical and morphological variability can be explored within DendroTweaks. 

      (2) Inability to account for the many-to-many mapping between ion channels and physiological outcomes. Reliance on hand-tuning provides a single biased model that does not respect pronounced neuron-to-neuron variability observed in electrophysiological measurements.

      We acknowledge the challenge of accounting for degeneracy in the relation between ion channels and physiological outcomes and the importance of capturing neuron-to-neuron variability. One possible way to address this, as we mention in the Discussion, is to integrate automated parameter optimization algorithms alongside the existing interactive hand-tuning with widgets. In its revised version, DendroTweaks can integrate with Jaxley (Deistler et al., 2024) in addition to NEURON. The models created in DendroTweaks can now be run with Jaxley (although not all types of models, see the limitations in the Discussion), and their parameters can be optimized via automated and fast gradient-based parameter optimization, including optimization of heterogeneous channel distributions. In particular, a key advantage of integrating Jaxley with DendroTweaks was its NMODL-to-Python converter, which significantly reduced the need to manually re-implement existing ion channel models for Jaxley (see here: https://dendrotweaks.readthedocs.io/en/latest/tutorials/convert_to_jaxley.html).

      (1) Michael Deistler, Kyra L. Kadhim, Matthijs Pals, Jonas Beck, Ziwei Huang, Manuel Gloeckler, Janne K. Lappalainen, Cornelius Schröder, Philipp Berens, Pedro J. Gonçalves, Jakob H. Macke. Differentiable simulation enables large-scale training of detailed biophysical models of neural dynamics. bioRxiv 2024.08.21.608979 (2024); doi: https://doi.org/10.1101/2024.08.21.608979

      (3) Lack of a demonstration of how to connect reduced models into a network within the toolbox.

      Building a network of reduced models is an exciting direction, yet beyond the scope of this manuscript, whose primary goal is to introduce DendroTweaks and highlight its capabilities. DendroTweaks is designed for single-cell modeling, aiming to cover its various aspects in great detail. Of course, we expect refined single-cell models, both detailed and simplified, to be further integrated into networks. But this does not need to occur within DendroTweaks. We believe this network-building step is best handled by dedicated network simulation platforms. To facilitate the network-building process, we extended the exporting capabilities of DendroTweaks. To enable the export of reduced models in DendroTweaks’s modular format, as well as in plain simulator code, we implemented a method to fit the resulting parameter distributions to analytical functions (e.g., polynomials). This approach provides a compact representation: only a few coefficients need to be stored to reproduce a distribution, independently of the original segmentation. The reduced morphologies can be exported as SWC files, standardized ion channel models as MOD files, and channel distributions as JSON files. Moreover, plain NEURON code (Python) to instantiate a cell class can be automatically generated for any model, including the reduced ones. Finally, to demonstrate how these exported models can be integrated into larger simulations, we implemented a "toy" network model in a Jupyter notebook included as an example in the GitHub repository. We believe that these changes greatly facilitate the integration of DendroTweaks-produced models into networks while also allowing users to run these networks on their favorite platforms.
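      A minimal sketch of the fitting idea follows. It assumes that the reduced model yields one parameter value per segment, together with each segment's path distance from the soma. It uses `numpy.polyfit` as a stand-in. DendroTweaks' own fitting routine and export schema may differ, and the `{"type": "polynomial", "coeffs": ...}` record is a hypothetical format.

```python
import json
import numpy as np

def fit_distribution(distance_um, values, deg=2):
    """Fit values(distance) with a polynomial and return a compact, JSON-serializable record."""
    coeffs = np.polyfit(distance_um, values, deg)  # highest power first
    return {"type": "polynomial", "coeffs": coeffs.tolist()}

def evaluate_distribution(record, distance_um):
    """Re-evaluate a fitted distribution at arbitrary distances (any segmentation)."""
    return np.polyval(record["coeffs"], distance_um)

# Hypothetical per-segment conductances along a reduced dendrite
dist = np.linspace(0.0, 400.0, 41)
gbar = 0.01 + 2e-5 * dist + 1e-8 * dist**2

record = fit_distribution(dist, gbar, deg=2)
text = json.dumps(record)  # three coefficients instead of 41 values
new_points = np.array([0.0, 125.0, 333.0])  # a different segmentation
print(evaluate_distribution(json.loads(text), new_points))
```

Storing coefficients rather than per-segment values decouples the saved distribution from the segmentation that produced it.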

      (4) Lack of a set of tutorials, which is common across many "Tools and Resources" papers, that would be helpful in users getting acquainted with the toolbox.

      This is an important point that we believe has been addressed fully in the revised version of the tool and manuscript. As previously mentioned, the lack of documentation was due to the software's early stage. We have now added comprehensive documentation, which is available at https://dendrotweaks.readthedocs.io. This extensive material includes API references, 12 tutorials, 4 interactive Jupyter notebooks, and a series of video tutorials, and it is regularly updated with new content. Moreover, the toolbox's GUI with example models is available through our online platform at https://dendrotweaks.dendrites.gr.  

      Reviewer #2 (Public review):

      The paper by Makarov et al. describes the software tool called DendroTweaks, intended for the examination of multi-compartmental biophysically detailed neuron models. It offers extensive capabilities for working with very complex distributed biophysical neuronal models and should be a useful addition to the growing ecosystem of tools for neuronal modeling.

      Strengths

      (1) This Python-based tool allows for visualization of a neuronal model's compartments.

      (2) The tool works with morphology reconstructions in the widely used .swc and .asc formats.

      (3) It can support many neuronal models using the NMODL language, which is widely used for neuronal modeling.

      (4) It permits one to plot the properties of linear and non-linear conductances in every compartment of a neuronal model, facilitating examination of the model's details.

      (5) DendroTweaks supports manipulation of the model parameters and morphological details, which is important for the exploration of the relations of the model composition and parameters with its electrophysiological activity.

      (6) The paper is very well written - everything is clear, and the capabilities of the tool are described and illustrated with great attention to detail.

      Weaknesses

      (1) Not a really big weakness, but it would be really helpful if the authors showed how the performance of their tool scales. This can be done for an increasing number of compartments - how long does it take to carry out typical procedures in DendroTweaks, on given hardware, for a cell model with 100 compartments, 200, 300, and so on? This information will be quite useful to understand the applicability of the software.

      DendroTweaks functions as a layer on top of a simulator. As a result, its performance scales in the same way as for a given simulator. The GUI currently displays the time taken to run a simulation (e.g., in NEURON) at the bottom of the Simulation tab in the left menu. While Bokeh-related processing and rendering also consume time, this is not as straightforward to measure. It is worth noting, however, that this time is short and approximately equivalent to rendering the corresponding plots elsewhere (e.g., in a Jupyter notebook), and thus adds negligible overhead to the total simulation time. 

      (2) Let me also add here a few suggestions (not weaknesses, but something that can be useful, and if the authors can easily add some of these for publication, that would strongly increase the value of the paper).

      (3) It would be very helpful to add functionality to read major formats in the field, such as NeuroML and SONATA.

      We agree with the reviewer that support for major formats will substantially improve the toolbox, ensuring the reproducibility and reusability of the models. While integration with these formats has not been fully implemented, we have taken several steps to ensure elegant and reproducible model representation. Specifically, we have increased the modularity of model components and developed a custom compact data format tailored to single-cell modeling needs. We used a JSON representation inspired by the Allen Cell Types Database schema, modified to account for non-constant distributions of the model parameters. We have transitioned from a representation of parameter distributions dependent on specific segmentation graphs and sections to a more generalized domain-based distribution approach. In this revised methodology, segment groups are no longer explicitly defined by segment identifiers, but rather by specification of anatomical domains and conditional expressions (e.g., “select all segments in the apical domain with the maximum diameter < 0.8 µm”). Additionally, we have implemented the export of experimental protocols into CSV and JSON files, where the JSON files contain information about the stimuli (e.g., synaptic conductance, time constants), and the CSV files store locations of recording sites and stimuli. These features contribute toward a higher-level, structured representation of models, which we view as an important step toward eventual compatibility with standard formats such as NeuroML and SONATA.

      We have also initiated a two-way integration between DendroTweaks and SONATA. We developed a converter from DendroTweaks to SONATA that automatically generates SONATA files to reproduce models created in DendroTweaks. Additionally, support for the DendroTweaks JSON representation of biophysical properties will be added to the SONATA data format ecosystem, enabling models with complex dendritic distributions of channels. This integration is still in progress and will be included in the next version of DendroTweaks. While full integration with these formats is a goal for future releases, we believe the current enhancements to modularity and exportability represent a significant step forward, providing immediate value to the community.
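      A segment group defined by a rule rather than by identifiers can be sketched as follows, using the apical-diameter example above. The `Segment` class, the `select_segments` helper, and the group dictionary are hypothetical illustrations, not the DendroTweaks API.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    idx: int
    domain: str  # anatomical domain, e.g. "apical"
    diam: float  # diameter (µm)

def select_segments(segments, domain, max_diam=None):
    """Return segments in `domain`, optionally only those thinner than max_diam."""
    return [
        s for s in segments
        if s.domain == domain and (max_diam is None or s.diam < max_diam)
    ]

# A group stored as a rule rather than as a list of segment identifiers
thin_apical = {"domain": "apical", "max_diam": 0.8}

segments = [Segment(0, "soma", 20.0), Segment(1, "apical", 1.2),
            Segment(2, "apical", 0.6), Segment(3, "basal", 0.5)]
print([s.idx for s in select_segments(segments, **thin_apical)])
```

Here only segment 2 is selected. Because the rule refers to domains and diameters rather than to indices, the same group definition can be applied to a different morphology.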

      (4) Visualization is available as a static 2D projection of the cell's morphology. It would be nice to implement 3D interactive visualization.

      We offer an option to rotate a cell around the Y axis using a slider under the plot. This is a workaround, as implementing a true 3D visualization in Bokeh would require custom Bokeh elements, along with external JavaScript libraries. It's worth noting that there are already specialized tools available for 3D morphology visualization. In light of this, while a 3D approach is technically feasible, we advocate for a different method. The core idea of DendroTweaks’ morphology exploration is that each section is “clickable”, allowing its geometric properties to be examined in a 2D "Section" view. Furthermore, we believe the "Graph" view presents the overall cell topology and distribution of channels and synapses more clearly.

      (5) It is nice that DendroTweaks can modify the models, such as revising the radii of the morphological segments or ionic conductances. It would be really useful then to have the functionality for writing the resulting models into files for subsequent reuse.

      This functionality is fully available in local installations. Users can export JSON files with channel distributions and SWC files after morphology reduction through the GUI. Please note that for resource management purposes, file import/export is disabled on the public online demo. However, it can be enabled upon local installation by modifying the configuration file (app/default_config.json). In addition, it is now possible to generate plain NEURON (Python) code to reproduce a given model outside the toolbox (e.g., for network simulations). Moreover, it is now possible to export the simulation protocols as CSV files for locations of stimuli and recordings and JSON files for stimuli parameters.
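      The split between protocol files can be sketched with the Python standard library: site locations are written to CSV and stimulus parameters to JSON. The column names, NEURON-style section names, and parameter keys below are hypothetical and do not reproduce the DendroTweaks file schema.

```python
import csv
import io
import json

# Hypothetical protocol: stimulus/recording sites and stimulus parameters
sites = [
    {"kind": "iclamp", "sec": "soma[0]", "loc": 0.5},
    {"kind": "recording", "sec": "soma[0]", "loc": 0.5},
    {"kind": "recording", "sec": "apic[10]", "loc": 0.3},
]
stimuli = {"iclamp": {"amp_nA": 0.2, "delay_ms": 50.0, "dur_ms": 500.0}}

def sites_to_csv(rows):
    """Serialize site locations to CSV text (one row per site)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["kind", "sec", "loc"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def stimuli_to_json(params):
    """Serialize stimulus parameters to JSON text."""
    return json.dumps(params, indent=2)
```

Keeping locations and parameters in separate files lets the same stimulus settings be reused with a different set of sites, and vice versa.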

      (6) If I didn't miss something, it seems that DendroTweaks supports the allocation of groups of synapses, where all synapses in a group receive the same type of Poisson spike train. It would be very useful to provide more flexibility. One option is to leverage the SONATA format, which has ample functionality for specifying such diverse inputs.

      Currently, each population of “virtual” neurons that form synapses on the detailed cell shares the same set of parameters for both biophysical properties of synapses (e.g., reversal potential, time constants) and presynaptic "population" activity (e.g., rate, onset). The parameter that controls an incoming Poisson spike train is the rate, which is indeed shared across all synapses in a population. Unfortunately, the current implementation lacks the capability to simulate complex synaptic inputs with heterogeneous parameters across individual synapses or those following non-uniform statistical distributions (the present implementation is limited to random uniform distributions). We have added this information in the Discussion (3. Discussion - 3.2 Limitations and future directions - ¶.5) to make users aware of the limitations. As it requires a substantial amount of additional work, we plan to address such limitations in future versions of the toolbox.
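      The current population model, with a shared rate and onset, independent Poisson trains, and uniformly random locations, can be sketched as follows. This is an illustration only. It does not reproduce DendroTweaks' implementation, and locations are expressed as hypothetical normalized positions between 0 and 1.

```python
import random

def poisson_spike_times(rate_hz, onset_ms, end_ms, rng):
    """Spike times (ms) of a homogeneous Poisson process on [onset_ms, end_ms)."""
    if rate_hz <= 0:
        return []
    mean_isi_ms = 1000.0 / rate_hz
    times, t = [], onset_ms
    while True:
        t += rng.expovariate(1.0 / mean_isi_ms)  # exponential inter-spike interval
        if t >= end_ms:
            return times
        times.append(t)

def make_population(n_synapses, rate_hz, onset_ms, end_ms, seed=0):
    """One population: shared rate and onset, independent trains, uniform locations."""
    rng = random.Random(seed)
    return [
        {
            "loc": rng.uniform(0.0, 1.0),  # normalized position, uniform at random
            "spikes": poisson_spike_times(rate_hz, onset_ms, end_ms, rng),
        }
        for _ in range(n_synapses)
    ]
```

Heterogeneous inputs would require per-synapse values of `rate_hz` and `onset_ms`, or location samplers other than the uniform distribution.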

      (7) "Each session can be saved as a .json file and reuploaded when needed" - do these files contain the whole history of the session or the exact snapshot of what is visualized when the file is saved? If the latter, which variables are saved, and which are not? Please clarify.

      In the previous implementation, these files captured the exact snapshot of the model's latest state. In the new version, we adopted a modular approach where the biophysical configuration (e.g., channel distributions) and stimulation protocols are exported to separate files. This allows the user to easily load and switch the stimulation protocols for a given model. In addition, the distribution of parameters (e.g., channel conductances) is now based on the morphological domains and is agnostic of the exact morphology (i.e., sections and segments), which allows the same JSON files with biophysical configurations to be reused across multiple similar morphologies. This also allows for easy file exchange between the GUI and the standalone version.
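      The morphology-agnostic idea can be sketched as a single domain-based JSON configuration applied to two different cells. The JSON layout, the `Segment` class, and the helper names are hypothetical and are not the DendroTweaks schema.

```python
import json
from dataclasses import dataclass

@dataclass
class Segment:
    domain: str      # anatomical domain, e.g. "soma", "apical", "basal"
    distance: float  # path distance from the soma (µm)

# Hypothetical domain-based configuration: no section or segment identifiers
CONFIG_JSON = """
{
  "gbar_na": {
    "soma":   {"type": "constant", "value": 0.05},
    "apical": {"type": "linear", "intercept": 0.05, "slope": -2e-5}
  }
}
"""

def evaluate(dist, x):
    """Evaluate one distribution at path distance x."""
    if dist["type"] == "constant":
        return dist["value"]
    if dist["type"] == "linear":
        return dist["intercept"] + dist["slope"] * x
    raise ValueError(f"unknown distribution type: {dist['type']}")

def apply_config(segments, config):
    """Return {parameter: per-segment values}; None where the domain has no entry."""
    return {
        param: [evaluate(by_domain[s.domain], s.distance) if s.domain in by_domain else None
                for s in segments]
        for param, by_domain in config.items()
    }

config = json.loads(CONFIG_JSON)
cell_a = [Segment("soma", 0.0), Segment("apical", 100.0), Segment("basal", 50.0)]
cell_b = [Segment("soma", 0.0)] + [Segment("apical", d) for d in (50.0, 150.0, 250.0, 350.0)]
values_a = apply_config(cell_a, config)["gbar_na"]
values_b = apply_config(cell_b, config)["gbar_na"]
```

The two cells have different numbers of segments, yet the same file yields a valid distribution for each.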

      Joint recommendations to Authors:

      The reviewers agreed that the paper is well written and that DendroTweaks offers a useful collection of tools to explore models of single-cell biophysics. However, the tooling as provided with this submission has critical limitations in the capabilities, accessibility, and documentation that significantly limit the utility of DendroTweaks. While we recognize that it is under active development and features may have changed already, we can only evaluate the code and documentation available to us here.

      We thank the reviewers for their positive evaluation of the manuscript and express our sincere appreciation for their feedback. We acknowledge the limitations they have pointed out and have addressed most of these concerns in our revised version.

      In particular, we would emphasize:

      (1) While the features may be rich, the documentation for either a user of the graphical interface or the library is extremely sparse. A collection of specific tutorials walking a GUI user through simple and complex model examples would be vital for genuine uptake. As one category of the intended user is likely to be new to computational modeling, it would be particularly good if this documentation could also highlight known issues that can arise from the naive use of computational techniques. Similarly, the library aspect needs to be documented in a more standard manner, with docstrings, an API function list, and more didactic tutorials for standard use cases.

      DendroTweaks now features comprehensive documentation. The standalone Python library code is well-documented with thorough docstrings. The overall code modularity and readability have improved. The documentation is created using the widely adopted Sphinx generator, making it accessible for external contributors, and it is available via ReadTheDocs https://dendrotweaks.readthedocs.io/en/latest/index.html. The documentation provides a comprehensive set of tutorials (6 basic, 6 advanced) covering all key concepts and workflows offered by the toolbox. Interactive Jupyter notebooks are included in the documentation, along with the quick start guide. All example models also have corresponding notebooks that allow users to build the model from scratch.

      The toolbox has its own online platform, where a quick-start guide for the GUI is available https://dendrotweaks.dendrites.gr/guide.html. We have created video tutorials for the GUI covering the basic use cases. Additionally, we have added tips and instructions alongside widgets in the GUI, as well as a status panel that displays application status, warnings, and other information. Finally, we plan to familiarize the community with the toolbox by organizing online and in-person tutorials, such as the one recently held at the CNS*2025 conference (https://cns2025florence.sched.com/event/25kVa/building-intuitive-and-efficient-biophysicalmodels-with-jaxley-and-dendrotweaks). Moreover, the toolbox was already successfully used for training young researchers during the Taiwan NeuroAI 2025 Summer School, founded by Ching-Lung Hsu. The feedback was very positive.

      (2) The paper describes both a GUI web app and a Python library. However, the code currently mixes these two in a way that largely makes sense for the web app but makes it very difficult to use the library aspect. Refactoring the code to separate apps and libraries would be important for anyone to use the library as well as allowing others to host their own DendroTweak servers. Please see the notes from the reviewing editor below for more details.

      The code in the previous `app/model` folder, responsible for the core functionality of the toolbox, has been extensively refactored and extended, and separated into a standalone library. The library is included in the Python package index (PyPI, https://pypi.org/project/dendrotweaks).

      Notes from the Reviewing Editor Comments (Recommendations for the authors):

      (1) While one could import morphologies and use a collection of ion channel models, details of synapse groups and stimulation approaches appeared to be only configurable manually in the GUI. The ability to save and load full neuron and simulation states would be extremely useful for reproducibility and sharing data with collaborators or as an interactive data product with a publication. There is a line in the text about saving states as json files (also mentioned by Reviewer #2), but I could see no such feature in the version currently online.

      We decided to reserve the online version for demonstration and educational purposes, with more example models being added over time. However, this functionality is available upon local installation of the app (after enabling it in ‘default_config.json’ in the root directory of the app). We have adopted a modular model representation that stores morphology, channel models, biophysical parameters, and stimulation protocols separately.

      (2) Relatedly, GUI exploration of complex data is often a precursor to a more automated simulation run. An easy mechanism to go from a user configuration to scripting would be useful to allow the early strength of GUIs to feed into the power of large-scale scripting.

      Any model can be easily exported to the modular DendroTweaks representation and later imported, either in the GUI or programmatically in the standalone version. This ensures a seamless transition between the two use cases.

      (3) While the paper discusses DendroTweaks as both a GUI and a python library, the zip file of code in the submission is not in good form as a library. Back-end library code is intermingled with front-end web app code, which limits the ability to install the library from a standard python interface like PyPI. API documentation is also lacking. Functions tend to not have docstrings, and the few that do, do not follow typical patterns describing parameters and types.

      As stated above, all these issues have been resolved in the new version of the toolbox. The library code is now housed in a separate repository https://github.com/Poirazi-Lab/DendroTweaks and included in PyPI https://pypi.org/project/dendrotweaks. The classes and public methods follow Numpy-style docstrings, and the API reference is available in the documentation: https://dendrotweaks.readthedocs.io/en/latest/genindex.html.

      (4) Library installation is very difficult. The requirements are currently a lockfile, fully specifying exact versions of all dependencies. This is exactly correct for web app deployment to maintain consistency, but is not feasible in the context of libraries where you want to have minimal impact on a user's environment. Refactoring the library from the web app is critical for making DendroTweaks usable in both forms described in the paper.

      The lockfile makes installation more or less impossible on computer setups other than that of the author. Needless to say, this is not acceptable for a tool, and I would encourage the authors to ask other people to attempt to install their code as they describe in the text. For example, attempting to create a conda environment from the environment.yml file on an M1 MacBook Pro failed because several requirements could not be found. I was able to get it to install within a Linux Docker image with the x86 platform specified, but this is not generally viable. For this to be the tool described in the text, this must be resolved. A common pattern that would work well here is to have a requirements lockfile and Docker image for the web app that imports a separate, less restrictive library package that could be hosted on PyPI or, less conveniently, through conda-forge.

      The installation of the standalone library is now straightforward via `pip install dendrotweaks`. On the Windows platform, however, manual installation of NEURON is required, as described in the official NEURON documentation https://nrn.readthedocs.io/en/8.2.6/install/install_instructions.html#windows.

      (5) As an aside, to improve potential uptake, the authors might consider an MIT-style license rather than the GNU Public License unless they feel strongly about the GPL. Many organizations are hesitant to build on GPL software because of the wide-ranging demands it places on software derived from or using GPL code.

      We thank the editor for this suggestion. We are considering changing the licence to MPL 2.0. It will maintain copyleft restrictions only on the package files while allowing end-users to freely choose their own license for any derived work, including the models, generated data files, and code that simply imports and uses our package.

      Reviewer #1 (Recommendations for the authors):

      (1) Abstract: Neurons rely on the interplay between dendritic morphology and ion channels to transform synaptic inputs into a sequence of somatic spikes. Technically, this would have to be morphology, ion channels, pumps, transporters, exchangers, buffers, calcium stores, and other molecules. For instance, if the calcium buffer concentration is large, then there would be less free calcium for activating the calcium-activated potassium channels. If different chloride co-transporters (NKCC vs. KCC) are expressed in the neuron or in different parts of the neuron, that would alter the chloride reversal potential for all the voltage- or ligand-gated chloride channels in the neuron. So, while morphology and ion channels are two important parts of the transformation, it would be incorrect to ignore the other components that contribute to it. The statement might be revised to describe these two as critical components.

      We added the phrase “two critical components”, as suggested by the reviewer.

      (2) Section 2.1 - The overall GUI looks intuitive and simple.

      (3) Section 2.2

      (a) The Graph view of morphology, especially accounting for the specific d_lambda is useful.

      (b) "Note that while microgeometry might not significantly affect the simulation at a low spatial resolution (small number of segments) due to averaging, it can introduce unexpected cell behavior at a higher level of spatial discretization."

      It might be good to warn users that the compartmentalization and error analyses are with reference to the electrical lambda. If users have to account for calcium microdomains, these analyses would not hold, given the two-orders-of-magnitude difference between the electrical and the calcium lambdas (e.g., Zador and Koch, J Neuroscience, 1994). Please make users aware that the impact of active dendrites in regulating calcium microdomains and signaling is critical when it comes to plasticity models in morphologically realistic structures.

      We thank the reviewer for this important point. We have clarified in the text that our spatial discretization specifically refers to the electrical length constant. We acknowledge that electrical and chemical processes operate on fundamentally different spatial and temporal scales, which requires special consideration when modeling phenomena like synaptic plasticity. We have sensitized users about this distinction. However, we do not address such examples in the manuscript, thus leaving the detailed discussion of non-electrical compartmentalization beyond the scope of this work.

      (c) I am not very sure if the "smooth" tool for diameters that is illustrated is useful. Users shouldn't consider real variability in morphology as artifacts of reconstruction. As mentioned above, while this might not be an issue with electrical compartmentalization, calcium compartmentalization will severely be affected by small changes in morphology. Any model that incorporates calcium-gated channels should appropriately compartmentalize calcium. Without this, the spread of activation of calcium-dependent conductances would be an overestimate. Even small changes in cellular shape and curvature can have large impacts when it comes to signaling in terms of protein aggregation and clustering.

      Although this functionality is still available in the toolbox, we have removed the emphasis on it in the manuscript. Nevertheless, to address the reviewer’s comment, we provide an example of when this “smoothening” might be needed: please see Figure S1 from Tasciotti et al. 2025.

      (2) Tasciotti, S., Iascone, D. M., Chavlis, S., Hammond, L., Katz, Y., Losonczy, A., Polleux, F. & Poirazi, P. From Morphology to Computation: How Synaptic Organization Shapes Place Fields in CA1 Pyramidal Neurons. bioRxiv 2025.05.30.657022 (2025). https://doi.org/10.1101/2025.05.30.657022

      (4) Section 2.3

      (a) The graphical representation of channel gating kinetics is very useful.

      (b) Please warn users that experimental measurements of channel gating kinetics are extremely variable. Taking the average of the sigmoids or of the activation/deactivation/inactivation kinetics creates the illusion that each channel subtype in a given cell type has fixed values of V_1/2, k, delta, and tau, whereas these are really ranges obtained from several experiments. The heterogeneity is real and reflects cell-to-cell variability in channel gating kinetics, not experimental artifacts. Please make readers aware that there is no single value for these channel parameters.

      This is a fair comment, and it refers to a general problem in neuronal modeling. In DendroTweaks, we follow the approach widely used in the community, which indeed does not account for heterogeneity. We added a paragraph to the Discussion of the revised manuscript (3. Discussion - 3.3 Limitations and future directions - ¶.3) to address this issue.

      (5) Section 2.4

      (a) Same as above: please make users aware that the gradients in channel conductances are measured as an average of measurements from several different cells. This gradient need not be present in each neuron, as there could be variability in location-dependent measurements across cells. That the average follows a sigmoid does not necessarily mean that each neuron will have the channel distributed with that specific sigmoid (or even a sigmoid!) with the specific parameter values that the average reported. This is extremely important because there is an illusion that the gradient is fixed across cells and follows a fixed functional form.

      We added this information to our Discussion in the same paragraph mentioned above.

      (b) Please provide an example where the half-maximal voltage of a channel varies as a function of distance (such as Poolos et al., Nature Neuroscience, 2002 or Migliore et al., 1999; Colbert and Johnston, 1997). This might require a step-like function in some scenarios. An illustration would be appropriate because people tend to assume that channel gating kinetics are similar throughout the dendrite. Again, please mention that these shifts are gleaned from the average and don't really imply that each neuron must have that specific gradient, given neuron-to-neuron variability in these measurements.

      We thank the reviewer for the provided literature, which we now cite when describing parameter distributions (2. Results - 2.4 Distributing ion channels - ¶.1). Please note that DendroTweaks' programming interface and data format natively support non-linear distribution of kinetic parameters alongside the channel conductances. As for the step-like function, users can either directly apply the built-in step-like distribution function or create it by combining two constant distributions.
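      To make the step-versus-sigmoid distinction concrete, here is a minimal sketch of distance-dependent profiles for a kinetic parameter such as a half-activation voltage. The function names and numbers are hypothetical illustrations; they are not the DendroTweaks API and are not values taken from the cited studies. The step profile is assembled from two constant pieces, as the response describes.

```python
import math


def constant(distance_um: float, value: float) -> float:
    """Same parameter value at every distance from the soma."""
    return value


def step(distance_um: float, threshold_um: float, proximal: float, distal: float) -> float:
    """Step-like profile built from two constant pieces joined at threshold_um."""
    if distance_um < threshold_um:
        return constant(distance_um, proximal)
    return constant(distance_um, distal)


def sigmoid(distance_um: float, v_start: float, v_end: float,
            midpoint_um: float, slope_um: float) -> float:
    """Smooth transition from v_start (near the soma) to v_end (distal dendrite)."""
    return v_start + (v_end - v_start) / (1.0 + math.exp(-(distance_um - midpoint_um) / slope_um))


# Hypothetical V_1/2 profiles (mV) along the dendrite; placeholder numbers only.
profile = [(d, step(d, 150.0, -82.0, -90.0), sigmoid(d, -82.0, -90.0, 150.0, 30.0))
           for d in (0.0, 100.0, 300.0)]
```

      In line with the reviewer's caveat, any such profile describes an average across cells; sampling its parameters per model instance would be one way to reflect neuron-to-neuron variability.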

      (6) Section 2.5

      (a) It might be useful to provide a mechanism for normalizing unitary conductances at the cell body (as in Magee and Cook, 2000; Andrasfalvy et al., J Neuroscience, 2001). Specifically, users should be able to compute the AMPAR conductance at each segment that would yield a somatic EPSP of 0.2 mV.

      This functionality is indeed useful and will be added in future releases. For now, it is listed among the known limitations of working with synaptic inputs (3. Discussion - 3.3 Limitations and future directions - ¶.5).

      (b) Users could be made aware of differences in the decay time constants of GABA_A receptors associated with parvalbumin vs. somatostatin neurons. As these have been linked to slow and fast gamma oscillations and to different somatodendritic locations across cell types, this might be useful (e.g., 10.1016/j.neuron.2017.11.033; 10.1523/jneurosci.0261-20.2020; 10.7554/eLife.95562.1; 10.3389/fncel.2023.1146278).

      We thank the reviewer for highlighting this important biological detail. DendroTweaks enables users to define model parameters specific to their cell type of interest. For practical reasons, we leave the selection of biologically relevant parameters to the users. However, we will consider adding an explicit example in our tutorials to showcase the toolbox's flexibility in this regard.

      (7) Section 2.6

      While reducing the morphological complexity has its advantages, users of this tool should be made aware in this section that the reduction does not capture all the complexity of dendritic computation. For instance, the segregation/amplification properties of Polsky et al., 2004, and Larkum et al., 2009, would not be captured by a fully reduced model. An example across different levels of reduction, implementing the simulations in Figure 7F (but for synapses on the same vs. different branches), would be ideal. Demonstrate segregation/amplification in the full model for the same set of synapses arriving on the same branch or on different branches (linear integration of synapses on different branches and nonlinear integration of synapses on the same branch). Then, show that with different levels of reduction, this segregation/amplification vanishes in the reduced model. In addition, while impedance-based approaches account for electrical computation, calcium-based computation cannot be accounted for with reduced models, given the small lambda_calcium values. Given the importance of calcium-activated conductances in electrical behaviour, this becomes extremely important to account for and to make users aware of. The lack of such awareness results in presumptuous reductions that assume that all dendritic computation is accounted for by reduced models!

      We agree with the reviewer that reduction leads to a loss in the complexity of dendritic computation. This has been stated in both the original algorithm paper (Amsalem et al., 2020) and in our manuscript (e.g., 3. Discussion - 3.2 Comparison to existing modeling software - ¶.6). In fact, to address this problem, we extended the functionality of neuron_reduce to allow for multiple levels of morphology reduction. Our motivation for integrating morphology reduction in the toolbox was to leverage the exploratory power of DendroTweaks to assess how different degrees of reduction alter cell integrative properties, determining which computations are preserved, which are lost, and at what specific reduction level these changes occur. Nevertheless, to address this comment, we've made it more explicit in the Discussion that reduction inevitably alters integrative properties and, at a certain level, leads to loss of dendritic computations.

      (8) Section 2.7

      (a) The validation process has two implicit assumptions:

      (i) There is only one value of physiological measurements that neurons and dendrites are endowed with. The heterogeneity in these measurements even within the same cell type is ignored. The users should be allowed to validate each measurement over a range rather than a single value. Users should be sensitized about the heterogeneity of physiological measurements.

      (ii) The validation process is largely akin to hand-tuning models where a one-to-one mapping of channels to measurements is assumed. For instance, input resistance can be altered by passive properties, by Ih, and by any channel that is active under resting conditions. Firing rate and patterns can be changed by pretty much every single ion channel that expresses along the somatodendritic axis.

      An updated validation process that respects physiological heterogeneities in measurements and accounts for global dependencies would be more appropriate. Please update these to account for heterogeneities and many-to-many mappings between channels and measurements. An ideal implementation would be to incorporate randomized search procedures (across channel parameters spanning neuron-to-neuron variability in channel conductances/gating properties) to find a population of models that satisfy all physiological constraints (including neuron-to-neuron variability in each physiological measurement), rather than reliance on procedures that are akin to hand-tuning models. Such population-based approaches are now common across morphologically-realistic models for different cell types (e.g., Rathour and Narayanan, PNAS, 2014; Basak and Narayanan, J Physiology, 2018; Migliore et al., PLoS Computational Biology, 2018; Basak and Narayanan, Brain Structure and Function, 2020; Roy and Narayanan, Neural Networks, 2021; Roy and Narayanan, J Physiology, 2023; Arnaudon et al., iScience, 2023; Reva et al., Patterns, 2023; Kumari and Narayanan, J Neurophysiology, 2024) and do away with the biases introduced by hand-tuning as well as the assumption of one-to-one mapping between channels and measurements.

      We appreciate the reviewer’s comment and the suggested alternatives to our validation process. We have extended the discussion on these alternative approaches (3. Discussion - 3.2 Comparison to existing modeling software - ¶.5). However, it is important to note that our approach imposes neither the single-value nor the one-to-one mapping assumption. It is true that validation is performed on a given model instance with fixed single-value parameters. However, users can discover heterogeneity and degeneracy in their models via interactive exploration. In the GUI, a given parameter can be changed, and the influence of this change on model output can be observed in real time. Validation can be run after each change to see whether the model output still falls within a biologically plausible regime. This is, of course, time-consuming and less efficient than any automated parameter optimization.

      However, and importantly, this is the niche of DendroTweaks. The approach we provide here can indeed be referred to as model hand-tuning. This is intentional: we aim to complement black-box optimization by exposing the relationship between parameters and model outputs. DendroTweaks is not aimed at automated parameter optimization and is not meant to provide the user with parameter ranges automatically. The built-in validation in DendroTweaks is intended as a lightweight, fast feedback tool to guide manual tuning of dendritic model parameters so as to enhance intuitive understanding and assess the plausibility of outputs, not as a substitute for comprehensive model validation or optimization. The latter can be done using existing frameworks, designed for this purpose, as mentioned by the reviewer. 

      (b) Users could be asked to wait for RMP to reach steady state. For instance, in some of the traces in Figure 7, the current injection is provided before RMP reaches steady-state. In the presence of slow channels (HCN or calcium-activated channels), the RMP can take a while to settle down. Users might be sensitized about this. This would also bring to attention the ability of several resting channels in modulating RMP, and the need to wait for steady-state before measurements are made.

      We agree with the observation and updated the validation process accordingly. We have added functionality for simulation stabilization, allowing users to pre-run a simulation before the main simulation time. For example, model.run(duration=1000, prerun_time=300) could be used to stabilize the model for a period of 300 ms before running the main simulation for 1 s.
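      The idea behind a pre-run can be illustrated with a toy single-compartment passive model: the pre-run period is simulated but not recorded, so the recorded trace starts close to steady state. This is a hypothetical sketch with made-up parameters, not the DendroTweaks implementation of model.run(duration=..., prerun_time=...).

```python
def simulate_passive(duration_ms: float, prerun_ms: float = 0.0, dt_ms: float = 0.1,
                     v0_mv: float = -65.0, e_leak_mv: float = -75.0,
                     tau_ms: float = 50.0) -> list:
    """Forward-Euler integration of dV/dt = (E_leak - V) / tau for one passive compartment.

    The first prerun_ms of simulated time is integrated but discarded, so the
    returned trace covers only the main simulation window.
    """
    v = v0_mv
    n_prerun = int(round(prerun_ms / dt_ms))
    n_main = int(round(duration_ms / dt_ms))
    for _ in range(n_prerun):          # settle toward rest; nothing recorded
        v += dt_ms * (e_leak_mv - v) / tau_ms
    trace = []
    for _ in range(n_main):            # recorded main simulation
        trace.append(v)
        v += dt_ms * (e_leak_mv - v) / tau_ms
    return trace


no_prerun = simulate_passive(100.0)
with_prerun = simulate_passive(100.0, prerun_ms=300.0)
```

      Without a pre-run, the recorded trace starts at the arbitrary initial voltage and still relaxes during the main window; with a pre-run of several time constants, it starts near rest. Slow conductances such as HCN or calcium-activated channels would call for correspondingly longer pre-runs.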

      (c) Strictly speaking, it is incorrect to obtain membrane time constant by fitting a single exponential to the initial part of the sag response (Figure 7A). This may be confirmed in the model by setting HCN to zero (strictly all active channel conductances to zero), obtaining the voltage-response to a pulse current, fitting a double exponential (as Rall showed, for a finite cable or for a real neuron, a single exponential would yield incorrect values for the tau) to the voltage response, and mapping membrane time constant to the slower of the two time-constants (in the double exponential fit). This value will be very different from what is obtained in Figure 7A. Please correct this, with references to Rall's original papers and to electrophysiological papers that use this process to assess membrane properties of neurons and their dendrites (e.g., Stuart and Spruston, J Neurosci, 1998; Golding and Spruston, J Physiology, 2005).

      We updated the algorithm for calculating the membrane time constant based on the reviewer's suggestions and added the suggested references. The time constant is now obtained in a model with blocked HCN channels (setting maximal conductance to 0) via a double exponential fit, taking the slower component.
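      As a sketch of separating the slow and fast components of a two-exponential response, the snippet below uses Rall's exponential peeling rather than a nonlinear least-squares double exponential fit, so it needs only NumPy. It is an illustration under stated assumptions (a clean, decaying deviation from baseline, and a late window in which the fast component has vanished), not the DendroTweaks implementation; the window bounds are hypothetical choices.

```python
import numpy as np


def peel_two_exponentials(t_ms, v_dev_mv, late_window, early_window):
    """Estimate tau0 (slow) and tau1 (fast) from a decaying response
    v_dev(t) = A0*exp(-t/tau0) + A1*exp(-t/tau1) by exponential peeling.

    late_window, early_window: (t_start, t_end) in ms. The late window must lie
    where the fast component has decayed away.
    """
    t = np.asarray(t_ms, dtype=float)
    v = np.asarray(v_dev_mv, dtype=float)

    # Step 1: straight-line fit to log(v) in the late window gives the slow component.
    late = (t >= late_window[0]) & (t <= late_window[1])
    slope0, intercept0 = np.polyfit(t[late], np.log(v[late]), 1)
    tau0 = -1.0 / slope0
    a0 = np.exp(intercept0)

    # Step 2: subtract the slow component and fit the early residual for the fast one.
    residual = v - a0 * np.exp(-t / tau0)
    early = (t >= early_window[0]) & (t <= early_window[1])
    slope1, _ = np.polyfit(t[early], np.log(residual[early]), 1)
    tau1 = -1.0 / slope1
    return tau0, tau1


# Synthetic, noise-free response (illustrative values).
t = np.arange(0.0, 200.0, 0.1)
v = 8.0 * np.exp(-t / 20.0) + 2.0 * np.exp(-t / 2.0)
tau0, tau1 = peel_two_exponentials(t, v, late_window=(60.0, 150.0), early_window=(0.0, 6.0))
```

      Following the reviewer's point, tau0 (the slower component) is the estimate of the membrane time constant. On noisy recordings, a least-squares double exponential fit is usually more robust than peeling.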

      (9) Section 3

      (a) May be good to emphasize the many-to-many mapping between ion channels and neuronal functions here in detail, and on how to explore this within the Dendrotweaks framework.

      We have added a paragraph to the Discussion that addresses both heterogeneity and degeneracy in biological neurons and neuronal models (3. Discussion - 3.3 Limitations and future directions - ¶.3).

      (b) May be good to have a specific section either here or in results about how the different reduced models can actually be incorporated towards building a network.

      As mentioned earlier, building a network of reduced models is a promising new direction. However, it is beyond the scope of this manuscript, whose primary goal is to introduce DendroTweaks and highlight its capabilities. DendroTweaks is designed for single-cell modeling and provides export capabilities that allow integrating it into broader workflows, including network modeling. We have added a paragraph in the manuscript (3. Discussion - 3.1 Conceptual and implementational accessibility - ¶.2) that addresses how DendroTweaks could be used alongside other software, in particular for scaling up single-cell models to the network level.

      (10) Section 4

      (a) Section 4.3: In the second sentence (line 568), the "first Kirchhoff's law" within parentheses immediately after Q=CV gives an illusion that Q=CV is the first Kirchhoff's law! Please state that this is with reference to the algebraic sum of currents at a node.

      We have corrected the equations and apologize for this oversight. 
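      For reference, a standard compartmental statement of the distinction the reviewer draws is shown below. This is a generic form, not necessarily the notation of the revised manuscript: C_{m,j} is the membrane capacitance of compartment j, I_{ion,j} its total ionic current, N(j) its neighbouring compartments, and R_{jk} the axial resistance between j and k; injected and synaptic currents are omitted. Q = CV supplies only the capacitive term, whereas Kirchhoff's first law is the balance of currents at the node.

```latex
% Capacitive current from Q = C V:  I_C = dQ/dt = C_{m,j} dV_j/dt.
% Kirchhoff's first (current) law: the algebraic sum of currents at node j is zero.
C_{m,j}\,\frac{dV_j}{dt} + I_{\mathrm{ion},j} = \sum_{k \in \mathcal{N}(j)} \frac{V_k - V_j}{R_{jk}}
```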

      (b) Table 1: In the presence of active ion channels, input resistance, membrane time constant, and voltage attenuation are not passive properties. Input resistance is affected by any active channel that is active at rest (HCN, Kir, A-type K+ through the window current, etc). The same holds for membrane time constant and voltage attenuation as well. This could be made clear by stating if these measurements are obtained in the presence or absence of active ion channels. In real neurons, all these measurements are affected by active ion channels; so, ideally, these are also active properties, not passive! Also, please mention that in the presence of resonating channels (e.g., HCN, M-type K+), a single exponential fit won't be appropriate to obtain tau, given the presence of sag.

      We thank the reviewer for pointing out this ambiguity. What the term “Passive” means in Table 1 (e.g., for the input resistance, R_in) is that the minimal set of parameters needed to validate R_in are the passive ones (i.e., Cm, Ra, and Leak). We have changed the table listing to reflect this.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2B and the caption to Figure 2F show and describe the diameter of the sections, whereas the image in Figure 2F shows the radius. Which is the correct one?

      The reason for this is that Figure 2B shows the sections' geometry as it is represented in NEURON, i.e., with diameters, while Figure 2F shows the geometry as it is represented in an SWC file (as these changes are made based on the SWC file). Nevertheless, as mentioned earlier, we decided to remove panel F from the figure in the new version, to present a more important panel on tree graph representations.

      (2) "Each segment can be viewed as an equivalent RC circuit representing a part of the membrane". The example in Figure 2B is perhaps a relatively simple case. For more complex cases where multiple nonlinear conductances are present in each section, would it be possible to show each of these conductances explicitly? If yes, it would be nice to illustrate that.

      We would like to clarify that "can be viewed" here was intended to mean "can be considered," and we have updated the text accordingly. The schematic RC circuits were added to the corresponding figure for illustration purposes only and are not present in the GUI, as this would indeed be impractical for multiple conductances.

      (3) Some extra citations could be added. For example, it is a little strange that BRIAN2 is mentioned, but NEST is not. It might be worth mentioning and citing it. Also, the Allen Cell Types Database is mentioned, but no citation for it is given. It could be useful to add such citations (https://doi.org/10.1038/s41593-019-0417-0, https://doi.org/10.1038/s41467-017-02718-3).

      Brian 2 is extensively used in our lab on its own and as a foundation of the Dendrify library (Pagkalos et al., 2023). As stated in the discussion, we are considering bridging reduced Hodgkin-Huxley-type models to Dendrify leaky integrate-and-fire type models. For these reasons, Brian 2 is mentioned in the discussion. However, we acknowledge that our previous overview omitted references to some key software, which have now been added to the updated manuscript. We appreciate the reviewer providing references that we had overlooked.

      (3) Pagkalos, M., Chavlis, S. & Poirazi, P. Introducing the Dendrify framework for incorporating dendrites to spiking neural networks. Nat Commun 14, 131 (2023). https://doi.org/10.1038/s41467-022-35747-8

    1. eLife Assessment

      This is an important study with convincing evidence that multi-voxel fMRI activity patterns for threat-conditioned stimuli are altered by learning CS-US contingencies. The analyses are dense, but rigorous. The protocol is quite nuanced and complex, but the authors have done a fair job of explaining and presenting the results. The work is relevant for our understanding of how effective learning changes neural stimulus representation in the human brain.

    2. Reviewer #1 (Public review):

      Summary:

      The authors conducted a human neuroimaging study investigating the role of context in the representation of fear associations when the contingencies between a conditioned stimulus and a shock unconditioned stimulus switch between contexts. The novelty of the analysis centered on neural pattern similarity to derive a measure of context and cue stability and generalization across different regions of the brain. Given the complexity and nuance of the results, it is kind of difficult to provide a concise summary. But during fear and reversal, there was cue generalization (between current CS+ cues) in the canonical fear network, and "item stability" for cues that changed their association with the shock in the IFG and precuneus. Reinstatement was quantified as pattern similarity for items or sets of cues from the earlier phases to the test phases, and they found different patterns in the IFG and dmPFC. A similar analytical strategy was applied to contexts.

      Strengths:

      Overall, I found this to be a novel use of MVPA to study the role of context in reversal/extinction of human fear conditioning that yielded interesting results. The paper was overall well-written, with a strong introduction and fairly detailed methods and results. The lack of any univariate contrast results from the test phases was used as motivation for the neural pattern similarity approach, which I appreciated as a reader.

      I have no additional or new comments. The authors adequately addressed my major comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This is a timely and original study on the geometry of macroscopic (2.5 mm) brain representations of multiple cues and contexts in Pavlovian fear conditioning. The authors report that these representations differ between initial learning and reversal learning and remain stable during extinction.

      Strengths:

      The authors address an important question and use a rigorous experimental methodology.

      Weaknesses:

      The findings are limited by the chosen spatial resolution (2.5 mm), which is far from what modern fMRI can achieve. Also, region-of-interest findings should be considered exploratory due to the chosen FDR method for correction for multiple comparisons (which is transparently reported).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewing Editor Comments:

      The study design used reversal learning (i.e. the CS+ becomes the CS- and vice versa), while the title mentions 'fear learning and extinction'. In my opinion, the paper does not provide insight into extinction and the title should be changed.

      Thank you for this important point. We agree that our paradigm focuses more directly on reversal learning than on standard extinction, as the test phases represent extinction in the absence of a US but follow a reversal phase. To better reflect the core of our investigation, we have changed the title.

      Proposed change in manuscript (Title): Original Title: Distinct representational properties of cues and contexts shape fear learning and extinction 

      New Title: Distinct representational properties of cues and contexts shape fear and reversal learning

      Secondly, the design uses 'trace conditioning', whereas the neuroscientific research and synaptic/memory models are rather based on 'delay conditioning'. However, given the limitations of this design, it would still be possible to make the implications of this paper relevant to other areas, such as declarative memory research.

      This is an excellent point, and we thank you for highlighting it. Our design, where a temporal gap exists between the CS offset and US onset, is indeed a form of trace conditioning. We also agree that this feature, particularly given the known role of the hippocampus in trace conditioning, strengthens the link between our findings and the broader field of episodic memory.

      Proposed change in manuscript (Methods, Section "General procedure and stimuli"): We inserted the following text (lines 218-220): "It is important to note that the temporal gap between the CS offset and potential US delivery (see Figure 1A) indicates that our paradigm employs a trace conditioning design. This form of learning is known to be hippocampus-dependent and has been distinguished from delay conditioning."

      Proposed change in manuscript (Discussion): We added the following to the discussion (lines 774-779): "Furthermore, our use of a trace conditioning paradigm, which is known to engage the hippocampus more than delay conditioning does, may have facilitated the detection of item-specific, episodic-like memory traces and their interaction with context. This strengthens the relevance of our findings for understanding the interplay between aversive learning and mechanisms of episodic memory."

      The strength of the evidence at this point would be described as 'solid'. In order to increase the strength (to convincing), analyses including FWE correction would be necessary. I think exploratory (and perhaps some FDR-based) analyses have their valued place in papers, but I agree that these should be reported as such. The issue of testing multiple independent hypotheses also needs to be addressed to increase the strength of evidence (to convincing). Evaluating the design with 4 cues could lead to false positives if, for example, current valence, i.e. (CS++ and CS-+) > (CS+- and CS--), and past valence (CS++ > CS+-) > (CS-+ > CS--) are tested as independent tests within the same data set. Authors need to adjust their alpha threshold.

      We fully agree. As summarized in our general response, we have implemented two major changes to our statistical approach to address these concerns comprehensively. These, as stated above, are the following:

      (1) Correction for Multiple Hypotheses: We previously used FWER-corrected p-values that were obtained through permutation testing. We have now applied a Bonferroni adjustment to the FWER-corrected threshold (previously 0.05) used in our searchlight analyses. For instance, in the acquisition phase, since 2 independent tests (contrasts) were conducted, the significance threshold of each of these searchlight maps was set to p < 0.025 (after FWE correction estimated through non-parametric permutation testing); in reversal, 4 tests were conducted, hence the significance threshold was set to p < 0.0125. This change is now clearly described in the Methods section (section “Searchlight approach”, lines 477-484). This change had no impact on our searchlight results, given that all clusters that were previously significant at the FWER alpha of 0.05 were also significant at the new, Bonferroni-adjusted thresholds; we also now report the cluster-specific corrected p-values in the cluster tables in the Supplementary Material.
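      The per-phase thresholds above follow from dividing the family-wise alpha by the number of contrasts tested in that phase. A minimal sketch; the function and dictionary names are illustrative, while the test counts are the ones stated in the response.

```python
def bonferroni_threshold(fwer_alpha: float, n_tests: int) -> float:
    """Per-map threshold so that n_tests maps jointly keep the FWER at fwer_alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be >= 1")
    return fwer_alpha / n_tests


# Number of independent searchlight contrasts per phase, as stated in the response.
tests_per_phase = {"acquisition": 2, "reversal": 4}
thresholds = {phase: bonferroni_threshold(0.05, n) for phase, n in tests_per_phase.items()}
```

      A cluster is then retained only if its permutation-based, FWE-corrected p-value falls below the threshold for its phase.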

      (2) ROI Analyses: Our ROI-based analyses used FDR-based correction within each item reinstatement/generalized reinstatement pair of each ROI. We now explicitly state in the abstract, methods and results sections that these ROI-based analyses are exploratory and secondary to the primary whole-brain results, given that the correction method used is more liberal, in accordance with the exploratory character of these analyses.

      We are confident that these changes ensure both the robustness and transparency of our reported findings.

      Reviewer #1 (Public Review):

      (1) I had a difficult time unpacking lines 419-420: "item stability represents the similarity of the neural representation of an item to other representations of this same item."

      We thank the reviewer for pointing out this lack of clarity. We have revised the definition to be more intuitive and have ensured it is introduced earlier in the manuscript.

      Proposed change in manuscript (Introduction, lines 144-150): We introduced the concept earlier and more clearly: "Furthermore, we can measure the consistency of a neural pattern for a given item across multiple presentations. This metric, which we refer to as “item stability”, quantifies how consistently a specific stimulus (e.g., the image of a kettle) is represented in the brain across multiple repetitions of the same item. Higher item stability has been linked to successful episodic memory encoding (Xue et al., 2010)."

      Proposed change in manuscript (Methods, Section "Item stability and generalization of cues"): Original text: "Thus, item stability represents the similarity of the neural representation of an item to other representations of this same item (Xue, 2018), or the consistency of neural activity across repetitions (Sommer et al., 2022)."

      Revised text (lines 434-436): "Item stability is defined as the average similarity of neural patterns elicited by multiple presentations of the same item (e.g., the kettle). It therefore measures the consistency of an item's neural representation across repeated encounters."
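      As an illustration of the revised definition, item stability can be sketched as the mean pairwise Pearson correlation between the voxel patterns of repeated presentations of one item. This is a hypothetical sketch, not the authors' pipeline; for instance, many RSA studies Fisher z-transform correlations before averaging, which is omitted here.

```python
import numpy as np


def item_stability(patterns: np.ndarray) -> float:
    """Mean pairwise Pearson correlation between repeated presentations of one item.

    patterns: array of shape (n_presentations, n_voxels), with n_presentations >= 2.
    """
    r = np.corrcoef(patterns)             # (n_presentations, n_presentations)
    upper = np.triu_indices_from(r, k=1)  # each pair of presentations counted once
    return float(r[upper].mean())
```

      Identical (or linearly scaled) patterns across repetitions give a stability of 1; patterns that vary unsystematically across repetitions push it toward 0.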

      (2) The authors use the phrase "representational geometry" several times in the paper without clearly defining what they mean by this.

      We apologize for this omission. We have now added a clear and concise definition of "representational geometry" in the Introduction, citing the foundational work by Kriegeskorte et al. (2008).

      Proposed change in manuscript (Introduction): We inserted the following text (lines 117-125): "By contrast, multivariate pattern analyses (MVPA), such as representational similarity analysis (RSA; Kriegeskorte et al., 2008), have emerged as powerful tools to investigate the content and structure of these representations (e.g., Hennings et al., 2022). This approach allows us to characterize the “representational geometry” of a set of items – that is, the structure of similarities and dissimilarities between their associated neural activity patterns. This geometry reveals how the brain organizes information, for instance, by clustering items that are conceptually similar while separating those that are distinct."
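      The “structure of similarities and dissimilarities” is commonly summarized as a representational dissimilarity matrix (RDM). A minimal sketch, assuming correlation distance (1 minus Pearson r) as the dissimilarity measure; this is one common choice, not necessarily the one used in the manuscript.

```python
import numpy as np


def representational_dissimilarity_matrix(patterns: np.ndarray) -> np.ndarray:
    """Correlation-distance RDM.

    patterns: array of shape (n_items, n_voxels).
    Returns an (n_items, n_items) symmetric matrix with zeros on the diagonal.
    """
    return 1.0 - np.corrcoef(patterns)
```

      Items whose patterns cluster together (for example, cues currently paired with the US) show small entries relative to each other and larger entries relative to the remaining items.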

      (3) The abstract is quite dense and will likely be challenging to decipher for those without a specialized knowledge of both the topic (fear conditioning) and the analytical approach. For instance, the goal of the study is clearly articulated in the first few sentences, but then suddenly jumps to a sentence stating "our data show that contingency changes during reversal induce memory traces with distinct representational geometries characterized by stable activity patterns across repetitions..." this would be challenging for a reader to grok without having a clear understanding of the complex analytical approach used in the paper.

      We agree with your assessment. We have rewritten it to be more accessible to a general scientific audience, by focusing on the conceptual findings rather than methodological jargon.

      Proposed change in manuscript (Abstract): We revised the abstract to be clearer. It now reads: "When we learn that something is dangerous, a fear memory is formed. However, this memory is not fixed and can be updated through new experiences, such as learning that the threat is no longer present. This process of updating, known as extinction or reversal learning, is highly dependent on the context in which it occurs. How the brain represents cues, contexts, and their changing threat value remains a major question. Here, we used functional magnetic resonance imaging and a novel fear learning paradigm to track the neural representations of stimuli across fear acquisition, reversal, and test phases. We found that initial fear learning creates generalized neural representations for all threatening cues in the brain’s fear network. During reversal learning, when threat contingencies switched for some of the cues, two distinct representational strategies were observed. On the one hand, we still identified generalized patterns for currently threatening cues, whereas on the other hand, we observed highly stable representations of individual cues (i.e., item-specific) that changed their valence, particularly in the precuneus and prefrontal cortex. Furthermore, we observed that the brain represents contexts more distinctly during reversal learning. In addition, exploratory analyses showed that the degree of this context specificity in the prefrontal cortex predicted the subsequent return of fear, providing a potential neural mechanism for fear renewal. Our findings reveal that the brain uses a flexible combination of generalized and specific representations to adapt to a changing world, shedding new light on the mechanisms that support cognitive flexibility and the treatment of anxiety disorders via exposure therapy."

      (4) Minor: I believe it is STM200 not the STM2000.

      Thank you for pointing this out. We have corrected it in the Methods section.

      Proposed change in manuscript (Methods, Page 5, Line 211): Original: STM2000 -> Corrected: STM200

      (5) Line 146: "...could be particularly fruitful as a means to study the influence of fear reversal or extinction on context representations, which have never been analyzed in previous fear and extinction learning studies." I direct the authors to Hennings et al., 2020, Contextual reinstatement promotes extinction generalization in healthy adults but not PTSD, as an example of using MVPA to decipher reinstatement of the extinction context during test.

      Thank you for pointing us towards this relevant work. We have revised the sentence to reflect the state of the literature more accurately.

      Proposed change in manuscript (Introduction, Page 3): Original text: "...which have never been analyzed in previous fear and extinction learning studies." 

      Revised text (lines 154-157): "...which, despite some notable exceptions (e.g., Hennings et al., 2020), have been less systematically investigated than cue representations across different learning stages."

      (6) This is a methodological/conceptual point, but it appears from Figure 1 that the shock occurs 2.5 seconds after the CS (and context) goes off the screen. This would seem to be more like a trace conditioning procedure than a standard delay fear conditioning procedure. This could be a trivial point, but there have been numerous studies over the last several decades comparing differences between these two forms of fear acquisition, both behaviorally and neurally, including differences in how trace vs delay conditioning is extinguished.

      Thank you for this pertinent observation; this was also pointed out by the editor. As detailed in our response to the editor, we now explicitly acknowledge that our paradigm uses a trace conditioning design, and have added statements to this effect in the Methods and Discussion sections (lines 218-220, and 774-779).

      (7) In Figure 4, it would help to see the individual data points derived from the model used to test significance between the different conditions (reinstatement between Acq, reversal, and test-new).

      We agree that this would improve the transparency of our results. We have revised Figure 4 to include individual data points, which are now plotted over the bar graphs. 

      Reviewer #2 (Public Review & Recommendations)

      Use a more stringent method of multiple comparison correction: voxel-wise FWE instead of FDR; Holm-Bonferroni across multiple hypothesis tests. If FDR is chosen then the exploratory character of the results should be transparently reported in the abstract.

      Thank you for these critical comments regarding our statistical methods. As detailed in the general response and the response to the editor (Comment 3), we have thoroughly revised our approach to ensure its rigor. We now clarify that our whole-brain analyses consistently use FWER-corrected p-values. Additionally, these FWER-corrected p-values (obtained through permutation testing), which were previously evaluated against a default threshold of 0.05, are now compared with a Bonferroni-adjusted threshold, i.e., 0.05 divided by the number of contrasts tested in each experimental phase. We have modified the revised manuscript accordingly in the Methods section (lines 473-484) and in the supplementary material, where we added the FWER-corrected p-value of each cluster, evaluated against the new Bonferroni-adjusted thresholds. Of note, this had no impact on our searchlight results: all clusters previously reported as significant at an alpha threshold of 0.05 were also significant at the new, corrected thresholds.

      Proposed change in manuscript (Methods): We revised the relevant paragraphs (lines 473-484): "Significance of the contrasts between conditions in the maps of interest was FWER-corrected using nonparametric permutation testing at the cluster level (10,000 permutations) to estimate significant cluster size. Additionally, we used Bonferroni correction to adjust the alpha threshold against which the cluster-specific FWER-corrected p-values were assessed. To this end, we divided the default alpha threshold of 0.05 by the number of statistical comparisons conducted in each experimental phase. For example, for fear acquisition, we compared the CS+>CS- contrast for both item stability and cue generalization, resulting in 2 comparisons and hence a corrected alpha threshold of 0.025. Only clusters with a FWER-corrected p-value below the Bonferroni-adjusted threshold were deemed significant. All searchlight analyses were restricted to a gray matter mask."
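      The thresholding rule in the revised Methods can be sketched as follows. The function names are hypothetical, and the cluster p-values in the usage line are made up for illustration.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold: the default alpha divided by the number of tests."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def significant_clusters(fwer_pvalues, n_comparisons, alpha=0.05):
    """Flag clusters whose FWER-corrected p-value lies strictly below the adjusted alpha."""
    threshold = bonferroni_alpha(alpha, n_comparisons)
    return [p < threshold for p in fwer_pvalues]


# Fear acquisition: CS+>CS- for item stability and cue generalization -> 2 comparisons.
print(bonferroni_alpha(0.05, 2))  # 0.025
print(significant_clusters([0.01, 0.03], 2))  # [True, False]
```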

      The authors report fMRI results from line 96 onwards; all of these refer exclusively to mass-univariate fMRI, which could be mentioned more transparently... The authors contrast "activation fMRI" with "RSA" (line 112). Again, I would suggest mentioning "mass-univariate fMRI", and contrasting this with "multivariate" fMRI, of which RSA is just one flavour. For example, there is some work that is clear and replicable, demonstrating human amygdala involvement in fear conditioning using SVM-based analysis of high-resolution amygdala signals (one paper is currently cited in the discussion).

      Thank you for this important clarification. We have revised the manuscript to incorporate your suggestions. We now introduce our initial analyses as "mass-univariate" and contrast them with the "multivariate pattern analysis" (MVPA) approach of RSA.

      Proposed change in manuscript (Introduction): We revised the relevant paragraphs (lines 113-125): "While mass-univariate functional magnetic resonance imaging (fMRI) activation studies have been instrumental in identifying the brain regions involved in fear learning and extinction, they are insensitive to the patterns of neural activity that underlie the stimulus-specific representations of threat cues and contexts. By contrast, multivariate pattern analysis methods, such as representational similarity analysis (RSA; Kriegeskorte et al., 2008), have emerged as a powerful tool to investigate the content and structure of these representations (e.g., Hennings et al., 2022). This approach allows us to characterize the “representational geometry” of a set of items – i.e., the structure of similarities and dissimilarities between their associated neural activity patterns. This geometry reveals how the brain organizes information, for instance, by clustering items that are conceptually similar while separating those that are distinct."

      Line 177: unclear how incomplete data was dealt with. If there are 30 subjects and 9 incomplete data sets, then how do they end up with 24 in the final sample?

      We apologize for the unclear wording in our original manuscript. We have clarified the participant exclusion pipeline in the Methods section.

      Proposed change in manuscript (Methods, Section "Participants"): Original text: "The number of participants with usable fMRI data for each phase was as follows: N = 30 for the first phase of day one, N = 29 for the second phase of day one, N = 27 for the first phase of day two, and N = 26 for the second phase of day two. Of the 30 participants who completed the first session, four did not return for the second day and thus had incomplete data across the four experimental phases. An additional two participants were excluded from the analysis due to excessive head movement (>2.5 mm in any direction). This resulted in a final sample of 24 participants (8 males) between 18 and 32 years of age (mean: 24.69 years, standard deviation: 3.6) with complete, low-motion fMRI data for all analyses." 

      Revised text: "The number of participants with usable fMRI data for each phase was as follows: N = 30 for the first phase of day one, N = 29 for the second phase of day one, N = 27 for the first phase of day two, and N = 26 for the second phase of day two. An additional two participants were excluded from the analysis due to excessive head movement (>2.5 mm in any direction). This resulted in a final sample of 24 participants (8 males) between 18 and 32 years of age (mean: 24.69 years, standard deviation: 3.6) with complete, low-motion fMRI data for all analyses."

      Typo in line 201.  

      Thank you for your comment. We have re-examined line 201 (“interval (Figure 1A). A total of eight CSs were presented during each phase and”) and the surrounding text but were unable to identify a clear typographical error in the provided quote. However, in the process of revising the manuscript for clarity, we have rephrased this section.

      It would be good to see all details of the US calibration procedure, and the physical details of the electric shock (e.g. duration, ...).

      Thank you for your comment. We have expanded the Methods section to include these important details.

      Proposed change in manuscript (Methods, Section "General procedure and stimuli"): We inserted the following text (lines 225-230): "Electrical stimulation was delivered via two Ag/AgCl electrodes attached to the distal phalanx of the index and middle fingers of the non-dominant hand. The intensity of the electrical stimulation was calibrated individually for each participant prior to the experiment. Using a stepping procedure, the voltage was gradually increased until the participant rated the sensation as 'unpleasant but not painful'."

      "beta series modelling" is a jargon term used in some neuroimaging software but not others. In essence, the authors use trial-by-trial BOLD response amplitude estimates in their model. Also, I don't think this requires justification - using the raw BOLD signal would seem outdated for at least 15 years.

      Thank you for this helpful suggestion. We have simplified the relevant sentences for improved clarity.

      Proposed change in manuscript (Methods, Section "RSA"): Original text: "...an approach known as beta-series modeling (Rissman et al., 2004; Turner et al., 2012)." 

      Revised text (lines 391-393): "...an approach that allows for the estimation of trial-by-trial BOLD response amplitudes, often referred to as beta-series modeling (Rissman et al., 2004). Specifically, we used a Least Square Separate (LSS) approach..."
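      Below is a minimal sketch of the Least Square Separate (LSS) idea as it is commonly described. One GLM is fitted per trial, containing a regressor for that trial and a single regressor that collapses all remaining trials. The sketch assumes trial regressors that are already convolved with a hemodynamic response function and uses plain ordinary least squares via `numpy.linalg.lstsq`. It omits prewhitening and drift modelling, so it is not the study's actual pipeline, and the function name `lss_betas` is hypothetical.

```python
import numpy as np


def lss_betas(Y, trial_regressors, confounds=None):
    """Trial-wise beta estimates via Least Squares Separate (LSS).

    Y: (n_timepoints, n_voxels) BOLD data.
    trial_regressors: (n_timepoints, n_trials), one HRF-convolved column per trial.
    confounds: optional (n_timepoints, n_confounds) nuisance regressors.
    Returns an (n_trials, n_voxels) array of betas for the trial of interest.
    """
    n_t, n_trials = trial_regressors.shape
    betas = np.empty((n_trials, Y.shape[1]))
    for i in range(n_trials):
        target = trial_regressors[:, i]
        # All other trials are collapsed into one nuisance regressor.
        others = np.delete(trial_regressors, i, axis=1).sum(axis=1)
        cols = [target, others, np.ones(n_t)]  # intercept as the last required column
        if confounds is not None:
            cols.extend(confounds.T)
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        betas[i] = coef[0]  # keep only the trial-of-interest coefficient
    return betas
```

      Fitting one model per trial is slower than a single model with one regressor per trial. However, it reduces the instability of estimates when neighbouring trials' regressors overlap strongly in time.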

      I found the use of "Pavlovian trace" a bit confusing. The authors are coming from memory research where "memory trace" is often used; however, in associative learning the term "trace conditioning" means something else. Perhaps this can be explained upon first occurrence, and "memory trace" instead of "Pavlovian trace" might be more common.

      We are grateful for this comment, as it highlights a critical point of potential confusion, especially given that we now acknowledge our paradigm uses a trace conditioning design. To eliminate this ambiguity, we have replaced all instances of "Pavlovian trace" with "lingering fear memory trace" throughout the manuscript (lines 542 and 599).

      I would suggest removing evaluative statements from the results (repeated use of "interesting").

      Thank you for this valuable suggestion. We have reviewed the Results section and removed subjective evaluative words to maintain a more objective tone. 

      Line 882: one of these references refers to a multivariate BOLD analysis using SVM, not explicitly using temporal information in the signal (although they do show session-by-session information).

      Thank you for this correction. We have re-examined the cited paper (Bach et al., 2011) and removed its inclusion in the text accordingly.

    1. eLife Assessment

      This important article reports on the role of specific interneurons in the motion processing circuitry of the fruit fly, and marshals convincing evidence from neural recording, genetic manipulation, and behavioral analysis. A significant result ties the activity of C2/C3 neurons to the temporal resolution of the motion vision system. It remains unclear whether disrupting this pathway affects the dynamics of vision more generally.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      (1) The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      (2) Figure 6c. More analysis is required here, since the authors claim to have found a loss of inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), in the PD direction for the T4 C2 and C3 blocks. Also, I would predict that the C2 & C3 block is statistically different from the C3-only block; why is this not the case? In any case, it would be good to discuss the clear trend in the PD direction by showing the distribution of responses as violin plots to better understand the data. It would also be good to have some raw traces, not only polar plots and averages, so that the differences can be seen more clearly.
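      Per-fly distributions such as those requested above can also be summarised as a vector-sum direction-selectivity index. This is one common way to condense polar-plot data into a single number. The sketch assumes responses measured at evenly spaced stimulus directions and is not necessarily the metric used in the manuscript. Under this definition, negative responses (e.g., inhibition in the null direction) contribute a vector pointing opposite their stimulus direction.

```python
import numpy as np


def direction_selectivity_index(directions_deg, responses):
    """Vector-sum DSI in [0, 1] and preferred direction in degrees.

    DSI = |sum_k r_k * exp(i * theta_k)| / sum_k |r_k|
    """
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    r = np.asarray(responses, dtype=float)
    denom = np.sum(np.abs(r))
    if denom == 0:
        raise ValueError("all responses are zero")
    vec = np.sum(r * np.exp(1j * theta))  # resultant vector in the complex plane
    dsi = float(np.abs(vec) / denom)
    preferred = float(np.rad2deg(np.angle(vec)) % 360)
    return dsi, preferred
```

      Computing the index per fly, rather than on the averaged polar plot, yields exactly the kind of distribution that could be shown as violin plots.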

      (3) The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could test the behavioral experiments with Kir2.1, too.

    3. Reviewer #2 (Public review):

      Summary:

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. This work advances our current knowledge in Drosophila motion vision and sets the way for further exploring the intricate details of direction-selective computations.

      Strengths:

      Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions, and the behavioral output.

      Weaknesses:

      The authors claim that C2 and C3 neurons are required for direction selectivity, as per the publication's title; however, even with their double silencing, the directional T4 & T5 responses are not completely abolished. Therefore, the contribution of this inherited feedback in direction-selective computations is not a prerequisite for its emergence, and the title could be re-adjusted.

      Connectivity is assessed in one out of the two available connectome datasets; therefore, it would make the study stronger if the same connectivity patterns were identified in both datasets.

      The mediating neural correlates from C2 & C3 to T4 & T5 are not clarified; rather, Mi1 is found to be one of them. The study could be improved if the same set of silencing experiments performed for C2-Mi1 were extended to C2 & C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Stating the potential T5 mediators more clearly on the basis of the connectomic analysis would be equally beneficial. Future experiments might also disentangle the parallel or separate functions of C2 and C3 neurons.

      Finally, the authors' conclusions derive logically from the set of experiments they performed. Nonetheless, the Discussion could benefit from a more extensive explanation of the following matters: why the ON-selective C2 and C3 neurons control OFF-generated behaviors; why the T4&T5 responses after C2&C3 silencing differ between stationary and moving stimuli; and why C2, and not C3, had an effect on T5 DS responses, even though the connectivity suggests that C3 outputs to two of the four major T5 cholinergic inputs.

    4. Reviewer #3 (Public review):

      Summary:

      This article is about the neural circuitry underlying motion vision in the fruit fly. Specifically, it regards the roles of two identified neurons, called C2 and C3, that form columnar connections between neurons in the lamina and medulla, including neurons that are presynaptic to the elementary motion detectors T4 and T5. The approach takes advantage of specific fly lines in which one can disable the synaptic outputs of either or both of the C2/3 cell types. This is combined with optical recording from various neurons in the circuit, and with behavioral measurements of the turning reaction to moving stimuli.

      The experiments are planned logically. The effects of silencing the C2/C3 neurons are substantial in size. The dominant effect is to make the responses of downstream neurons more sustained, consistent with a circuit role in feedback or feedforward inhibition. Silencing C2/C3 also makes the motion-sensitive neurons T4/T5 less direction-selective. However, the turning response of the fly is affected only in subtle ways. Detection of motion appears unaffected. But the response fails to discriminate between two motion pulses that happen in close succession. One can conclude that C2/C3 are involved in the motion vision circuit, by sharpening responses in time, though they are not essential for its basic function of motion detection.

      Strengths:

      The combination of cutting-edge methods available in fruit fly neuroscience. Well-planned experiments carried out to a high standard. Convincing effects documenting the role of these neurons in neural processing and behavior.

      Weaknesses:

      The report could benefit from a mechanistic argument linking the effects at the level of single neurons, the resulting neural computations in elementary motion detectors, and the altered behavioral response to visual motion.

    1. eLife Assessment

      This important and compelling study establishes a robust computational and experimental framework for the large-scale identification of metallophore biosynthetic clusters. The work advances beyond current standards, providing theoretical and practical value across microbiology, bioinformatics, and evolutionary biology.

    2. Reviewer #1 (Public review):

      This work by Reitz, Z. L. et al. developed an automated tool for high-throughput identification of microbial metallophore biosynthetic gene clusters (BGCs) by integrating knowledge of chelating moiety diversity and transporter gene families. The study aimed to create a comprehensive detection system combining chelator-based and transporter-based identification strategies, validate the tool through large-scale genomic mining, and investigate the evolutionary history of metallophore biosynthesis across bacteria.

      Major strengths include providing the first automated, high-throughput tool for metallophore BGC identification, representing a significant advancement over manual curation approaches. The ensemble strategy effectively combines complementary detection methods, and experimental validation using HPLC-HRMS strengthens confidence in computational predictions. The work pioneers a global analysis of metallophore diversity across the bacterial kingdom and provides a valuable dataset for future computational modeling.

      Some limitations merit consideration. First, ground truth datasets derived from manual curation may introduce selection bias toward well-characterized systems, potentially affecting performance assessment accuracy. Second, the model's dependence on known chelating moieties and transporter families constrains its ability to detect novel metallophore architectures, limiting discovery potential in metagenomic datasets. Third, while the proposed evolutionary hypothesis is internally consistent, it lacks direct validation and remains speculative without additional phylogenetic studies.

      The authors successfully achieved their stated objectives. The tool demonstrates robust performance metrics and practical utility through large-scale application to representative genomes. Results strongly support their conclusions through rigorous validation, including experimental confirmation of predicted metallophores via HPLC-HRMS analysis.

      The work provides a significant and immediate impact by enabling the transition from labor-intensive manual approaches to automated screening. The comprehensive phylogenetic framework advances understanding of bacterial metal acquisition evolution, informing future studies on microbial metal homeostasis. Community utility is substantial, since the tool and accompanying dataset create essential resources for comparative genomics, algorithm development, and targeted experimental validation of novel metallophores.

    3. Reviewer #2 (Public review):

      Summary:

      This study presents a systematic and well-executed effort to identify and classify bacterial NRP metallophores. The authors curate key chelator biosynthetic genes from previously characterized NRP-metallophore biosynthetic gene clusters (BGCs) and translate these features into an HMM-based detection module integrated within the antiSMASH platform.

      The new algorithm is compared with a transporter-based siderophore prediction approach, demonstrating improved precision and recall. The authors further apply the algorithm to large-scale bacterial genome mining and, through reconciliation of chelator biosynthetic gene trees with the GTDB species tree using eMPRess, infer that several chelating groups may have originated prior to the Great Oxidation Event.

      Overall, this work provides a valuable computational framework that will greatly assist future in silico screening and preliminary identification of metallophore-related BGCs across bacterial taxa.

      Strengths:

      (1) The study provides a comprehensive curation of chelator biosynthetic genes involved in NRP-metallophore biosynthesis and translates this knowledge into an HMM-based detection algorithm, which will be highly useful for the initial screening and annotation of metallophore-related BGCs within antiSMASH.

      (2) The genome-wide survey across a large bacterial dataset offers an informative and quantitative overview of the taxonomic distribution of NRP-metallophore biosynthetic chelator groups, thereby expanding our understanding of their phylogenetic prevalence.

      (3) The comparative evolutionary analysis, linking chelator biosynthetic genes to bacterial phylogeny, provides an interesting and valuable perspective on the potential origin and diversification of NRP-metallophore chelating groups.

      Weaknesses:

      (1) Although the rule-based HMM detection performs well in identifying major categories of NRP-metallophore biosynthetic modules, it currently lacks the resolution to discriminate between fine-scale structural or biochemical variations among different metallophore types.

      (2) While the comparison with the transporter-based siderophore prediction approach is convincing overall, more information about the dataset balance and composition would be appreciated. In particular, specifying the BGC identities, source organisms, and Gram-positive versus Gram-negative classification would improve transparency. In the supplementary tables, the "Just TonB" section seems to include only BGCs from Gram-negative bacteria - if so, this should be clearly stated, as Gram type strongly influences siderophore transport systems.

    1. eLife Assessment

      This study proposes a valuable and interpretable approach for predicting hematoma expansion in patients with spontaneous intracerebral hemorrhage from non-contrast computed tomography. The evidence supporting the proposed method is solid, including predictive performance evaluated through external validation. This quantitative approach has the potential to improve hematoma expansion prediction with better interpretability. The work will be of interest to medical biologists working on stroke and neuroimaging.

    2. Reviewer #1 (Public review):

      Summary:

      The study explores the use of Transport-based morphometry (TBM) to predict hematoma expansion and growth 24 hours post-event, leveraging Non-Contrast Computed Tomography (NCCT) scans combined with clinical and location-based information. The research holds significant clinical potential, as it could enable early intervention for patients at high risk of hematoma expansion, thereby improving outcomes. The study is well-structured, with detailed methodological descriptions and a clear presentation of results. However, the practical utility of the predictive tool requires further validation, as the current findings are based on retrospective data. Additionally, the impact of this tool on clinical decision-making and patient outcomes needs to be further investigated.

      Strengths:

      (1) Clinical Relevance: The study addresses a critical need in clinical practice by providing a tool that could enhance diagnostic accuracy and guide early interventions, potentially improving patient outcomes.

      (2) Feature Visualization: The visualization and interpretation of features associated with hematoma expansion risk are highly valuable for clinicians, aiding in the understanding of model-derived insights and facilitating clinical application.

      (3) Methodological Rigor: The study provides a thorough description of methods, results, and discussions, ensuring transparency and reproducibility.

      Comments on revisions:

      The authors have addressed my concerns.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The study explores the use of Transport-based morphometry (TBM) to predict hematoma expansion and growth 24 hours post-event, leveraging Non-Contrast Computed Tomography (NCCT) scans combined with clinical and location-based information. The research holds significant clinical potential, as it could enable early intervention for patients at high risk of hematoma expansion, thereby improving outcomes. The study is well-structured, with detailed methodological descriptions and a clear presentation of results. However, the practical utility of the predictive tool requires further validation, as the current findings are based on retrospective data. Additionally, the impact of this tool on clinical decision-making and patient outcomes needs to be further investigated.

      Strengths:

      (1) Clinical Relevance: The study addresses a critical need in clinical practice by providing a tool that could enhance diagnostic accuracy and guide early interventions, potentially improving patient outcomes.

      (2) Feature Visualization: The visualization and interpretation of features associated with hematoma expansion risk are highly valuable for clinicians, aiding in the understanding of model-derived insights and facilitating clinical application.

      (3) Methodological Rigor: The study provides a thorough description of methods, results, and discussions, ensuring transparency and reproducibility.

      Weaknesses:

      (1) The limited sample size in this study raises concerns about potential model overfitting. While the reported AUC-ROC of 0.71 may be acceptable for clinical use, the robustness of the model could be further enhanced by employing techniques such as k-fold cross-validation. This approach, which aggregates predictive results across multiple folds, mimics the consensus of diagnoses from multiple clinicians and could improve the model's reliability for clinical application. Additionally, in clinical practice, the utility of the model may depend on specific conditions, such as achieving high specificity to identify patients at risk of hematoma expansion, thereby enabling timely interventions. Consequently, while AUC is a commonly used metric, it may not fully capture the model's clinical applicability. The authors should consider discussing alternative performance metrics, such as specificity and sensitivity, which are more aligned with clinical needs. Furthermore, evaluating the model's performance in real-world clinical scenarios would provide valuable insights into its practical utility and potential impact on patient outcomes.

      We thank the reviewer for these thoughtful comments. We agree that k-fold cross-validation is a valid approach to reduce bias associated with overfitting and to account for variability in dataset composition. During training and optimization, a repeated random-split form of cross-validation was employed within the VISTA dataset: data were shuffled at random and separated into independent training (60%) and internal validation (40%) datasets. This process was repeated 1000 times to generate 1000 different training and internal validation splits. Statistical analyses and data visualization were performed independently on each of the 1000 cross-validation samples, and the mean results with corresponding 95% confidence intervals are presented. The p-values were combined using Fisher's method. We have included this information in the methods section. [Page 22; Paragraph 1; Lines 8-10]. External validation was performed on the ERICH dataset and analyzed only once. We chose not to perform cross-validation with the test dataset in an attempt to assess the model's generalizability to unseen data from a different patient cohort. However, we agree that taking advantage of the full 1,066 ERICH cases for model validation would improve the strength of our conclusions regarding the model's robustness. This has been included in the discussion. [Page 15; Paragraph 1; Lines 11-14].
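      The resampling and p-value pooling described in this response can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `random_splits` and `fisher_combine` are hypothetical, each split is a plain 60/40 shuffle of case indices, and Fisher's statistic X = -2 * sum(ln p_i) is referred to a chi-square distribution with 2k degrees of freedom, which strictly assumes the k p-values are independent.

```python
import math
import random


def random_splits(n_cases, n_repeats=1000, train_fraction=0.6, seed=0):
    """Yield (train, validation) index lists from repeated random shuffles.

    Hypothetical helper: each repeat reshuffles all case indices and cuts
    them into a training part (train_fraction) and a validation part.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    n_train = round(train_fraction * n_cases)
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # Slicing copies the lists, so later shuffles do not alter earlier splits.
        yield indices[:n_train], indices[n_train:]


def fisher_combine(p_values):
    """Combine p-values with Fisher's method; returns (statistic, p).

    X = -2 * sum(ln p_i) is compared with a chi-square distribution with
    2k degrees of freedom. For even degrees of freedom the upper tail is
    exp(-X/2) * sum_{j<k} (X/2)^j / j!, evaluated here in log space so
    that k = 1000 does not overflow or underflow mid-calculation.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    if half == 0.0:
        return statistic, 1.0
    log_terms = [j * math.log(half) - math.lgamma(j + 1) for j in range(k)]
    top = max(log_terms)
    log_tail = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return statistic, min(1.0, math.exp(log_tail - half))
```

      With the per-split p-values collected in a list (for example, a hypothetical `p_per_split` of length 1000), `fisher_combine(p_per_split)` returns the pooled statistic and p-value. With a single p-value the method returns that p-value unchanged, which is a quick sanity check.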

      We agree that the AUC alone will not effectively describe the clinical applicability of the intended model. We have added the sensitivity and specificity metrics for the TBM’s performance in the external dataset to the discussion. The design of the present study was primarily a pre-clinical methodological study. However, we have suggested that future external validation studies should seek to identify ideal sensitivity and specificity thresholds when evaluating the model’s translatability to a clinical setting. [Page 11; Paragraph 2; Line 22 and Page 12; Paragraph 1; Lines 2-4]. We agree that future validation studies should also assess the model’s performance in a real-world clinical setting and have emphasized this point in the discussion. [Page 13; Paragraph 2; Lines 22-23 and Page 14; Paragraph 1; Lines 1-4].
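      Reporting sensitivity and specificity requires fixing a decision threshold on the model's continuous output, whereas the AUC summarizes all thresholds at once. The following is a minimal sketch, assuming the model produces one risk score per patient and that a score at or above the threshold counts as predicted expansion; the function names and the rule of choosing the lowest threshold that reaches a target specificity are illustrative choices, not the authors' procedure.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity), treating score >= threshold as 'expansion'."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_specificity(scores, labels, target):
    """Lowest observed score whose threshold reaches the target specificity.

    Raising the threshold can only keep or raise specificity and keep or
    lower sensitivity, so the first qualifying candidate retains the most
    sensitivity. Returns None if no observed score reaches the target.
    """
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        if spec >= target:
            return threshold, sens, spec
    return None
```

      For example, with scores `[0.9, 0.8, 0.3, 0.6, 0.2]` and labels `[1, 1, 1, 0, 0]`, a threshold of 0.5 gives a sensitivity of 2/3 and a specificity of 0.5, while requiring a specificity of 1.0 selects a threshold of 0.8 at the same sensitivity of 2/3.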

      (2) The authors compared the performance of TBM with clinical and location-based information, as well as other machine learning methods. While this comparison highlights the relative strengths of TBM, the study would benefit from providing concrete evidence on how this tool could enhance clinicians' ability to assess hematoma expansion in practice. For instance, it remains unclear whether integrating the model's output with a clinician's own assessment would lead to improved diagnostic accuracy or decision-making. Investigating this aspect, such as through studies evaluating the combined performance of clinician judgment and model predictions, could significantly enhance the tool's practical value.

      We thank the reviewer for this suggestion. The present study intended to suggest potential advantages of the TBM method in comparison with alternative clinician-based and machine learning methods. While we agree that the TBM method warrants further evaluation in a real-world clinical setting to determine its practical utility, we propose that further optimization of TBM is first needed to improve its predictive accuracy.

      In developing TBM, our eventual goal is to produce a prediction tool that can identify patients at risk for hematoma expansion early in the disease course, who may benefit from intervention with surgical and/or medical therapies. Current clinician-based risk stratification methods are highly variable in accuracy, inefficient, and require subjective interpretation of the NCCT scan. Ultimately, we aim to aid clinical decision-making with an automated, accurate, and efficient model. In follow-up work, we will study how to combine information derived from imaging and TBM with other assessment tools and clinical data in order to best inform clinicians. This has been incorporated into the discussion. [Page 14; Paragraph 1; Lines 1-4].

      Reviewer #2 (Public review):

      Summary:

      The authors present a transport-based morphometry (TBM) approach for the discovery of non-contrast computed tomography (NCCT) markers of hematoma expansion risk in patients with spontaneous intracerebral hemorrhage (ICH). The findings demonstrate that TBM can quantify hematoma morphological features and outperforms existing clinical scoring systems in predicting 24-hour hematoma expansion. In addition, the inversion model can visualize features, which makes the approach interpretable. In conclusion, this research has clinical potential for ICH risk stratification and for improving the precision of early interventions.

      Strengths:

      TBM quantifies hematoma morphological changes using the Wasserstein distance, which has a well-defined physical meaning. It identifies features that are difficult to detect through conventional visual inspection (such as peripheral density distribution and density heterogeneity), which provides evidence supporting the "avalanche effect" hypothesis in hematoma expansion pathophysiology.
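      The physical meaning referred to here is the minimal cost of moving one distribution of mass (for TBM, image intensity) onto another. As a simplified, hypothetical illustration in one dimension rather than on 3D NCCT volumes: for two samples of equal size with equal weights, the optimal transport plan pairs the sorted values, and the 2-Wasserstein distance is the root-mean-square of the paired differences. The function name is ours, and TBM itself works with full transport maps between images, which this sketch does not reproduce.

```python
import math


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of xs with the i-th smallest value of ys, so no solver is needed.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    squared = [(a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))]
    return math.sqrt(sum(squared) / len(squared))
```

      Shifting a sample by a constant c moves every unit of mass by c, so the distance is |c|: `wasserstein2_1d([0, 1, 2], [1, 2, 3])` gives 1.0, regardless of the order in which the values are listed.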

      Weaknesses:

      (1) As a methodology-focused study, the description of the methods section somewhat lacks depth and focus, which may make it challenging for readers to fully grasp the overall structure and workflow of the approach. For instance, the manuscript lacks a systematic overview of the entire process, from NCCT image input to the final prediction output. A potential improvement would be to include a workflow figure at the beginning of the manuscript, summarizing the proposed method and subsequent analytical procedures. This would help readers better understand the mechanism of the model.

      We thank the reviewer for this suggestion. We have included a figure detailing the TBM workflow to improve reader understanding. [Figure 1, Page 5; Paragraph 2; Lines 19-20 and Page 30; Paragraph 1].

      (2) The description of the comparison algorithms could be more detailed. Since TBM directly utilizes NCCT images as input for prediction, while SVM and K-means are not inherently designed to process raw imaging data, it would be beneficial to clarify which specific features or input data were used for these comparison models. This would better highlight the effectiveness and advantages of the TBM method.

      We thank the reviewer for this suggestion. We have included additional details of the comparison with machine learning models in the methods section. While we used PCA on the extracted transport maps and raw image data for dimensionality reduction prior to classification, we agree that the machine learning methods described may not have been optimally tuned to examine the data in the format in which it was presented. Future studies should aim to compare TBM with optimized machine and deep learning methods to determine TBM’s potential as an automated clinical risk stratification tool. We have added this to the limitations section of the discussion. [Page 14; Paragraph 2; Lines 22-23 and Page 15; Paragraph 1; Lines 1-2].

      (3) The relatively small training and testing dataset may limit the model's performance and generalizability. Notably, while the study mentions that 1,066 patients from the ERICH dataset met the inclusion criteria, only 170 were randomly selected for the test set. Leveraging the full 1,066 ERICH cases for model training and internal validation might potentially enhance the model's robustness and performance.

      We thank the reviewer for this suggestion. As the reviewer highlights, the intention of the manuscript was to present a methodologically focused study which led to our small validation cohort of 170 patients from the ERICH dataset. It is our intention to further optimize and validate the TBM method in a future larger study which is underway, taking full advantage of the ERICH dataset. This has been incorporated into the discussion section. [Page 15; Paragraph 1; Lines 11-14].

      (4) Some minor textual issues need to be checked and corrected, such as line 16 in the abstract "Incorporating these traits into a v achieved an AUROC of 0.71 ...".

      We thank the reviewer for this comment. The typographical error has been corrected. 

      (5) Some figures need to be reformatted (e.g., the x-axis in Figure 2 a is blocked).

      We thank the reviewer for this comment. This was intentional to demonstrate that the X-axis in Figure 2a and 2b are identical and thereby highlight image features corresponding to the regression line on the graph.

    1. eLife Assessment

      This important study presents findings on the patterned loss of Purkinje cells in the cerebellum during aging. The compelling data nicely support the conclusions of this study. This work advances understanding of mechanisms underlying neurodegeneration with aging and provides the basis for development of treatments for age-related neurological disorders.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Donofrio et al. investigated cerebellar Purkinje cell (PC) degeneration during normal aging using both mouse and human samples. They found that PC loss followed a stripe pattern rather than occurring randomly. Although this pattern resembled the pattern of zebrin II expression in the anterior cerebellum, the overall pattern was different from zebrin II expression. Surviving PCs exhibited severe degeneration, including thickened axons, axonal torpedoes and shrunken dendrites. These structural changes were accompanied by functional deficits in motor coordination and tremor. Understanding why certain PC subpopulations are more vulnerable than others may provide insight into regional susceptibility (or resilience) to aging and inform potential therapeutic strategies for age-related neurological disorders. Overall, the findings are novel and significant, supported by compelling evidence from structural and functional analyses. The authors have fully addressed my previous concerns and improved the clarity of their presentation. I believe this work will have a significant impact in the field.

    3. Reviewer #2 (Public review):

      Summary:

      The cerebellum is known to be vulnerable to aging, yet the vulnerability of specific cell types remains understudied. This important study convincingly demonstrates that the normal aged mouse cerebellum exhibits Purkinje cell loss, and that the PCs vulnerable to aging are arranged according to the known zebrin stripe pattern, which represents particular PC subtypes. As the authors note, future studies should investigate why this PC loss phenotype occurs stochastically across the population, and whether these findings parallel human cerebellar aging.

      Strength:

      • The banding pattern of PC loss is clearly demonstrated by co-immunostaining for zebrin.

      • A critical methodological concern that a standard PC marker, Calbindin, could be compromised in aging has been addressed by performing control experiments with appropriate counterstaining and a transgenic mouse.

      • Parallels with neurodegenerative phenotypes will be helpful for understanding the mechanisms of age-related PC loss in the future.

      Weakness:

      • Limited strain diversity: The study exclusively uses C57BL/6J mice despite known genetic and motor differences even among closely related strains such as C57BL/6N, weakening the generalizability of the findings. On the other hand, the presence of age-related PC loss makes C57BL/6J an interesting mouse model for studying aging of the cerebellum.

      • The links to normal human aging and cerebellar function are not well supported. It remains unclear whether this PC loss phenomenon is universal or specific to particular individuals, and whether it is specific to a human PC subtype.

    4. Reviewer #3 (Public review):

      Donofrio et al. report a new observation that in normal aging mice, anti-calbindin whole-mount staining and coronal immunohistochemistry in the cerebellum often show a sagittally patterned loss of Purkinje cells with age. The authors address a central concern that calbindin antibody staining alone is not sufficient to definitively assess Purkinje cell loss, and corroborate their antibody staining data with transgenic Pcp2-CRE x flox-GFP reporter mice and Neutral Red staining. The authors then investigate whether this patterned Purkinje loss correlates with the known parasagittal expression of zebrin-II, finding a strong but imperfect correlation with zebrin-II antibody staining. They next draw a connection between this age-related Purkinje loss and the age-related decline in motor function in mice, with trending but non-significant statistical associations between the severity/patterning of Purkinje loss and motor phenotypes within cohorts of aged mice. Finally, the authors look at post-mortem human cerebellar tissues from deceased healthy donors between 21 and 74 years of age, finding a positive correlation between Purkinje degeneration and age, but with unknown spatial patterning.

      The conclusions drawn from this study are well supported by the data provided, with image quantification corroborating visual observations. The authors highlight several examples of parasagittal patterning of Purkinje cell degeneration in disease, and they show that proper methodologies must be used to account for these patterns to avoid highly variable data in the sagittal plane. The authors aptly point out that additional work is needed to investigate the spatial patterns of Purkinje cell loss in the human cerebellum.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      While the authors have largely ruled out zebrin II as the key protein underlying PC vulnerability or resistance to age-related loss, the molecular basis of this phenomenon remains unidentified. This reviewer acknowledges the complexity of this investigation and considers it a minor issue, as the manuscript thoughtfully discusses the gap and highlights it as a future direction.

      We appreciate the reviewer’s acknowledgement of the complexity of determining the molecular basis of differential Purkinje cell vulnerability. Moreover, we acknowledge that zebrin II expression/identity is not the only factor in determining vulnerability; rather, the compartmentalized map as a whole may dictate these differences. We are eager to shed light on this issue through future study.

      In cases where no PC loss is observed in aged mice (Figure 1F), it is unclear whether these PCs undergo morphological degeneration, such as thickened axons and shrunken dendrites. Further characterization of these resilient PCs would help understand why the aged mice without PC loss still exhibit motor deficits (Figure 7).

      Thank you for the excellent idea of examining Purkinje cell morphology in aged mice without Purkinje cell loss. Upon looking for hallmarks of neurodegeneration, such as shrunken dendrites and axonal swellings, in aged mice without Purkinje cell loss, we observed minimal axonal pathology and no shrinkage of the molecular layer. However, we note that while the features we examined are well-studied hallmarks of degeneration, they are specific rather than exhaustive, and subtle morphological changes beyond the detection limits of our methods may still occur. We have added these new results to Figure 2C and added these notes to the manuscript.

      The histologic analysis is based on mice with different genetic backgrounds. For example, the PC-specific reporter mice include two strains: Pcp2-Cre; Ai32 and Pcp2-Cre; Ai40D. These genetic variations may contribute to the heterogeneity of PC loss (Figure 1). To improve clarity, please add the genetic background details to Table 1.

      We have added the genetic backgrounds of all mice used in the study to Table 1.

      Please indicate from which lobule in the anterior or posterior human cerebellum the images in Figure 8 were taken.

      Unfortunately, because of the limitations of human postmortem tissue collection (in some cases, we are provided with a very small block that was collected after the pathologist completed their primary duty for that individual), we cannot with full certainty distinguish the lobules from which the images were taken. However, we are grateful that, upon our request, the pathologists were able to collect tissue mainly from the vermis, which is where we wished to begin, knowing that the vermis in rodents and non-human primates typically has the clearest and most well-studied pattern. That said, this is an important issue that we are addressing for future studies.

      Reviewer #2 (Public review):

      (1) Limited strain diversity: The study exclusively uses C57BL/6J mice despite known genetic and motor differences even among closely related strains such as C57BL/6N.

      Thank you for pointing out this limitation of our study. We chose to limit this initial study to C57BL/6J mice based on their widespread use as a background strain on many currently maintained lines. That said, our study intentionally included several different crosses to provide genetic variability, even though C57BL/6J is still the predominant genetic background. In addition to the motor differences in genetic strains, we are also particularly interested in the differences in cerebellar morphology across strains (Inouye and Oda, 1980; Sillitoe and Joyner, 2007). Our use of mice maintained on the C57BL/6J background leaves open an exciting future direction: investigating age-related Purkinje cell loss in mice of different inbred and outbred strains. Given the importance of the topic, we have included new text in the discussion to alert the reader to this limitation of our study and to highlight interesting differences across strains that will be important to disentangle in our future work.

      (2) No correlation quantified between the degree of PC loss, aging, and motor performance.

      We sought to conduct a broad overview of motor problems that might be caused by age-related Purkinje cell loss, rather than a comprehensive investigation of how motor behavior changes with advancing Purkinje cell loss. Therefore, we agree with the reviewer's comment, and we have added text to indicate that stronger correlations between these domains would be best tackled with deeper behavioral phenotyping conducted over time to match the potentially co-occurring progressive changes in cerebellar morphology, with a focus on Purkinje cell degeneration and eventual loss.

      (3) It has not been demonstrated whether the neurodegenerative changes are indeed observed in zebrin-negative PCs.

      We have added Supplementary Figure 4, which includes an example of reduced dendritic density and loss of Purkinje cell somata in zebrin II-negative stripes in lobules II and III. Please also see Figure 4B for an example of reduced dendritic density in zebrin II-negative Purkinje cells in lobules III and IV.

      (4) The mechanisms of why only a subset of mice show PC loss remain unexplored and not discussed.

      We agree that our manuscript would benefit from discussion of why some aged mice are resistant to age-related Purkinje cell loss. We have elaborated upon possible reasons for this differential vulnerability in the discussion.

      (5) Links to normal human aging and cerebellar function are not well supported. While motor behavioral assays captured phenotypes that mimic those of aged people, a correlation with PC loss was not demonstrated in mice. It remains unclear whether this PC loss phenomenon is universal or specific to particular individuals, and whether it is specific to a human PC subtype.

      In our study, we sought to show that patterned age-related Purkinje cell loss presents a promising area for future research in humans. We agree that further study is needed to solidify a link between age-related Purkinje cell loss in mice and humans and the implications for motor function. The reviewer raises a fair criticism that reflects the current state of knowledge: studies that link cerebellar aging to motor function and cognitive decline in humans are few, as are studies of the cellular-level morphological changes of cerebellar aging; there is a pressing need for deeper study of human tissue. To address the issue raised by the reviewer, we have added new text to the discussion of our manuscript indicating these gaps in knowledge.

      (6) Analyses in the paraflocculus are currently not easy to understand. This lobule has heterogeneous PC subtypes, developmentally or molecularly. Zebrin-weak and zebrin-intense PCs are known to be arranged in stripes, which resemble the pattern of developmentally defined PC subsets (Fujita et al., 2014, Plos one; Fujita et al., 2012, J Neurosci). In the data presented, it is hard to appreciate whether the viewing angle is consistent relative to the angle of the paraflocculus. This may be a general limitation of analyzing the paraflocculus, as the orientation of this lobule is highly susceptible to fixation and dissection. Claiming a discrepancy between the PC loss stripes and the zebrin pattern may be an overstatement, because appropriate analyses of the paraflocculus would require a rigorously standardized analytic method.

      Thank you for your valuable insights on the complexity of analyzing the paraflocculus. We have altered our language to more accurately reflect the nuanced zebrin II expression pattern of this region. We also agree with and very much appreciate your advice that “analyses on the paraflocculus would require a rigorously standardized analytic method.” We have added these arguments to the revised manuscript text.

      Reviewer #3 (Public review):

      (1) In Figure 3, the authors use Pcp2-CRE mice to drive GFP expression in Purkinje cells in order to avoid the confounding variable of loss of calbindin expression in aging Purkinje cells. The authors go on to say, "we argue that calbindin expression alone is not a reliable, sufficient indicator of Purkinje cell loss". However, in Figure 4, the authors return to calbindin staining alone to assess the correlation of Purkinje cell loss with zebrin-II expression. Could the authors comment on why zebrin-II co-staining experiments were not performed in GFP reporter mice to avoid potential confounds of calbindin expression? Without this experiment, should readers accept the data presented in Figure 4 as a "reliable, sufficient indicator of Purkinje cell loss", given the author's prior claim?

      This is a very good point, thank you. We agree that the data presented in Figure 4 alone would not be a sufficient indicator of Purkinje cell loss. However, we prefaced our calbindin and zebrin II co-staining with calbindin and GFP co-staining (Figure 3), which showed that Purkinje cell-specific reporter expression revealed the same pattern of Purkinje cell loss as calbindin expression, and Neutral Red staining (Figure 2 and Supplementary Figure 3B), which confirmed the loss of Purkinje cells independent of immunofluorescence. For this reason, we feel confident that the data in Figure 4 are representative of the striped pattern of age-related Purkinje cell loss. Still, we see and agree with the reviewer's comment, and therefore, to further show the correlation of Purkinje cell loss with zebrin II expression, we have added a new Supplementary Fig. 4, which shows co-staining of calbindin, GFP, and zebrin II.

      (2) Throughout the manuscript, there is a considerable reliance on the authors' interpretation of imaging data with no accompanying quantification (categorization of "striped" or "non-striped" PC loss, correlation of GFP/calbindin/zebrin-II staining, etc.). While this may be difficult to obtain, the results would be much stronger with a quantitative approach to support the stated categorizations/observations.

      Thank you for your suggestion. Quantifying stripe properties has been a challenging task for the field, given the regionalized features of stripe compartmentalization that make its complex architecture tricky to measure in its typical organization within the 3D anatomy of lobules and fissures and even harder to interpret when there are abnormalities. However, to quantitatively support our categorization of “striped” and “non-striped” Purkinje loss and the observed correlation between calbindin and GFP expression in aged mice, we have quantified the mediolateral pixel intensity across lobules II-IV, in which Purkinje cell loss reliably occurs in zebrin II-negative stripes. The results can be found in Supplementary Figure 1B and Supplementary Figure 3.

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1, both staining artifacts and PC degeneration appear in light color. Please clarify how these two were differentiated.

      Thank you for your comment, which raises an important point about distinguishing staining artifacts from Purkinje cell degeneration. Cerebellar patterning is symmetrical across the midline, so asymmetrical abnormalities are one clue that differentiates staining artifacts from the degenerative pattern. Another indicator of a staining artifact seen in whole-mount preparations is the gradual fading of the stain (seen in some hemispheres in Figure 1), which is caused by continuous rubbing of the cerebellum against the tube during the staining process. In some cases, such as in Figure 1F, the cerebellum was damaged during the dissection of the meninges after staining, and in such cases the accidental removal of cerebellar tissue (molecular layer) reveals unstained tissue beneath the surface of the cerebellum. This type of staining artifact can be identified by a missing chunk of tissue surrounded by stained Purkinje cells, compared to the smooth, unmarred tissue where PCs have degenerated. We have added new text to the figure legends in the results to clarify these critical points for the reader.

      (2) In Figure 7C, please consider changing "Aged without stripes" to "Aged without PC loss" to be consistent with the labeling used in other panels.

      Thank you for pointing out this discrepancy. We have made the suggested changes.

      Reviewer #3 (Recommendations for the authors):

      Could the authors comment on why zebrin-II co-staining experiments were not performed in GFP reporter mice to avoid potential confounds of calbindin expression? Without this experiment, should readers accept the data presented in Figure 4 as a "reliable, sufficient indicator of Purkinje cell loss", given the author's prior claim?

      Thank you for this recommendation; we appreciate this advice. As we described above, our response to this suggestion reads:

      This is a very good point, thank you. We agree that the data presented in Figure 4 alone would not be a sufficient indicator of Purkinje cell loss. However, we prefaced our calbindin and zebrin II co-staining with calbindin and GFP co-staining (Figure 3), which showed that Purkinje cell-specific reporter expression revealed the same pattern of Purkinje cell loss as calbindin expression, and Neutral Red staining (Figure 2 and Supplementary Figure 3B), which confirmed the loss of Purkinje cells independent of immunofluorescence. For this reason, we feel confident that the data in Figure 4 are representative of the striped pattern of age-related Purkinje cell loss. Still, we see and agree with the reviewer's comment, and therefore, to further show the correlation of Purkinje cell loss with zebrin II expression, we have added a new Supplementary Fig. 4, which shows co-staining of calbindin, GFP, and zebrin II.

    1. eLife Assessment

      The study presents important findings that are highly relevant for research aiming to combine transcriptomics, connectivity studies, and activity profiling in the rodent brain and the revisions improve the study. The evidence overall remains convincing as the authors use appropriate and validated methodology in line with current state-of-the-art.

    2. Reviewer #1 (Public review):

      In their paper entitled "Combined transcriptomic, connectivity, and activity profiling of the medial amygdala using highly amplified multiplexed in situ hybridization (hamFISH)" Edwards et al. present a new method designated as hamFISH (highly amplified multiplexed in situ hybridization) that enables sequential detection of up to 32 genes using multiplexed branched DNA amplification. As proof-of-principle, the authors apply the new technique, in conjunction with connectivity and activity profiling, to the medial amygdala (MeA) of the mouse, which is a critical nucleus for innate social and defensive behaviors.

      As mentioned by Edwards et al., hamFISH could prove beneficial as an affordable alternative to other in situ transcriptomic methods, including commercial platforms, that are resource-intensive and require complex analysis pipelines. Thus, the authors envision that the method they present could democratize in situ cell-type identification in individual laboratories.

      The data presented by Edwards et al. are convincing. The authors use appropriate and validated methodology in line with the current state-of-the-art. The paper makes a strong case for the benefits of hamFISH when combining transcriptomics studies with connectivity tracing and immediate early gene-based activity profiling. Notably, the authors also discuss the caveats and limitations of their study/approach in an open and transparent manner.

      Comments on revisions:

      In their revised paper, Edwards et al. have made an effort to improve manuscript clarity. Revisions made address the non-public "recommendations for the authors." The main criticism that prevents a more enthusiastic overall assessment, i.e., absence of some more in-depth hypothesis-based analysis (though, as originally mentioned, maybe beyond the study's scope), is still valid.

    3. Reviewer #2 (Public review):

      The authors describe the development and implementation of hamFISH, a sensitive multiplexed ISH method. They leverage a pre-existing scRNA-seq dataset for the MeA to design 32 probes that combinatorially represent MeA neuronal populations; ~80% of MeA neurons express at least three of these 32 markers. Using these markers to assess the spatial organization of the MeA, the authors identify a novel population of Ndnf+ projection neurons and characterize their connectivity with anterograde and retrograde labeling. They additionally combine hamFISH with CTB labeling of three principal MeA projection sites to show that 75% of MeA neurons have only a single projection target. Finally, they engage adult male mice in encounters with other adult males (aggression), females (mating), and pups (infanticide), followed by hamFISH and c-fos labeling to relate cell identity to behavior. Their overall conclusion is that hamFISH-defined cell types are broadly active in response to multiple sensory stimuli. However, the data presented are not sufficient to conclude that no selectivity exists.

      A strength of the manuscript is the novel hamFISH approach, which is technically innovative and could potentially be adopted by many labs. However, a weakness is that the 32 selected hamFISH marker genes employed here are predominantly neuropeptides. These genes, such as Tac1, Cartpt, Adcyap1, Calb1, and Gal, are expressed throughout the MeA and many other brain regions and are not selective for transcriptomic cell types or developmental lineages. The use of hamFISH probes that provide a more stringent classification of cell type or cell identity could potentially provide a different picture of sensory response selectivity within the MeA. Thus, although the data in the manuscript are exemplary, the biological insight into MeA function is more limited.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Edwards et al. describe hamFISH, a customizable and cost-efficient method for performing targeted spatial transcriptomics. hamFISH utilizes highly amplified multiplexed branched DNA amplification, and the authors extensively describe hamFISH development and its advantages over prior variants of this approach.

      The authors then used hamFISH to investigate an important circuit in the mouse brain for social behavior, the medial amygdala (MeA). To develop a hamFISH probe set capable of distinguishing MeA neurons, the authors mined published single-cell RNA-sequencing datasets of the MeA, ultimately creating a panel of 32 hamFISH probes that cover most of the identified MeA cell types. They evaluated over 600,000 MeA cells and classified neurons into 16 inhibitory and 10 excitatory types, many of which are spatially clustered.

      The authors combined hamFISH with viral and other circuit tracer injections to determine whether the identified MeA cell populations sent and/or received unique inputs from connected brain regions, finding evidence that several cell types had unique patterns of input and output. Finally, the authors performed hamFISH on the brains of male mice that were placed in behavioral conditions that elicit aggressive, infanticidal, or mating behaviors, finding that some cell populations are selectively activated (as assessed by c-fos mRNA expression) in specific social contexts.

      Strengths:

      (1) The authors developed an optimized tissue preparation protocol for hamFISH and implemented oligopools instead of individually synthesized oligonucleotides to reduce costs. The branched DNA amplification scheme improved smFISH signal compared to previous methods, and multiple variants provide additional improvements in signal intensity and specificity. Compared to other spatial transcriptomics methods, the pipeline for imaging and analysis is streamlined, and is compatible with other techniques like fluorescence-based circuit tracing. This approach is cost-effective and has several advantages that make it a valuable addition to the list of spatial transcriptomics toolkits.

      (2) Using 31 probes, hamFISH was able to detect 16 inhibitory and 10 excitatory neuron types in the MeA subregions, including the vast majority of cell types identified by other transcriptomics approaches. The authors quantified the distributions of these cell types along the anterior-posterior, dorsal-ventral, and medial-lateral axes, finding spatial segregation among some, but not all, MeA excitatory and inhibitory cell types. The authors additionally identified a class of inhibitory neurons expressing Ndnf (and a subset of these that express Chrna7) that project to multiple social chemosensory circuits.

      (3) The authors combined hamFISH with MeA input and output mapping, finding cell-type biases in the projections to the MPOA, BNST, and VMHvl, and inputs from multiple regions.

      (4) The authors identified excitatory and inhibitory cell types, and patterns of activity across cell types, that were selectively activated during various social behaviors, including aggression, mating, and infanticide, providing new insights and avenues for future research into MeA circuit function.

      Weaknesses:

      (1) Gene selection for hamFISH is likely to still be a limiting factor, even with the expanded (32-probe) capacity. This may have contributed to the lack of ability to identify sexually dimorphic cell types (Fig. S2B). This is an expected tradeoff for a method that has major advantages in terms of cost and adaptability.

      (2) Adapting hamFISH to other brain regions or tissues may require extensive optimization. This does not preclude it from being highly useful in other brain regions, given extra effort.

      (3) Pairing this method with behavioral experiments is likely to require further optimization, as c-fos mRNA expression is an indirect and incomplete survey of neuronal activity (e.g., not all cell types upregulate c-fos when electrically active). As such, there is a risk of false negative results that limit its utility for understanding circuit function.

      (4) The incompatibility of hamFISH with thicker tissue samples and its minimal optical sectioning introduce additional technical limitations. For example, it would be difficult to densely sample larger neural circuits using serial 20-micron sections.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewing Editor Comments:

      Recommendations for improvement:

      (1) Address data presentation, editing, and other issues of lack of clarity as pointed out by the reviewers.

      We have now addressed all comments from reviewers that identify editing errors and lack of clarity issues. Regarding data presentation we have made some changes, for example including a combined heatmap to show consistency between row names (Figure 2 - figure supplement 2), but also kept some stylistic features such as the balance between main and supplemental figures that we think fits more naturally with the story of the paper.

      (2) Inclusion of requested and critical details in the methodology section, an important component for broad applicability of a new methodology by other investigators.

      We have added the requested details to the methods section, specifically the RCA protocol.

      (3) More in-depth discussion of the limitations of the methodology and approach to capture important but more complex components of tissues of interest, for example, sexual dimorphism.

      We have now edited the ‘pitfalls of study’ section in the discussion to include further detail of the limitations of the number of genes that can be used to deeply profile transcriptomic types, including sexual dimorphism. Regarding its use in other tissues of interest, we have now included a reference in the discussion (Bintu et al., 2025) where a similar strategy has been used to profile cells in the olfactory epithelium and olfactory bulb. We have also used hamFISH in other brain areas (as noted in our responses to the public reviews), but as this is unpublished work we will refrain from mentioning it in the main text.

      Reviewer #1 (Recommendations for the authors):

      The manuscript by Edwards et al. would benefit from minor revisions. Here, we outline several points that could / should be addressed:

      (1) General balance of data presentation between main and supplementary figures

      (a) quantifications were often missing from main figures and only presented in the supplements

      Thank you for raising this point. We believe that the balance of panels between the main and supplemental figures matches our story and results section well with quantifications included in the main figures where appropriate.

      (b) more informative figure legends in supplements (e.g.: Supplementary Figure I - Figure 3)

      We have now revised the figure legends and added more description where appropriate.

      (c) missing subpanel in Figure 3; figure legend describes 3H, which is missing in the figure

      We thank the reviewer for pointing this out and have now amended the subpanel.

      stand-alone figure on inhibitory neuron cluster i3 cells

      We agree that this is an important characterisation of i3 cells but decided to place this figure in the supplement as it does not fall within the main storyline (defining transcriptomic characterisation of cell types in a multimodal fashion), but rather acts as accessory information for those specifically interested in these inhibitory cell types.

      statistical tests used (e.g.: Figure 1 C -, Supplementary Figure 3 - Figure 2)/ graphs shown (Supplementary Figure 1 - 1 D)

      The statistical tests used are described in the figure legends.

      t-SNE dimensionality reduction of positional parameters

      Explanations of the t-SNE dimensionality reduction of positional parameters can be found in the materials and methods.

      (d) heatmaps similarly informative and more convincing

      We have included an extra heatmap (Figure 2 - figure supplement 2) in response to Reviewer 3’s comment (see below) in order to more easily follow genes across all the different clusters. We hope this helps to make the heatmaps more convincing and informative.

      code availability

      Code availability is described in the methods section of the manuscript.

      page 6, 3rd paragraph wrong description of PMCo abbreviation

      We thank the reviewer for identifying the mistake and we have now amended it.

      Reviewer #2 (Recommendations for the authors):

      The pre-existing scRNA-seq dataset on which the manuscript is based is an older Drop-seq dataset for which minimal QC information is provided. The authors should include QC information (genes/cells and UMIs/cells) in the Methods. Moreover, the Seurat clustering of these cells and depiction of marker genes in feature plots are not shown.

      It is therefore difficult to determine how the authors selected their 31 genes for their hamFISH panel, or how selective they are to the original Drop-seq clusters.

      The QC information of this dataset can be found in the original publication (Chen et al., 2019) with our clustering methods described in the materials and methods section. We have not included individual gene names in our heatmap plots for presentation purposes (there are over 200 rows), but the data and cluster descriptions can be found in supplemental tables.

      Reviewer #3 (Recommendations for the authors):

      (1) The imaging modality is not entirely clear in the methods. The microscopy technique is referenced to prior work and involves taking z-stacks, but analysis appears to be done on maximum z-projections, which seems like it would introduce the risk of false attribution of gene expression to cells that are overlapping in "z".

      Thank you for pointing out the technical limitation of the microscopy. For imaging, we used epifluorescence microscopy, collecting 14 z-steps of 500 nm as raw data and generating a maximum intensity projection for further analysis. Because of the thin sections (10 µm) used for imaging, the overlap between cells in z is expected to be minimal. However, we cannot completely rule out the misattribution raised in the comment. The Methods section contains this information.

      (2) Supplemental Figure 1 - Figure Supplement 2B: RCA looks significantly different when compared to v2 smFISH from the representative image, although it is written as comparable. Additionally, there is no information about RCA mentioned in the Materials and Methods section. Supplemental Figure 1 - Figure Supplement 2B: The figure label for RCA is missing.

      By comparable we are referring to the intensity rather than pattern as mentioned in the results section. We did not analyze the number of spots. It is true that the pattern of RCA signal is much sparser due to its inherent insensitivity compared with hamFISH. We thank the reviewer for identifying the lack of a methodological RCA description and have amended the manuscript to include this. We have also now amended the missing RCA label in the figure.

      (3) Figure 2C and associated supplement: The rows (each gene) are not consistent across the subpanels (i.e. they do not line up left-to-right), this makes it difficult for the reader to follow the patterns that distinguish the cell types in each subset.

      We have done this as we believe it makes for an easier interpretation of inhibitory vs excitatory clusters for the reader. However, we agree with the reviewer that one may wish to look at the dataset as a whole with a consistent gene order, and we have now provided this in the corresponding supplemental figure.  

      (4) "Consistent with previous work, most inhibitory classes are localized in the dorsal and ventral subdivisions of the MeA, whereas excitatory neurons occupy primarily the ventral MeA (Figure 2D, Figure 2 - Figure Supplement 2C, Figure 1D)". - The reference to Figure 1D seems to be an error.

      We thank the reviewer for identifying the mistake, and we have now amended it.

      (5) Supplemental Figure 2 - Figure Supplement 1, "published by Chen et al." - should have a proper reference number to be compatible with the rest of the manuscript. Also, the lack of gene info makes it difficult to understand Panel A. Finally, the text on Panel B refers to "hamMERFISH" which seems an error.

      We thank the reviewer for identifying the mistake on Panel B, it has now been amended. We have also changed the reference format. Regarding the lack of gene information in panel A, it is difficult to present all row names due to the large number of rows (>200), but this information can be found in supplemental table 2.

      (6) Supplemental Figure 2 - Figure Supplement 1: there are thin dividing lines drawn on each section, but these are not described or defined, making it difficult to understand what is being delineated.

      We thank the reviewer for identifying this omission and have now edited the figure legend to include a description.

      (7) Page 4, "...we found 26 clusters in cells that are positive for Slc32a1 (inhibitory) or Slc17a6 (encoding Vglut2 and therefore excitatory) positive (Figure 2 - figure supplement 1A, Table S2)."

      This seems to be an error as Figure 2 - figure supplement 1A does not show this.

      We double-checked that this description describes the panel accurately.

      (8) "The clustering revealed that inhibitory and excitatory classes generally have different spatial properties (Figure 1E, left), although the salt-and-pepper, sparse nature of e10 (Nts+) cells is more similar to inhibitory cells than other excitatory classes".

      The references to Figure 1E's should be to Figure 2E.

      We thank the reviewer for identifying the mistake, and we have now amended it.

      (9) "Comparison of the proportion of all cells that are cluster X vs projection neurons labelled by CTB that are cluster X". Please explain cluster X in this context.

      We have now rephrased this sentence in the figure legend for clarity.

      (10) Figure 3 - figure supplement 3: There appears to be quite a bit of heterogeneity in the patterns of activity across clusters even within behavioral contexts (e.g. the bottom 2 animals paired with females). It might be worth commenting on (or quantifying) whether there were any evident differences in the social behaviors observed (e.g. mating or not?) in individuals demonstrating these patterns.

      We thank the reviewer for this observation. We unfortunately did not quantify the behaviors, but we agree that more work is needed to link the pattern of c-fos activity with incrementally measured behavioral variables. At least, we did not include animals that did not display the anticipated social behaviours (as described in the materials and methods) in the in situ transcriptomic profiling work.

    1. eLife Assessment

      This study provides novel and convincing evidence that both dopamine D1 and D2 expressing neurons in the nucleus accumbens shell are crucial for the expression of cue-guided action selection, a core component of decision-making. The research is systematic and rigorous in using optogenetic inhibition of either D1- or D2-expressing medium spiny neurons in the NAc shell to reveal attenuation of sensory-specific Pavlovian-Instrumental transfer, while largely sparing value-based decision on an instrumental task. The important findings in this report build on prior research and resolve some conflicts in the literature regarding decision-making.

    2. Reviewer #1 (Public review):

      In the current article, Octavia Soegyono and colleagues study "The influence of nucleus accumbens shell D1 and D2 neurons on outcome-specific Pavlovian instrumental transfer", building on extensive findings from the same lab. While there is a consensus about the specific involvement of the Shell part of the Nucleus Accumbens (NAc) in specific stimulus-based actions in choice settings (and not in General Pavlovian instrumental transfer - gPIT, as opposed to the Core part of the NAc), mechanisms at the cellular and circuitry levels remain to be explored. In the present work, using sophisticated methods (rat Cre-transgenic lines from both sexes, optogenetics and the well-established behavioral paradigm outcome-specific PIT - sPIT), Octavia Soegyono and colleagues decipher the differential contribution of dopamine D1 and D2 receptor-expressing spiny projection neurons (SPNs).

      After validating the viral strategy and the specificity of the targeting (immunochemistry and electrophysiology), the authors demonstrate that while both NAc Shell D1- and D2-SPNs participate in mediating sPIT, NAc Shell D1-SPN projections to the Ventral Pallidum (VP, previously demonstrated as crucial for sPIT), but not those of D2-SPNs, mediate sPIT. They also show that these effects were specific to stimulus-based actions, as value-based choices were left intact in all manipulations.

      This is a well-designed study, and the results are well supported by the experimental evidence. The paper is extremely pleasant to read and adds to the current literature.

      Comments on revisions:

      We thank the authors for their detailed responses and for addressing our comments and concerns.

      To further improve consistency and transparency, we kindly request that the authors provide, for Supplemental Figures S1-S4, panels E (raw data for lever presses during the PIT test), the individual data points together with the corresponding statistical analyses in the figure legends.

      In addition, regarding Supplemental Figure S3, panel E, we note the absence of a PIT effect in the eYFP group under the ON condition, which appears to differ from the net response reported in the main Figure 5, panel B. Could the authors clarify this apparent discrepancy?

      We also note a discrepancy between the authors' statement in their response ("40 rats excluded based on post-mortem analyses") and the number of excluded animals reported in the Materials and Methods section, which adds up to 47. We kindly ask the authors to clarify this point for consistency.

      Finally, as a minor point, we suggest indicating the total number of animals used in the study in the Materials and Methods section.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Soegyono et al. describes a series of experiments designed to probe the involvement of dopamine D1 and D2 neurons within the nucleus accumbens shell in outcome-specific Pavlovian-instrumental transfer (osPIT), a well-controlled assay of cue-guided action selection based on congruent outcome associations. They used an optogenetic approach to phasically silence NAc shell D1 (D1-Cre mice) or D2 (A2a-Cre mice) neurons during a subset of osPIT trials. Both manipulations disrupted cue-guided action selection but had no effects on negative control measures/tasks (concomitant approach behavior, separate value-guided choice task), nor were any osPIT impairments found in reporter-only control groups. Separate experiments using selective inhibition revealed that NAc shell D1 but not D2 inputs to ventral pallidum were required for osPIT expression, thereby advancing understanding of the basal ganglia circuitry underpinning this important aspect of decision making.

      Strengths:

      The combinatorial viral and optogenetic approaches used here were convincingly validated through anatomical tract-tracing and ex vivo electrophysiology. The behavioral assays are sophisticated and well-controlled to parse cue- and value-guided action selection. The inclusion of reporter-only control groups is rigorous and rules out nonspecific effects of the light manipulation. The findings are novel and address a critical question in the literature. Prior work using less decisive methods had implicated NAc shell D1 neurons in osPIT but suggested that D2 neurons may not be involved. The optogenetic manipulations used in the current study provide a more direct test of their involvement and convincingly demonstrate that both populations play an important role. Prior work had also implicated NAc shell connections to ventral pallidum in osPIT, but the current study reveals the selective involvement of D1 but not D2 neurons in this circuit. The authors do a good job of discussing their findings, including their nuanced interpretation that NAc shell D2 neurons may contribute to osPIT through their local regulation of NAc shell microcircuitry.

      Weaknesses:

      The current study exclusively used an optogenetic approach to probe the function of D1 and D2 NAc shell neurons. Providing a complementary assessment with chemogenetics or other appropriate methods would strengthen conclusions, particularly the novel demonstration of D2 NAc shell involvement. Likewise, the null result of optically inhibiting D2 inputs to the ventral pallidum leaves open the possibility that a more complete or sustained disruption of this pathway may have impaired osPIT.

      Conclusions:

      The research described here was successful in providing critical new insights into the contributions of NAc D1 and D2 neurons in cue-guided action selection. The authors' data interpretation and conclusions are well reasoned and appropriate. They also provide a thoughtful discussion of study limitations and implications for future research. This research is therefore likely to have a significant impact on the field.

      Comments on revisions:

      I have reviewed the rebuttal and revised manuscript and have no remaining concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the current article, Octavia Soegyono and colleagues study "The influence of nucleus accumbens shell D1 and D2 neurons on outcome-specific Pavlovian instrumental transfer", building on extensive findings from the same lab. While there is a consensus about the specific involvement of the Shell part of the Nucleus Accumbens (NAc) in specific stimulus-based actions in choice settings (and not in General Pavlovian instrumental transfer - gPIT, as opposed to the Core part of the NAc), mechanisms at the cellular and circuitry levels remain to be explored. In the present work, using sophisticated methods (rat Cre-transgenic lines from both sexes, optogenetics, and the well-established behavioral paradigm outcome-specific PIT - sPIT), Octavia Soegyono and colleagues decipher the differential contribution of dopamine D1 and D2 receptor-expressing spiny projection neurons (SPNs).

      After validating the viral strategy and the specificity of the targeting (immunochemistry and electrophysiology), the authors demonstrate that while both NAc Shell D1- and D2-SPNs participate in mediating sPIT, NAc Shell D1-SPN projections to the Ventral Pallidum (VP, previously demonstrated as crucial for sPIT), but not those of D2-SPNs, mediate sPIT. They also show that these effects were specific to stimulus-based actions, as value-based choices were left intact in all manipulations.

      This is a well-designed study, and the results are well supported by the experimental evidence. The paper is extremely pleasant to read and adds to the current literature.

      We thank the Reviewer for their positive assessment. 

      Reviewer 2 (Public Review):

      Summary: 

      This manuscript by Soegyono et al. describes a series of experiments designed to probe the involvement of dopamine D1 and D2 neurons within the nucleus accumbens shell in outcome-specific Pavlovian-instrumental transfer (osPIT), a well-controlled assay of cue-guided action selection based on congruent outcome associations. They used an optogenetic approach to phasically silence NAc shell D1 (D1-Cre mice) or D2 (A2a-Cre mice) neurons during a subset of osPIT trials. Both manipulations disrupted cue-guided action selection but had no effects on negative control measures/tasks (concomitant approach behavior, separate value-guided choice task), nor were any osPIT impairments found in reporter-only control groups. Separate experiments using selective inhibition revealed that NAc shell D1 but not D2 inputs to ventral pallidum were required for osPIT expression, thereby advancing understanding of the basal ganglia circuitry underpinning this important aspect of decision making.

      Strengths: 

      The combinatorial viral and optogenetic approaches used here were convincingly validated through anatomical tract-tracing and ex vivo electrophysiology. The behavioral assays are sophisticated and well-controlled to parse cue- and value-guided action selection. The inclusion of reporter-only control groups is rigorous and rules out nonspecific effects of the light manipulation. The findings are novel and address a critical question in the literature. Prior work using less decisive methods had implicated NAc shell D1 neurons in osPIT but suggested that D2 neurons may not be involved. The optogenetic manipulations used in the current study provide a more direct test of their involvement and convincingly demonstrate that both populations play an important role. Prior work had also implicated NAc shell connections to ventral pallidum in osPIT, but the current study reveals the selective involvement of D1 but not D2 neurons in this circuit. The authors do a good job of discussing their findings, including their nuanced interpretation that NAc shell D2 neurons may contribute to osPIT through their local regulation of NAc shell microcircuitry.

      We thank the Reviewer for their positive assessment. 

      Weaknesses: 

      The current study exclusively used an optogenetic approach to probe the function of D1 and D2 NAc shell neurons. Providing a complementary assessment with chemogenetics or other appropriate methods would strengthen conclusions, particularly the novel demonstration of D2 NAc shell involvement. Likewise, the null result of optically inhibiting D2 inputs to the ventral pallidum leaves open the possibility that a more complete or sustained disruption of this pathway may have impaired osPIT.

      We acknowledge the reviewer's valuable suggestion that demonstrating the engagement of NAc-S D1- and D2-SPNs in outcome-specific PIT through another technique would strengthen our optogenetic findings. Several approaches could provide this validation. Chemogenetic manipulation, as the reviewer suggested, represents one compelling option. Alternatively, immunohistochemical assessment of phosphorylated histone H3 at serine 10 (P-H3) offers another promising avenue, given its established utility in reporting SPN plasticity in the dorsal striatum (Matamales et al., 2020). We hope to complete such an assessment in future work, since it would address the limitations of previous work that relied solely on ERK1/2 phosphorylation measures in NAc-S SPNs (Laurent et al., 2014). The manuscript was modified to report these future avenues of research (page 12).

      Regarding the null result from optical silencing of D2 terminals in the ventral pallidum, we agree with the reviewer's assessment. While we acknowledge this limitation in the current manuscript (page 13), we aim to address this gap in future studies to provide a more complete mechanistic understanding of the circuit.

      Reviewer 3 (Public Review):

      Summary:

      The authors present data demonstrating that optogenetic inhibition of either D1- or D2-MSNs in the NAc Shell attenuates expression of sensory-specific PIT while largely sparing value-based decision-making on an instrumental task. They also provide evidence that SS-PIT depends on D1-MSN projections from the NAc-Shell to the VP, whereas projections from D2-MSNs to the VP do not contribute to SS-PIT.

      Strengths:

      This is clearly written. The evidence largely supports the authors' interpretations, and these effects are somewhat novel, so they help advance our understanding of PIT and NAc-Shell function.

      We thank the Reviewer for their positive assessment. 

      Weaknesses:

      I think the interpretation of some of the effects (specifically the claim that D1-MSNs do not contribute to value-based decision making) is not fully supported by the data presented.

      We appreciate the reviewer's comment regarding the marginal attenuation of value-based choice observed following NAc-S D1-SPN silencing. While this manipulation did produce a slight reduction in choice performance, the behavior remained largely intact. We are hesitant to interpret this marginal effect as evidence for a direct role of NAc-S D1-SPNs in value-based decision-making, particularly given the substantial literature demonstrating that NAc-S manipulations typically preserve such choice behavior (Corbit et al., 2001; Corbit & Balleine, 2011; Laurent et al., 2012). Furthermore, previous work has shown that NAc-S D1 receptor blockade impairs outcome-specific PIT while leaving value-based choice unaffected (Laurent et al., 2014). We favor an alternative explanation for our observed marginal reduction. As documented in Supplemental Figure 1, viral transduction extended slightly into the nucleus accumbens core (NAc-C), a region established as critical for value-based decision-making (Corbit et al., 2001; Corbit & Balleine, 2011; Laurent et al., 2012; Parkes et al., 2015). The marginal impairment may therefore reflect inadvertent silencing of a small number of NAc-C D1-SPNs rather than a functional contribution from NAc-S D1-SPNs. Future studies specifically targeting larger NAc-C D1-SPN populations would help clarify this possibility and provide definitive resolution of this question.

      Reviewer 1 (Recommendations for the Author):

      My main concerns and comments are listed below.

      (1) Could the authors provide the "raw" data of the PIT tests, such as PreSame vs Same vs PreDifferent vs Different? Could the authors clarify how the Net responding was calculated? Was it Same minus PreSame & Different minus PreDifferent, or was the average of PreSame and PreDifferent used in this calculation?

      The raw data for PIT testing across all experiments are now included in the Supplemental Figures (Supplemental Figures S1E, S2E, S3E, and S4E). Baseline responding was quantified as the average number of lever presses per minute for both actions during the two-minute period (i.e., average of PreSame and PreDifferent) preceding each stimulus presentation. This methodology has been clarified in the revised manuscript (page 7).

      (2) While both sexes are utilized in the current study, no statistical analysis is provided. Can the authors please comment on this point and provide these analyses (for both training and tests)?

      As noted in the original manuscript, the final sample sizes for female and male rats were insufficient to provide adequate statistical power for sex-based analyses (page 15). To address this limitation, we have now cited a previous study from our laboratory (Burton et al., 2014) that conducted such analyses with sufficient power in identical behavioural tasks. That study identified only marginal sex differences in performance, with female rats exhibiting slightly higher magazine entry rates during Pavlovian conditioning. Importantly, no differences were observed in outcome-specific PIT or value-based choice performance between sexes.

      (3) Regarding Figure 1 - Anterograde tracing in D1-Cre and A2a-Cre rats (from line 976), I have one major and one minor question:

      (3.1) I do not understand the rationale of showing anterograde tracing from the Dorsal Striatum (DS) as this region is not studied in the current work. Moreover, sagittal micrographs of D1-Cre and A2a-Cre would be relevant here. Could the authors please provide these micrographs and explain the rationale for doing tracing in DS?

      We included dorsal striatum (DS) tracing data as a reference because the projection patterns of D1 and D2 SPNs in this region are well-established and extensively characterized, in contrast to the more limited literature on these cell types in the NAc-S. Regarding the comment about sagittal micrographs, we are uncertain of the specific concern as these images are presented in Figure 1B.

      If the reviewer is requesting sagittal micrographs for NAc-S anterograde tracing, we did not employ this approach because: (1) the NAc-S and ventral pallidum are anatomically adjacent regions and (2) the medial-lateral coordinates of the ventral pallidum and lateral hypothalamus do not align optimally with those of the NAc-S, limiting the utility of sagittal analysis for these projections.

      (3.2) There is no description about how the quantifications were done: manually? Automatically? What script or plugin was used? If automated, what were the thresholding conditions? How many brain sections along the anteroposterior axis? What was the density of these subpopulations? Can the authors include a methodological section to address this point?

      We apologize for the omission of quantification methods used to assess viral transduction specificity. This methodological description has now been added to the revised manuscript (page 22). Briefly, we employed a manual procedure in two sections per rat, and cell counts were completed in a defined region of interest located around the viral infusion site.

      (4) Lex, A., & Hauber, W. (2008). Dopamine D1 and D2 receptors in the nucleus accumbens core and shell mediate Pavlovian-instrumental transfer. Learning & Memory, 15, 483-491, should be cited and discussed. It also seems that the contribution of the main dopaminergic source of the brain, the ventral tegmental area, is not cited, although it has been investigated in at least 3 studies of sPIT alone, notably the VP-VTA pathway (Leung & Balleine 2015, accurately cited already).

      We did not include the Lex & Hauber (2008) study because its experimental design (single lever and single outcome) prevents differentiation between the effects of Pavlovian stimuli on action performance (general PIT) versus action selection (outcome-specific PIT, as examined in the present study). Drawing connections between their findings and our results would require speculative interpretations regarding whether their observed effects reflect general or outcome-specific PIT mechanisms, which could distract from the core findings reported in the article.

      Several studies examining the role of the VTA in outcome-specific PIT were referenced in the manuscript's introduction. Following the reviewer's recommendation, these references have also been incorporated into the discussion section (page 13). 

      (5) While not directly the focus of this study, it would be interesting to highlight the accumbens dissociation between General vs Specific PIT, and how the dopaminergic system (differentially?) influences both forms of PIT.

      We agree with the reviewer that the double dissociation between nucleus accumbens core/shell function and general/specific PIT is an interesting topic. However, the present manuscript does not examine this dissociation, the nucleus accumbens core, or general PIT. Similarly, our study does not directly investigate the dopaminergic system per se. We believe that discussing these topics would distract from our core findings and substantially increase manuscript length without contributing novel data directly relevant to these areas. 

      (6) While the authors indicate that the conditioned response to auditory stimuli (magazine visits) is preserved in all groups, suggesting intact sensitivity to the general motivational properties of reward-predictive stimuli (lines 344, 360), the authors cannot draw conclusions about the specificity of this behavior, i.e., does the subject use a mental representation of O1 when experiencing S1, leading to magazine visits to retrieve O1 (and likewise for S2-O2), or not? Two food ports would be needed to address this question. The authors should also comment on the fact that competition between instrumental and Pavlovian responses does not explain the deficits observed.

      We agree with the Reviewer that magazine entry data cannot be used to draw conclusions about specificity, and we do not make such claims in our manuscript. We are therefore unclear about the specific concern being raised. Following the Reviewer’s recommendation, we have commented on the fact that response competition could not explain the results obtained (page 11, see also supplemental discussion). 

      The minor comments are listed below.

      (7) A high number of rats were excluded (> 32 total), and the number of rats excluded for NAc-S D1-SPNs-VP is not indicated.

      We apologize for omitting the number of rats excluded from the experiment examining NAc-S D1-SPN projections to the ventral pallidum. This information has been added to the revised manuscript (page 22).

      (7.1) Can authors please comment on the elevated number of exclusions?

      A total of 133 rats were used across the reported experiments, with 40 rats excluded based on post-mortem analyses. This represents an attrition rate of approximately 30%, which we consider reasonable given that most animals received two separate viral infusions and two separate fiber-optic cannula implantations, and that the inclusion of both female and male rats contributed to some variability in coordinates and therefore in targeting.
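      For reference, the attrition figure quoted above works out as follows; the number of retained rats (93) is derived here from the two stated totals and is not separately reported in the response:

```latex
\frac{40}{133} \approx 0.301 \approx 30\%, \qquad 133 - 40 = 93 \text{ rats retained}
```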

      (7.2) Can authors please present the performance of these animals during the tasks (OFF conditions, and for control ones, both ON & OFF conditions)?

      Rats were excluded after assessing the spread of viral infusions, placement of fibre-optic cannulas and potential damage due to the surgical procedures (page 21). The requested data are presented below and plotted in the same manner as in Figures 3-6. The pattern of performance in excluded animals was highly variable. 

      Author response image 1.

       

      (8) For tracing, only males were used, and for electrophysiology, only females were used.

      (8.1) Can authors please comment on not using both sexes in these experiments? 

      We agree that equal allocation of female and male rats in the experiments presented in Figures 1-2 would have been preferable. Animal availability was the sole factor determining these allocations. Importantly, both female and male D1-Cre and A2A-Cre rats were used for the NAc-S tracing studies, and no sex differences were observed in the projection patterns. The article describing the two transgenic lines of rats did not report any sex difference (Pettibone et al., 2019).

      (8.2) Is there evidence in the literature that the electrophysiological properties of female versus male SPNs could differ?

      The literature indicates that there is no sex difference in the electrophysiological properties of NAc-S SPNs (Cao et al., 2018; Willett et al., 2016).

      (8.3) It seems like there is a discrepancy between the number of animals used as presented in the Figure 2 legend versus what is described in the main text. In the Figure legend, I understand that 5 animals were used for D1-Cre/DIO-eNpHR3.0 validation, and 7 animals for A2a-Cre/DIO-eNpHR3.0; however, the main text indicates the use of a total of 8 animals instead of the 12 presented in the Figure legend. Can authors please address this mismatch or clarify?

      The number of rats reported in the main text and Figure 2 legend was correct. However, recordings sometimes involved multiple cells from the same animal, and this aspect of the data was incorrectly reported and generated confusion. We have clarified the numbers in both the main text and Figure 2 legend to distinguish between animal counts and cell counts. 

      (9) Overall, in the study, have the authors checked for outliers?

      Performance across all training and testing stages was inspected to identify potential behavioral outliers in each experiment. Abnormal performance during a single session within a multi-session stage was not considered sufficient grounds for outlier designation. Based on these criteria, no subjects remaining after post-mortem analyses exhibited performance patterns warranting exclusion through statistical outlier analysis. However, we have conducted the specific analyses requested by the Reviewer, as described below.

      (9.1) In Figure 3, it seems that one female in the eYFP group, in the OFF situation, for the different condition, has a higher level of responding than the others. Can authors please confirm or refute this visual observation with the appropriate statistical analysis?

      Statistical analysis (z-score) confirmed the reviewer's observation regarding responding of the different action in the OFF condition for this subject (|z| = 2.58). Similar extreme responding was observed in the ON condition (|z| = 2.03). Analyzing responding on the different action in isolation is not informative in the context of outcome-specific PIT. Additional analyses revealed |z| < 2 when examining the magnitude of choice discrimination in outcome-specific PIT (i.e., net same versus net different responding) in both ON and OFF conditions. Furthermore, this subject showed |z| < 2 across all other experimental stages. Based on these analyses, we conclude that the subject should be kept in all analyses.
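      The z-score screening described above can be sketched as below. This is an illustrative sketch rather than the authors' procedure: the function names are hypothetical, the |z| > 2 cut-off is inferred from how the response labels values, and the use of the sample standard deviation (n − 1 denominator) is an assumption because the response does not say which standard deviation was used.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value against the group mean and sample SD.

    Assumption: sample standard deviation (n - 1 denominator); the
    response does not state which SD was used.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(values, threshold=2.0):
    """Return indices of subjects whose |z| exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Illustrative rates only: nine similar subjects and one extreme subject.
rates = [1.0] * 9 + [10.0]
print(flag_outliers(rates))
```

      Here only the last subject is flagged (printed as [9]), with |z| of about 2.85. The same function can be applied to a per-subject choice-discrimination score (for example, net same minus net different responding; the exact score the authors used is not stated), which is the measure the response treats as informative for outcome-specific PIT.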

      (9.2) In Figure 5, it seems that one male, in the ON situation, in the different condition, has a quite higher level of responding - is this subject an outlier? If so, how does it affect the statistical analysis after being removed? And who is this subject in the OFF condition?

      The reviewer has identified two different male rats infused with the eNpHR3.0 virus and has asked for closer examination of their performance.

      The first rat showed outlier-level responding on the different action in the ON condition (|z| = 2.89) but normal responding for all other measures across LED conditions (|z| < 2). Additional analyses revealed |z| = 2.55 when examining choice discrimination magnitude in outcome-specific PIT during the ON condition but not during the OFF condition (|z| = 0.62). This subject exhibited |z| < 2 across all other experimental stages.

      The second rat showed outlier-level responding on the same action in the OFF condition (|z| = 2.02) but normal responding for all other measures across LED conditions (|z| < 2). Additional analyses revealed |z| = 2.12 when examining choice discrimination magnitude in outcome-specific PIT during the OFF condition but not during the ON condition (|z| = 0.67). This subject also exhibited |z| < 2 across all other experimental stages.

      We excluded these two subjects and conducted the same analyses as described in the original manuscript. Baseline responding did not differ between groups (p = 0.14), allowing us to examine the net effect of the stimuli. Overall lever presses were greater in the eYFP rats (Group: F(1,16) = 6.08, p < 0.05; η<sup>2</sup> = 0.28) and were reduced by LED activation (LED: F(1,16) = 9.52, p < 0.01; η<sup>2</sup> = 0.44), and this reduction depended on the group considered (Group x LED: F(1,16) = 12.125, p < 0.001; η<sup>2</sup> = 0.43). Lever press rates were higher on the action earning the same outcome as the stimuli than on the action earning the different outcome (Lever: F(1,16) = 49.32; η<sup>2</sup> = 0.76; p < 0.001), regardless of group (Group x Lever: p = 0.14). There was a Lever by LED light condition interaction (Lever x LED: F(1,16) = 5.25; η<sup>2</sup> = 0.24; p < 0.05) but no interaction between group, LED light condition, and Lever during the presentation of the predictive stimuli (p = 0.10). Given the significant Group x LED and Lever x LED interactions, additional analyses were conducted to determine the source of these interactions. In eYFP rats, LED activation had no effect (LED: p = 0.70) and lever presses were greater on the same action (Lever: F(1,9) = 23.94, p < 0.001; η<sup>2</sup> = 0.79) regardless of LED condition (LED x Lever: p = 0.72). By contrast, in eNpHR3.0 rats, lever presses were reduced by LED activation (LED: F(1,9) = 23.97, p < 0.001; η<sup>2</sup> = 0.73), were greater on the same action (Lever: F(1,9) = 16.920, p < 0.001; η<sup>2</sup> = 0.65), and the two factors interacted (LED x Lever: F(1,9) = 9.12, p < 0.01; η<sup>2</sup> = 0.50). These rats demonstrated outcome-specific PIT in the OFF condition (F(1,9) = 27.26, p < 0.001; η<sup>2</sup> = 0.75) but not in the ON condition (p = 0.08).
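      Readers who want to cross-check the effect sizes reported above against their F statistics can recover partial η² from F and its degrees of freedom. The sketch below assumes the reported η² values are partial eta squared (the response does not state which variant was computed); the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom.

    Uses the standard identity
        eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Assumption: the reported eta-squared values are partial eta squared.
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Group effect reported above: F(1,16) = 6.08, reported eta^2 = 0.28.
print(round(partial_eta_squared(6.08, 1, 16), 2))
```

      For the Group effect this prints 0.28, matching the reported value; each remaining F(df1, df2) and η² pair in the paragraph can be checked the same way.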

      Overall, excluding these two rats altered the statistical analyses, but both the original and revised analyses yielded the same outcome: silencing the NAc-S D1-SPN to VP pathway disrupted PIT. More importantly, we do not believe there are sufficient grounds to exclude the two rats identified by the reviewer. These animals did not display outlier-level responding across training stages or during the choice test. Their potential classification as outliers would be based on responding during only one LED condition and not the other, with notably opposite patterns between the two rats despite belonging to the same experimental group.

      (10) I think it would be helpful if, in the cartoons in Figures 5A and 6A, the SPNs were color-coded as in the results (test plots) and the supplementary figures (histological color-coding), i.e., D1-SPNs in blue and D2-SPNs in red.

      Our current color-coding system uses blue for D1-SPNs transduced with eNpHR3.0 and red for D2-SPNs transduced with eNpHR3.0. The D1-SPNs and D2-SPNs shown in Figures 5A and 6A represent cells transduced with either eYFP (control) or eNpHR3.0 virus and therefore cannot be assigned the blue or red color, which is reserved for eNpHR3.0-transduced cells specifically. The micrographs in the Supplemental Figures maintain consistency with the color-coding established in the main figures.

      (11) As there are (relatively small) variations in the control performance in terms of Net responding (from ~3 to ~7 responses per min), I wonder what would be the result of pooling the eYFP groups from the first two experiments (Figures 3 & 4) and from the last two (Figures 5 & 6): would the same statistical results stand or vary (as eYFP vs D1-Cre vs A2a-Cre rats)? In particular for Figures 3 & 4, with and without the potential outlier, if it is indeed an outlier.

      We considered the Reviewer’s recommendation but do not believe the requested analysis is appropriate. The Reviewer is requesting the pooling of data from subjects of distinct transgenic strains (D1-Cre and A2A-Cre rats) that underwent surgical and behavioral procedures at different time points, sometimes months apart. Each experiment was designed with necessary controls to enable adequate statistical analyses for testing our specific hypotheses.

      (12) Presence of cameras in operant cages is mentioned in methods, but no data is presented regarding recordings, though authors mention that they allow for real-time observations of behavior. I suggest removing "to record" or adding a statement about the fact that no videos were recorded or used in the present study.

      We have removed “to record” from the manuscript (page 18). 

      (13) In all supplementary Figures, "F" is wrongly indicated as "E".

      We thank the Reviewer for reporting these errors, which have been corrected. 

      (14) While the authors acknowledge that the efficacy of optogenetic inhibition of terminals is questionable, I think that more details are required to address this point in the discussion (existing literature?). Maybe the combination of an anterograde tracer from SPNs to VP, to label VP neurons (to facilitate patching these neurons), and the Cre-dependent inhibitory opsin in the NAc Shell, with optogenetic illumination at the level of the VP, along with electrophysiological recordings of VP neurons, could help address this question, but this may, reasonably, seem technically challenging.

      Our manuscript does not state that optogenetic inhibition of terminals is questionable. It acknowledges that we do not provide any evidence about the efficacy of the approach. Regardless, we have provided additional details and suggestions to address this lack of evidence (page 13).

      (15) A nice addition could be an illustration of the proposed model (from line 374), but it may be unnecessary.

      We have carefully considered the reviewer's recommendation. The proposed model is detailed in three published articles, including one that is freely accessible, which we have cited when presenting the model in our manuscript (page 14). This reference should provide interested readers with easy access to a comprehensive illustration of the model.

      Reviewer 2 (Recommendations for the Author):

      As noted in my public comments, this is a truly excellent and compelling study. I have only a few minor comments.

      (1) I could not find the coordinates/parameters for the dorsal striatal AAV injections for that component of the tract tracing experiment.

      We apologize for this omission, which has now been corrected (page 16). 

      (2) Please add the final group sizes to the figure captions.

      We followed the Reviewer’s recommendation and added group sizes in the main figure captions. 

      (3) The discussion of group exclusions (p 21 line 637) seems to accidentally omit (n = X) the number of NAc-S D1-SPNs-VP mice excluded.

      We apologize for this omission, which has now been corrected (page 22). 

      (4) There were some labeling issues in the supplementary figures (perhaps elsewhere, too). Specifically, panel E was listed twice (once for F) in captions.

      We apologize for this error, which has now been corrected.  

      (5) Inspection of the magazine entry data from PIT tests suggests that the optogenetic manipulations may have had some effects on this behavior, and I would encourage the authors to probe further. There was a significant group difference for D1-SPN inhibition and a marginal group effect for D2-SPNs. The fact that these effects were in opposite directions is intriguing, although not easily interpreted based on the canonical D1/D2 model. Of course, the effects are not specific to the light-on trials, but this could be due to carryover into light-off trials. An analysis of trial-order effects seems crucial for interpreting these effects. One might also consider normalizing for pre-test baseline performance. Response rates during Pavlovian conditioning seem to suggest that D2-eNpHR mice showed slightly higher conditioned responding during training, which contrasts with their low entry rates at test. I don't see any of this as problematic, but more should be done to interpret these findings.

      We thank the reviewer for raising this interesting point regarding magazine entry rates. Since these data are presented in the Supplemental Figures, we have added a section in the Supplemental Material file that elaborates on these findings. This section does not address trial-order effects, as trial order was fully counterbalanced in our experiments and the relevant statistical analyses would lack adequate power. Baseline normalization was not conducted because the reviewer's suggestion was based on their assumption that eNpHR3.0 rats in the D2-SPNs experiment showed slightly higher magazine entries during Pavlovian training. However, this was not the case. In fact, like the eNpHR3.0 rats in the D1-SPNs experiment, they tended to display lower magazine entries during training. The added section therefore focuses on the potential role of response competition during outcome-specific PIT tests. Although we concluded that response competition cannot explain our findings, we believe it may complicate interpretation of magazine entry behavior. Thus, we recommend that future studies examine the role of NAc-S SPNs using purely Pavlovian tasks. It is worth noting that we have recently completed experiments (unpublished) examining NAc-S D1- and D2-SPN silencing during stimulus presentation in a Pavlovian task identical to the one used here. Silencing of either SPN population had no effect on magazine entry behavior.

      Reviewer 3 (Recommendations for the Author):

      Broad comments:

      Throughout the manuscript, the authors draw parallels between the effect established via pharmacological manipulations and those shown here with optogenetic manipulation. I understand using the pharmacological data to launch this investigation, but these two procedures address very different physiological questions. In the case of a pharmacological manipulation, the targets are receptors, wherever they are expressed, and in the case of D2 receptors, this means altering function in both pre-synaptically expressed autoreceptors and post-synaptically expressed D2 MSN receptors. In the case of an optogenetic approach, the target is a specific cell population with a high degree of temporal control. So I would just caution against comparing results from these types of studies too closely.

      Related to this point is the consideration of the physiological relevance of the manipulation. Under normal conditions, dopamine acts at D1-like receptors to increase the probability of cell firing via Gαs signaling. In contrast, dopamine binding of D2-like receptors decreases the cell's firing probability (signaling via Gi/o). Thus, shunting D1-MSN activation provides a clear impression of the role of these cells and, putatively, the role of dopamine acting on these cells. However, inhibiting D2-MSNs more closely mimics these cells' response to dopamine (though optogenetic manipulations are likely far more impactful than Gi signaling). All this is to say that when we consider the results presented here in Experiment 2, it might suggest that during PIT testing, normal performance may require a halting of DA release onto D2-MSNs. This is highly speculative, of course, just a thought worth considering.

      We agree with the comments made by the Reviewer, and the original manuscript included statements acknowledging that pharmacological approaches are limited in the capacity to inform about the function of NAc-S SPNs (pages 4 and 9). As noted by the Reviewer, these limitations are especially salient when considering NAc-S D2-SPNs. Based on the Reviewer’s comment, we have modified our discussion to further underscore these limitations (page 12). Finally, we agree with the suggestion that PIT may require a halting of DA release onto D2-SPNs. This is consistent with the model presented, whereby D2-SPN function is required to trigger enkephalin release (page 13).

      Section-Specific Comments and Questions:

      Results:

      Anterograde tracing and ex vivo cell recordings in D1 Cre and A2a Cre rats: Why are there no statistics reported for the e-phys data in this section? Was this merely a qualitative demonstration? I realize that the A2a-Cre condition only shows 3 recordings, so I appreciate the limitations in analyzing the data presented.

      The reviewer is correct that we initially intended to provide a qualitative demonstration. However, we have now included statistical analyses for the ex vivo recordings. It is important to note that there were at least 5 recordings per condition, though overlapping data points may give the impression of fewer recordings in certain conditions. We have provided the exact number of recordings in both the main text (page 5) and figure legend. 

      What does a trial-by-trial analysis look like? In addition to the effects of extinction, do you know whether the responsiveness of the opsin to light stimulation is altered after repeated exposures, or whether the cells themselves become compromised in any way by repeated light-inhibition, particularly given the relatively long 2-min duration of the trial?

      The Reviewer raises an interesting point, and we provide complete trial-by-trial data for each experiment below. As identified by the Reviewer, there is some evidence for extinction, although it remained modest. Importantly, the data suggest that light stimulation did not affect the physiology of the targeted cells. In eNpHR3.0 rats, performance across OFF trials remained stable (both for Same and Different) even though they were preceded by ON trials, indicating no carryover effects from optical stimulation.

      Author response image 2.

       

      The statistics for the choice test are not reported for eNpHR-D1-Cre rats, but they do show a weakening of the instrumental devaluation effect ("Group x Lever x LED: F1,18 = 10.04, p < 0.01, η<sup>2</sup> = 0.36"). The post hoc comparisons showed that all groups showed devaluation, but there is evidently a weakening of this effect when the LED was on (η<sup>2</sup> = 0.41) vs off (η<sup>2</sup> = 0.78), so I think the authors should soften the claim that NAcS-D1s are not involved in value-based decision-making. (Also, there is a typo in the legend in Figure S1, where the caption for panel "F" is listed as "E".) I also think that this could be potentially interesting in light of the fact that with circuit manipulation, this same weakening of the instrumental devaluation effect was not observed. To me, this suggests that D1-NAcS neurons that project to a different region (not VP) contribute to value-based decision making.

      This comment overlaps with one made in the Public Review, for which we have already provided a response. Given its importance, we have added a section addressing this point in the supplemental discussion of the Supplementary Material file, which aligns with the location of the relevant data. The caption labelling error has been corrected.

      Materials and Methods:

      Subjects:

      Were these heterozygous or homozygous rats? If hetero, what rats were used for crossbreeding (sex, strain, and vendor)? Was genotyping done by the lab or outsourced to commercial services? If genotyping was done within the lab, please provide a brief description of the protocol used. How was food restriction established and maintained (i.e., how many days to bring weights down, and was maintenance achieved by rationing or by limiting ad lib access to food for some period in the day)?

      The information requested by the Reviewer has been added to the subjects section (pages 15-16).

      Were rats pair/group housed after implantation of optic fibers?

      We have clarified that rats were group housed throughout (see subjects section; pages 15-16).

      Behavioral Procedures:

      How long did each 0.2ml sucrose infusion take? For pellets, for each US delivery, was it a single pellet or two in quick succession?

      We have modified the method section to indicate that the sucrose was delivered across 2 seconds and that a single pellet was provided (page 17). 

      The CS to ITI duration ratio is quite low. Is there a reason such a short ratio was used in training?

      These parameters are those used in all our previous experiments on outcome-specific PIT. There is no specific reason for using such a ratio, except that it shortens the length of the training session. 

      Relative to the end of training, when were the optical implantation surgeries conducted, and how much recovery time was given before initiating reminder training and testing?

      Fibre-optic implantation was conducted 3-4 days after training and another 3-4 days were given for recovery. This has been clarified in the Materials and methods section (pages 15-16).

      I think a diagram or schematic showing the timeline for surgeries, training, and testing would be helpful to the audience.

      We opted for a text-based experimental timeline rather than a diagram due to slight temporal variations across experiments (page 15).

      On trials, when the LED was on, was light delivered continuously or pulsed? Do these opto-receptors 'bleach' within such a long window?

      We apologize for the lack of clarity; the light was delivered continuously. We have modified the manuscript (pages 6 and 19) and figure legend accordingly. The postmortem analysis did not provide evidence for photobleaching (Supplemental Figures) and as noted above, the behavioural results do not indicate any negative physiological impact on cell function.  

      Immunofluorescence: The blocking solution used during IHC is described as "NHS"; is this normal horse serum?

      The Reviewer is correct; NHS stands for normal horse serum. This has been added (page 21). 

      Microscopy and imaging:

      For the description of rats excluded due to placement or viral spread problems, an n=X is listed for the NAc S D1 SPNs --> VP silencing group. Is this a typo, or was that meant to read as n=0? Also, was there a major sex difference in the attrition rate? If so, I think reporting the sex of the lost subjects might be beneficial to the scientific community, as it might reflect a need for better guidance on sex-specific coordinates for targeting small nuclei.

      We apologize for the error regarding the number of excluded animals. This error has been corrected (page 23). There were no major sex differences in the attrition rate. The manuscript has been updated to provide information about the sex of excluded animals (page 23).

      References

      Cao, J., Willett, J. A., Dorris, D. M., & Meitzen, J. (2018). Sex Differences in Medium Spiny Neuron Excitability and Glutamatergic Synaptic Input: Heterogeneity Across Striatal Regions and Evidence for Estradiol-Dependent Sexual Differentiation. Front Endocrinol (Lausanne), 9, 173. https://doi.org/10.3389/fendo.2018.00173

      Corbit, L. H., Muir, J. L., & Balleine, B. W. (2001). The role of the nucleus accumbens in instrumental conditioning: Evidence of a functional dissociation between accumbens core and shell. J Neurosci, 21(9), 3251-3260. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=11312310&retmode=ref&cmd=prlinks

      Corbit, L. H., & Balleine, B. W. (2011). The general and outcome-specific forms of Pavlovian-instrumental transfer are differentially mediated by the nucleus accumbens core and shell. J Neurosci, 31(33), 11786-11794. https://doi.org/10.1523/JNEUROSCI.2711-11.2011

      Laurent, V., Bertran-Gonzalez, J., Chieng, B. C., & Balleine, B. W. (2014). δ-Opioid and Dopaminergic Processes in Accumbens Shell Modulate the Cholinergic Control of Predictive Learning and Choice. J Neurosci, 34(4), 1358-1369. https://doi.org/10.1523/JNEUROSCI.4592-13.2014

      Laurent, V., Leung, B., Maidment, N., & Balleine, B. W. (2012). μ- and δ-opioid-related processes in the accumbens core and shell differentially mediate the influence of reward-guided and stimulus-guided decisions on choice. J Neurosci, 32(5), 1875-1883. https://doi.org/10.1523/JNEUROSCI.4688-11.2012

      Matamales, M., McGovern, A. E., Mi, J. D., Mazzone, S. B., Balleine, B. W., & Bertran-Gonzalez, J. (2020). Local D2- to D1-neuron transmodulation updates goal-directed learning in the striatum. Science, 367(6477), 549-555. https://doi.org/10.1126/science.aaz5751

      Parkes, S. L., Bradfield, L. A., & Balleine, B. W. (2015). Interaction of insular cortex and ventral striatum mediates the effect of incentive memory on choice between goal-directed actions. J Neurosci, 35(16), 6464-6471. https://doi.org/10.1523/JNEUROSCI.4153-14.2015

      Pettibone, J. R., Yu, J. Y., Derman, R. C., Faust, T. W., Hughes, E. D., Filipiak, W. E., Saunders, T. L., Ferrario, C. R., & Berke, J. D. (2019). Knock-In Rat Lines with Cre Recombinase at the Dopamine D1 and Adenosine 2a Receptor Loci. eNeuro, 6(5). https://doi.org/10.1523/ENEURO.0163-19.2019

      Willett, J. A., Will, T., Hauser, C. A., Dorris, D. M., Cao, J., & Meitzen, J. (2016). No Evidence for Sex Differences in the Electrophysiological Properties and Excitatory Synaptic Input onto Nucleus Accumbens Shell Medium Spiny Neurons. eNeuro, 3(1), ENEURO.0147-15.2016. https://doi.org/10.1523/ENEURO.0147-15.2016

    1. eLife Assessment

      This study provides novel and convincing evidence that both dopamine D1- and D2-expressing neurons in the nucleus accumbens shell are crucial for the expression of cue-guided action selection, a core component of decision-making. The research is systematic and rigorous. It uses optogenetic inhibition of either D1- or D2-expressing medium spiny neurons in the NAc shell to reveal attenuation of sensory-specific Pavlovian-instrumental transfer, while value-based decision-making on an instrumental task is largely spared. These important findings build on prior research and resolve some conflicts in the literature regarding decision-making.

    2. Reviewer #1 (Public review):

      In the current article, Octavia Soegyono and colleagues study "The influence of nucleus accumbens shell D1 and D2 neurons on outcome-specific Pavlovian instrumental transfer", building on extensive findings from the same lab. There is a consensus that the shell of the nucleus accumbens (NAc) is specifically involved in stimulus-based actions in choice settings. It is not involved in general Pavlovian-instrumental transfer (gPIT), which instead depends on the NAc core. However, the mechanisms at the cellular and circuit levels remain to be explored. In the present work, Octavia Soegyono and colleagues use sophisticated methods: Cre-transgenic rat lines of both sexes, optogenetics, and the well-established outcome-specific PIT paradigm (sPIT). With these, they decipher the differential contributions of dopamine D1- and D2-receptor-expressing spiny projection neurons (SPNs).

      The authors first validate the viral strategy and the specificity of the targeting (immunohistochemistry and electrophysiology). They then demonstrate that both NAc Shell D1- and D2-SPNs participate in mediating sPIT. However, only the projections of NAc Shell D1-SPNs, and not those of D2-SPNs, to the Ventral Pallidum (VP, previously shown to be crucial for sPIT) mediate sPIT. They also show that these effects were specific to stimulus-based actions, as value-based choices were left intact in all manipulations.

      This is a well-designed study, and the results are well supported by the experimental evidence. The paper is extremely pleasant to read and adds to the current literature.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Soegyono et al. describes a series of experiments designed to probe the involvement of dopamine D1 and D2 neurons within the nucleus accumbens shell in outcome-specific Pavlovian-instrumental transfer (osPIT), a well-controlled assay of cue-guided action selection based on congruent outcome associations. They used an optogenetic approach to phasically silence NAc shell D1 (D1-Cre mice) or D2 (A2a-Cre mice) neurons during a subset of osPIT trials. Both manipulations disrupted cue-guided action selection but had no effects on negative control measures/tasks (concomitant approach behavior, separate value-guided choice task), nor were any osPIT impairments found in reporter-only control groups. In separate experiments, selective inhibition of each pathway revealed that NAc shell D1 inputs to the ventral pallidum, but not D2 inputs, were required for osPIT expression. These results advance understanding of the basal ganglia circuitry underpinning this important aspect of decision-making.

      Strengths:

      The combinatorial viral and optogenetic approaches used here were convincingly validated through anatomical tract-tracing and ex vivo electrophysiology. The behavioral assays are sophisticated and well-controlled to parse cue and value-guided action selection. The inclusion of reporter-only control groups is rigorous and rules out nonspecific effects of the light manipulation. The findings are novel and address a critical question in the literature. Prior work using less decisive methods had implicated NAc shell D1 neurons in osPIT but suggested that D2 neurons may not be involved. The optogenetic manipulations used in the current study provide a more direct test of their involvement and convincingly demonstrate that both populations play an important role. Prior work had also implicated NAc shell connections to ventral pallidum in osPIT, but the current study reveals the selective involvement of D1 but not D2 neurons in this circuit. The authors do a good job of discussing their findings, including their nuanced interpretation that NAc shell D2 neurons may contribute to osPIT through their local regulation of NAc shell microcircuitry.

      Weaknesses:

      The current study exclusively used an optogenetic approach to probe the function of D1 and D2 NAc shell neurons. Providing a complementary assessment with chemogenetics or other appropriate methods would strengthen conclusions, particularly the novel demonstration of D2 NAc shell involvement. Likewise, the null result of optically inhibiting D2 inputs to the ventral pallidum leaves open the possibility that a more complete or sustained disruption of this pathway may have impaired osPIT.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present data demonstrating that optogenetic inhibition of either D1- or D2-MSNs in the NAc Shell attenuates expression of sensory-specific PIT while largely sparing value-based decision-making on an instrumental task. They also provide evidence that SS-PIT depends on D1-MSN projections from the NAc-Shell to the VP, whereas projections from D2-MSNs to the VP do not contribute to SS-PIT.

      Strengths:

      The paper is clearly written. The evidence largely supports the authors' interpretations, and these effects are somewhat novel, so they help advance our understanding of PIT and NAc-Shell function.

      Weaknesses:

      I think the interpretation of some of the effects (specifically the claim that D1-MSNs do not contribute to value-based decision making) is not fully supported by the data presented.

    5. Author response:

      Reviewer #1 (Public review):

      In the current article, Octavia Soegyono and colleagues study "The influence of nucleus accumbens shell D1 and D2 neurons on outcome-specific Pavlovian instrumental transfer", building on extensive findings from the same lab. There is a consensus that the shell of the nucleus accumbens (NAc) is specifically involved in stimulus-based actions in choice settings. It is not involved in general Pavlovian-instrumental transfer (gPIT), which instead depends on the NAc core. However, the mechanisms at the cellular and circuit levels remain to be explored. In the present work, Octavia Soegyono and colleagues use sophisticated methods: Cre-transgenic rat lines of both sexes, optogenetics, and the well-established outcome-specific PIT paradigm (sPIT). With these, they decipher the differential contributions of dopamine D1- and D2-receptor-expressing spiny projection neurons (SPNs).

      The authors first validate the viral strategy and the specificity of the targeting (immunohistochemistry and electrophysiology). They then demonstrate that both NAc Shell D1- and D2-SPNs participate in mediating sPIT. However, only the projections of NAc Shell D1-SPNs, and not those of D2-SPNs, to the Ventral Pallidum (VP, previously shown to be crucial for sPIT) mediate sPIT. They also show that these effects were specific to stimulus-based actions, as value-based choices were left intact in all manipulations.

      This is a well-designed study, and the results are well supported by the experimental evidence. The paper is extremely pleasant to read and adds to the current literature.

      We thank the Reviewer for their positive assessment.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Soegyono et al. describes a series of experiments designed to probe the involvement of dopamine D1 and D2 neurons within the nucleus accumbens shell in outcome-specific Pavlovian-instrumental transfer (osPIT), a well-controlled assay of cue-guided action selection based on congruent outcome associations. They used an optogenetic approach to phasically silence NAc shell D1 (D1-Cre mice) or D2 (A2a-Cre mice) neurons during a subset of osPIT trials. Both manipulations disrupted cue-guided action selection but had no effects on negative control measures/tasks (concomitant approach behavior, separate value-guided choice task), nor were any osPIT impairments found in reporter-only control groups. In separate experiments, selective inhibition of each pathway revealed that NAc shell D1 inputs to the ventral pallidum, but not D2 inputs, were required for osPIT expression. These results advance understanding of the basal ganglia circuitry underpinning this important aspect of decision-making.

      Strengths:

      The combinatorial viral and optogenetic approaches used here were convincingly validated through anatomical tract-tracing and ex vivo electrophysiology. The behavioral assays are sophisticated and well-controlled to parse cue and value-guided action selection. The inclusion of reporter-only control groups is rigorous and rules out nonspecific effects of the light manipulation. The findings are novel and address a critical question in the literature. Prior work using less decisive methods had implicated NAc shell D1 neurons in osPIT but suggested that D2 neurons may not be involved. The optogenetic manipulations used in the current study provide a more direct test of their involvement and convincingly demonstrate that both populations play an important role. Prior work had also implicated NAc shell connections to ventral pallidum in osPIT, but the current study reveals the selective involvement of D1 but not D2 neurons in this circuit. The authors do a good job of discussing their findings, including their nuanced interpretation that NAc shell D2 neurons may contribute to osPIT through their local regulation of NAc shell microcircuitry.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      The current study exclusively used an optogenetic approach to probe the function of D1 and D2 NAc shell neurons. Providing a complementary assessment with chemogenetics or other appropriate methods would strengthen conclusions, particularly the novel demonstration of D2 NAc shell involvement. Likewise, the null result of optically inhibiting D2 inputs to the ventral pallidum leaves open the possibility that a more complete or sustained disruption of this pathway may have impaired osPIT.

      We acknowledge the reviewer's valuable suggestion that demonstrating NAc-S D1- and D2-SPN engagement in outcome-specific PIT with another technique would strengthen our optogenetic findings. Several approaches could provide this validation. Chemogenetic manipulation, as the reviewer suggested, is one compelling option. Alternatively, immunohistochemical assessment of histone H3 phosphorylated at serine 10 (P-H3) offers another promising avenue, given its established utility in reporting SPN plasticity in the dorsal striatum (Matamales et al., 2020). We hope to complete such an assessment in future work. It would address a limitation of previous work that relied solely on ERK1/2 phosphorylation measures in NAc-S SPNs (Laurent et al., 2014).

      Regarding the null result from optical silencing of D2 terminals in the ventral pallidum, we agree with the reviewer's assessment. While we acknowledge this limitation in the current manuscript (see discussion), we aim to address this gap in future studies to provide a more complete mechanistic understanding of the circuit.

      Reviewer #3 (Public review):

      Summary:

      The authors present data demonstrating that optogenetic inhibition of either D1- or D2-MSNs in the NAc Shell attenuates expression of sensory-specific PIT while largely sparing value-based decision-making on an instrumental task. They also provide evidence that SS-PIT depends on D1-MSN projections from the NAc-Shell to the VP, whereas projections from D2-MSNs to the VP do not contribute to SS-PIT.

      Strengths:

      The paper is clearly written. The evidence largely supports the authors' interpretations, and these effects are somewhat novel, so they help advance our understanding of PIT and NAc-Shell function.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      I think the interpretation of some of the effects (specifically the claim that D1-MSNs do not contribute to value-based decision making) is not fully supported by the data presented.

      We appreciate the reviewer's comment regarding the marginal attenuation of value-based choice observed following NAc-S D1-SPN silencing. While this manipulation did produce a slight reduction in choice performance, the behavior remained largely intact. We are hesitant to interpret this marginal effect as evidence for a direct role of NAc-S D1-SPNs in value-based decision-making, particularly given the substantial literature demonstrating that NAc-S manipulations typically preserve such choice behavior (Corbit & Balleine, 2011; Corbit et al., 2001; Laurent et al., 2012). Notably, previous work has shown that NAc-S D1 receptor blockade impairs outcome-specific PIT while leaving value-based choice unaffected (Laurent et al., 2014). We favor an alternative explanation for our observed marginal reduction. As documented in Supplemental Figure 1, viral transduction extended slightly into the nucleus accumbens core (NAc-C), a region established as critical for value-based decision-making (Corbit & Balleine, 2011; Corbit et al., 2001; Laurent et al., 2012). The marginal impairment may therefore reflect inadvertent silencing of a small NAc-C D1-SPN population rather than a functional contribution from NAc-S D1-SPNs. Future studies specifically targeting larger NAc-C D1-SPN populations would help clarify this possibility and provide definitive resolution of this question.

    1. eLife Assessment

      This useful study reports analyses of Neuropixels recordings in the medial prefrontal cortex and hippocampus of rats performing a spatial navigation task. It focuses on classifying prefrontal neurons based on SWR modulation and anatomical location. Reviewers were unconvinced by the evidence presented for two claims. The first is that distinct populations of mPFC neurons participate in non-local ensemble representations during SWR and non-SWR periods. The second is a previously unrecognized anatomical distinction between these populations. Further analyses might strengthen the incomplete evidence for some conclusions, and some of the paper's strong claims should likely be moderated.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used high-density probe recordings in the medial prefrontal cortex (PFC) and hippocampus during a rodent spatial memory task. They examined functional sub-populations of PFC neurons that are modulated or unmodulated by hippocampal sharp-wave ripples (SWRs). SWRs are an important physiological biomarker thought to have a role in mediating information transfer across hippocampal-cortical networks for memory processes. SWRs are associated with reactivation of representations of previous experiences. Associated reactivation in hippocampal and cortical regions has been proposed to have a role in memory formation, retrieval, planning, and memory-guided behavior. This study focuses on awake SWRs, which are prevalent during immobility and pauses in behavior. Previous studies have reported strong modulation of a subset of prefrontal neurons during hippocampal SWRs. Some studies have also reported prefrontal reactivation during SWRs that has a role in spatial memory processes. The study seeks to extend these findings by examining the activity of SWR-modulated and unmodulated neurons across PFC sub-regions. It also asks whether these two neuronal populations differ functionally in representing spatial information and supporting memory-guided decision-making.

      Strengths:

      The major strength of the study is the use of Neuropixels 1.0 probes to monitor activity throughout the dorsal-ventral extent of the rodent medial prefrontal cortex, permitting an investigation of functional distinctions in neuronal populations across PFC sub-regions. The authors show that SWR-unmodulated neurons have stronger spatial tuning than SWR-modulated neurons, as previously reported. They also show that these neurons have stronger directional selectivity and more theta-cycle skipping.

      Weaknesses:

      (1) The title and abstract have been updated to reflect the revised interpretation that prefrontal neurons are involved in spatial tuning and in signaling upcoming choice independently of hippocampal SWRs. This implies that these functions do not occur during SWRs. The evidence presented, however, is lacking, and the analyses have key limitations that preclude such a conclusion. First, it is well established that past and future choices can be decoded from prefrontal neurons independently of the hippocampus as a whole, not just of hippocampal SWRs (e.g., Baeg et al., 2003, 10.1016/s0896-6273(03)00597-x). Second, the statement that prefrontal neurons are involved in spatial tuning independently of SWRs is inconsistent, since spatial tuning is assessed during exploratory behaviors that are not associated with SWRs. The manuscript shows that non-local decoding occurs in prefrontal cortex outside SWR periods, which is already established. Beyond this, the conclusion requires evidence that non-local decoding does not occur during SWR periods, and no such evidence is presented.

      (2) The results show that SWR-modulated prefrontal neurons are more linked to hippocampal non-local representations, whereas SWR-unmodulated neurons encode upcoming choice independently of SWRs. This is logical, and implies that SWR-modulated prefrontal neurons are involved in non-local decoding during hippocampal non-local representations. This hints at potentially multiple mechanisms, one involving independent prefrontal non-local decoding, and another involving prefrontal and hippocampal non-local decoding.

      (3) The analyses have key limitations. According to the Methods, decoding was performed in 50 ms bins, and periods with running speed below 15 cm/s were excluded. Decoded probabilities were then summed for each maze segment and grouped into local and non-local decoding. This implies that decoding segments can span entire maze segments or long time periods. The length of these segments needs to be clarified and quantified. When examining the time-locking of decoding segments to hippocampal SWRs, only non-local segments that occurred within 2 s of SWRs were used. This raises several concerns. First, prefrontal modulation by hippocampal SWRs mostly lasts less than 500 ms, so a 2 s window will include periods without SWR modulation in the analyses. In addition, even decoding segments that are close in time to an SWR can be very long, based on the analysis description. This can lead to spurious results. Second, if only running speeds above 15 cm/s were included, immobility periods, which is when SWRs occur, are necessarily excluded. This analysis therefore cannot be used to investigate decoding during SWRs. A direct approach is required instead: extracting prefrontal activity during SWRs and then decoding this activity.

    3. Reviewer #2 (Public review):

      Summary:

      This work by den Bakker and Kloosterman contributes to the vast body of research exploring the dynamics governing the communication between the hippocampus (HPC) and the medial prefrontal cortex (mPFC) during spatial learning and navigation. Previous research showed that population activity of mPFC neurons is replayed during HPC sharp-wave ripple events (SWRs), which may therefore correspond to privileged windows for the transfer of learned navigation information from the HPC, where initial learning occurs, to the mPFC, which is thought to store this information long term. Indeed, it was also previously shown that the activity of mPFC neurons contains task-related information that can inform about the location of an animal in a maze, which can predict the animals' navigational choices. Here, the authors aim to show that the mPFC neurons that are modulated by HPC activity (SWRs and theta rhythms) are distinct from those "encoding" spatial information. This result could suggest that the integration of spatial information originating from the HPC within the mPFC may require the cooperation of separate sets of neurons.

      This observation may be useful to further extend our understanding of the dynamics regulating the exchange of information between the HPC and mPFC during learning. However, my understanding is that this finding rests mainly on a negative result. Failing to reject the null hypothesis does not statistically establish that the null hypothesis is true. Moreover, in my reading, the rest of the paper mainly replicates phenomena that have already been described, and the original reports are not correctly cited. In my opinion, the novel elements should be precisely identified and discussed. As currently phrased, the manuscript in most cases leads readers to think that these results are new. Detailed comments are provided below.

      Major concerns:

      ORIGINAL COMMENT: (1) The main claim of the manuscript is that the neurons involved in predicting upcoming choices are not the neurons modulated by the HPC. This is based upon the evidence provided in Figure 5, which is a negative result that the authors employ to claim that predictive non-local representations in the mPFC are not linked to hippocampal SWRs and theta phase. However, it is important to remember that in a statistical test, the failure to reject the null hypothesis does not prove that the null hypothesis is true. Since this claim is so central in this work, the authors should use appropriate statistics to demonstrate that the null hypothesis is true. This can be accomplished by showing that there is no effect above some size that is so small that it would make the effect meaningless (see https://doi.org/10.1177/070674370304801108).
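The equivalence approach suggested in this comment can be illustrated with two one-sided tests (TOST). This is a minimal sketch, not the authors' or the reviewer's analysis. It makes three assumptions. The input is a hypothetical list of paired differences (for example, observed minus shuffled peri-SWR decoding, one value per session). The equivalence margin is chosen in advance; the manuscript does not specify one. A large-sample normal approximation replaces the usual t distribution. The function name tost_one_sample is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def tost_one_sample(diffs, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of a mean difference to zero.

    Uses a large-sample normal approximation (hypothetical simplification;
    a t distribution is more appropriate for small samples).
    Tests H0a: mu <= -margin and H0b: mu >= +margin. Equivalence is declared
    only if both one-sided nulls are rejected at level alpha.
    Returns (p_value, equivalent).
    """
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    if se == 0:
        # No variability: equivalent exactly when the mean lies inside the margin.
        return (0.0, True) if abs(m) < margin else (1.0, False)
    z = NormalDist()
    # Lower test: is the mean credibly above -margin?
    p_lower = 1.0 - z.cdf((m + margin) / se)
    # Upper test: is the mean credibly below +margin?
    p_upper = z.cdf((m - margin) / se)
    # The TOST p-value is the larger of the two one-sided p-values.
    p = max(p_lower, p_upper)
    return p, p < alpha
```

Both one-sided nulls must be rejected, so the conclusion depends directly on the chosen margin. The same data can support equivalence under a wide margin and be inconclusive under a narrow one. For that reason the margin has to be justified before the data are examined.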

      AUTHOR RESPONSE: We would like to highlight a few important points here. (1) We indeed do not intend to claim that the SWR-modulated neurons are not at all involved in predicting upcoming choice, just that the SWR-unmodulated neurons may play a larger role. We have rephrased the title and abstract to make this clearer.

      REVIEWER COMMENT: The title has been rephrased but still conveys the same substantive claim. The abstract sentence also does not clearly state what was found. Using "independently" in the new title continues to imply that SWR modulation and prediction of upcoming choices are separate phenomena. By contrast, in your rebuttal you state only that "SWR-unmodulated neurons may play a larger role". This is a much more tempered claim than the one the manuscript currently argues. Why is this clarification not adopted in the article? Moreover, the main text continues to use the same arguments as before. Beyond the cosmetic changes to the title and abstract, the claim itself has not materially changed.

      AUTHOR RESPONSE: (2) The hypothesis that we put forward is not based only on a negative effect. The SWR-unmodulated neurons show stronger spatial tuning (Fig 3b) and more directional selectivity (Fig 3d). They also encode the upcoming choice at the choice point more frequently (new analysis, added in Fig 4d) and have higher spike rates during representations of the upcoming choice (Fig 5b). In addition, representations of upcoming choice in the PFC are not time-locked to SWRs, whereas the hippocampal representations of upcoming choice are (see Fig 5a and Fig 6a). They are also not time-locked to hippocampal theta phase, whereas the hippocampal representations are (see Fig 5c and Fig 6c). Finally, the representations of upcoming and alternative choices in the PFC do not overlap much in time with the representations in the hippocampus. See the updated Fig 4e, where we added a statistical test to assess the likelihood of overlap between decoded time points. All these results together lead us to hypothesize that SWR-modulation is not the driving factor behind non-local decoding in the PFC.

      REVIEWER COMMENT: I do not see how these precisions address my remark. The main claim in the title used to be "Neurons in the medial prefrontal cortex that are not modulated by hippocampal sharp-wave ripples are involved in spatial tuning and signaling upcoming choice." It is now "Neurons in the medial prefrontal cortex are involved in spatial tuning and signaling upcoming choice independently from hippocampal sharp-wave ripples." The substance has not changed. This specific claim is supported solely by Figure 5.

      The other analyses cited describe functional characteristics of SWR-unmodulated neurons. Unless they are linked by explicit new analyses, they do not show that SWR modulation and non-local decoding in PFC are independent or orthogonal. If there is an analysis that makes this link explicit, it should be clearly presented. As it stands, I cannot find an explanation in the manuscript for why these results, taken together, justify the conclusion that "SWR-modulation is not the driving factor behind non-local decoding in the PFC". Also: is the main result of this work a "hypothesis"? If so, this should be clearly distinguished from a conclusion supported by results and analyses.

      AUTHOR RESPONSE: (3) Based on the reviewer's suggestion, we have added a statistical test. It compares the locking of the non-local decoding to hippocampal SWRs and theta phase with that of shuffled posterior probabilities. Previously we looked at all SWRs in a -2 to 2 second window. We now select only the SWR closest in time within that window and perform the statistical comparison in the 0-20 ms bin from SWR onset. With this new analysis we look more directly at the time-locking of the decoded segments to SWR onset (see updated Fig 5a and 6a).
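The core of the revised SWR analysis can be sketched as follows: select the closest SWR within a -2 to 2 s window, then count decoded segments that start in a narrow bin after SWR onset. This is a minimal illustration under stated assumptions, not the authors' code. Times are in seconds. The lag is defined as segment onset minus SWR onset. The shuffle of posterior probabilities that the authors compare against is not reproduced. The function names nearest_swr_lags and fraction_in_bin are hypothetical.

```python
import bisect


def nearest_swr_lags(segment_onsets, swr_onsets, window=2.0):
    """Return, for each decoded segment onset, the lag to its closest SWR onset.

    Lag = segment onset - SWR onset (seconds), so positive lags mean the
    segment started after the ripple. Segments with no SWR within
    +/- window are skipped, mirroring a -2 to 2 s selection window.
    """
    swr = sorted(swr_onsets)
    lags = []
    for t in segment_onsets:
        i = bisect.bisect_left(swr, t)
        # Only the ripples immediately before and after t can be the closest.
        candidates = [swr[j] for j in (i - 1, i) if 0 <= j < len(swr)]
        if not candidates:
            continue
        closest = min(candidates, key=lambda s: abs(t - s))
        if abs(t - closest) <= window:
            lags.append(t - closest)
    return lags


def fraction_in_bin(lags, start=0.0, width=0.020):
    """Fraction of lags falling in the half-open bin [start, start + width)."""
    if not lags:
        return 0.0
    return sum(start <= lag < start + width for lag in lags) / len(lags)
```

The result of a single narrow bin can change with its placement and width. Evaluating fraction_in_bin over several bin widths and offsets, and comparing each against the same shuffle, makes that sensitivity visible.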

      REVIEWER COMMENT: I appreciate the added analysis focusing on the closest SWR and a 0-20 ms bin. My understanding is that you consider the revised analyses in Figures 5a and 6a sufficient to show that predictive non-local representations in mPFC are not linked to hippocampal SWRs and theta phase.

      First, the manuscript should explicitly explain the rationale for this analysis and why it is sufficient to support the claim. From the main text it is not possible to understand what was done. The Methods are hard to follow, and the figure legends are unclear (e.g., the shuffle is not even defined there).

      Specific points I could not reconcile:

      i) The gray histograms in the revised Figures 5a and 6a now show a peak at zero lag, whereas in the previous version they were flat, although they are said to plot the same data. What changed?

      ii) Why choose a 20 ms bin? A single narrow bin invites false negatives. Please justify this choice.

      iii) Comparing to a shuffle is a useful control. However, a non-significant p-value only tells us that no difference was detected under that shuffle. It does not show that there is no difference or that the processes are independent.

      ORIGINAL COMMENT: (2) The main claim of the work is also based on Figure 3. There, the authors show that SWR-unmodulated mPFC neurons have higher spatial tuning and higher directional selectivity scores, and that a higher percentage of these neurons show theta skipping. This is used to support the claim that SWR-unmodulated cells encode spatial information. However, in this kind of task it is not possible to disentangle space from task variables that involve cognitive processes other than spatial processing. These include decision-making, attention, and motor control, which always happen at specific locations in the maze. Therefore, the results shown in Figure 3 may relate to these other processes rather than to the encoding of space. It cannot be unequivocally claimed that mPFC neurons "encode spatial information". This limitation is discussed by Mashoori et al. (2018), an article that appears to be a major inspiration for this work. Can the authors provide a control analysis or experiment that supports their claim? Otherwise, this claim should be tempered. Also, the authors say that Jadhav et al. (2016) showed that mPFC neurons unmodulated by SWRs are less tuned to space. How do they reconcile this with their results?

      AUTHOR RESPONSE: The reviewer is right to urge caution about claims such as spatial tuning, where other factors may also be involved. We agree that other factors may influence what we are seeing as spatial tuning. However, it is very important to note that the behavioral task is executed on a symmetrical 4-armed maze. Two of the arms are always used for the start of the trajectory, and the other two arms (North and South) function as the goal (reward) arms. If the PFC were encoding cognitive processes such as task phases related to decision-making and reward, we would not be able to differentiate between the two start arms or between the two goal arms, as these represent the same task phases. Note also that the North and South arms are illuminated in a pseudo-random order between trials. During cue-based rule learning, this illumination directly indicates where the reward will be found. Even in this phase of the task, the PFC encodes where the animal will turn on a trial-by-trial basis. The North and South arms are thus still differentiated correctly on each trial, even though the illumination and associated reward change.

      REVIEWER COMMENT: I appreciate that the departure location was pseudorandomized. However, this control does not rule out that PFC activity reflects motor preparation (left vs right turns) and associated perceptual decision-making/attentional processes that are inherently tied to a specific action. As such, it cannot by itself support the claim that PFC neurons "encode spatial information." Moreover, the authors acknowledge here that "other factors may also be involved," yet this caveat is not reflected in the manuscript. Why?

      AUTHOR RESPONSE: Secondly, importantly, the reviewer mentions that we claimed that Jadhav et al. (2016) showed that mPFC neurons unmodulated by SWRs are less tuned to space, but this is incorrect. Jadhav et al. (2016) showed that SWR-unmodulated neurons had lower spatial coverage, meaning that they are more spatially selective (congruent with our results). We have rephrased this in the text to be clearer.

      REVIEWER COMMENT: Thanks for clarifying this.

      ORIGINAL COMMENT: (3) My reading is that the rest of the paper mainly consists of replications or incremental observations of already known phenomena, with some new but not necessarily surprising observations:

      a) Figure 2 shows three things. A subset of mPFC neurons is modulated by HPC SWRs and theta (already known). vmPFC neurons are more strongly modulated by SWRs (not surprising given the anatomy). Theta phase preference differs between vmPFC and dmPFC (not surprising given that theta is a travelling wave).

      AUTHOR RESPONSE: The finding that vmPFC neurons are more strongly modulated by SWRs than dmPFC indeed matches what we know from anatomy, but that does not make it a trivial finding. A lot remains unknown about the mPFC subregions and their interactions with the hippocampus, and not every finding will be directly linked to the anatomy. Therefore, in our view this is a significant finding which has not been studied before due to the technical complexity of large-scale recordings along the dorsal-ventral axis of the mPFC.

      REVIEWER COMMENT: This finding is indeed non-trivial; however, it seems completely irrelevant to the paper's main claim unless the Authors can argue otherwise.

      AUTHOR RESPONSE: Similarly, the fact that theta is a traveling wave (which is itself still under debate) does not mean we should assume that the dorsal and ventral mPFC follow this signature and are modulated by different phases of the theta cycle. Again, in our view this is not at all trivial, but an important finding which brings us closer to understanding the intricate interactions between the hippocampus and PFC in spatial learning and decision-making.

      REVIEWER COMMENT: Yes, but in what way does this support the manuscript's primary claim? This is unclear to me.

      ORIGINAL COMMENT: b) Figure 4 shows that non-local representations in mPFC are predictive of the animal's choice. This is mostly an increment to the work of Mashoori et al. (2018). My understanding is that, in addition to what had already been shown by Mashoori et al., here it is shown how the upcoming choice can be predicted. The authors may want to emphasize this novel aspect.

      AUTHOR RESPONSE: In our view, our manuscript focuses on a completely different aspect of learning and memory than the paper the reviewer is referring to (Mashoori et al. 2018). Importantly, the Mashoori et al. paper looked at choice evaluation at reward sites and showed that disappointing reinforcements are associated with reactivations in the ACC of the unselected target. This points to the role of the ACC in error detection and evaluation. Although this is an interesting result, it is in essence unrelated to what we are focusing on here, which is decision making and prediction of upcoming choices. The fact that the turning direction of the animal can be predicted on a trial-to-trial basis, and even precedes the behavioral change over the course of learning, sheds light on the role of the PFC in these important predictive cognitive processes (as opposed to post-choice reflective processes).

      REVIEWER COMMENT: Indeed, as I said, the new element here is that the upcoming choice can be predicted. This appears only incremental and could belong to another story; as the manuscript is currently written, it does not support the article's main claim. I would like to specify that, regarding this and the other points above, my inability to see how these minor results support the Authors' claim may reflect my misunderstanding; nevertheless, this suggests that the manuscript should be extensively rewritten and reorganized to make the Authors' meaning clear.

      ORIGINAL COMMENT: c) Figure 6 shows that prospective activity in the HPC is linked to SWRs and theta oscillations. This has been described in various forms since at least the works of Johnson and Redish (2007), Pastalkova et al. (2008), and Dragoi and Tonegawa (2011 and 2013), as well as in earlier literature on splitter cells. These foundational papers on this topic are not even cited in the current manuscript.

      AUTHOR RESPONSE: We have added these citations to the introduction (line 37).

      REVIEWER COMMENT: This is an example of how the Authors fail to acknowledge the underlying problem with how the manuscript is written; the issue has not been addressed except with a cosmetic change like the one described above. The Results section contains a series of findings that are well-known phenomena described previously (see below). Prior results should be acknowledged at the beginning of each relevant paragraph, followed by an explicit statement of what is new, so that readers can distinguish replication from novelty. Here, I pointed specifically to the results of Figure 6, and the Authors deemed it sufficient simply to add the citations I indicated to an existing sentence in the Introduction, while keeping the Results description unchanged. As written, this reads as if these phenomena are being described for the first time. This is incorrect. It is hard to avoid the impression that the Authors did not take this concern seriously; the same issue appears elsewhere in the manuscript, and I fail to see how the Authors "have improved clarity of the text throughout to highlight the novelty of our results better."

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors used high-density probe recordings in the medial prefrontal cortex (PFC) and hippocampus during a rodent spatial memory task to examine functional sub-populations of PFC neurons that are modulated vs. unmodulated by hippocampal sharp-wave ripples (SWRs), an important physiological biomarker that is thought to have a role in mediating information transfer across hippocampal-cortical networks for memory processes. SWRs are associated with the reactivation of representations of previous experiences, and associated reactivation in hippocampal and cortical regions has been proposed to have a role in memory formation, retrieval, planning, and memory-guided behavior. This study focuses on awake SWRs, which are prevalent during periods of immobility such as pauses in behavior. Previous studies have reported strong modulation of a subset of prefrontal neurons during hippocampal SWRs, with some studies reporting prefrontal reactivation during SWRs that has a role in spatial memory processes. The study seeks to extend these findings by examining the activity of SWR-modulated vs. unmodulated neurons across PFC sub-regions, and whether there is a functional distinction between these two kinds of neuronal populations with respect to representing spatial information and supporting memory-guided decision-making.

      Strengths:

      The major strength of the study is the use of Neuropixels 1.0 probes to monitor activity throughout the dorsal-ventral extent of the rodent medial prefrontal cortex, permitting an investigation of functional distinction in neuronal populations across PFC sub-regions. They are able to show that SWR-unmodulated neurons, in addition to having stronger spatial tuning than SWR-modulated neurons as previously reported, also show stronger directional selectivity and theta-cycle skipping properties.

      Weaknesses:

      (1) While the study is able to extend previous findings that SWR-modulated PFC neurons have significantly lower spatial tuning than SWR-unmodulated neurons, the evidence presented does not support the main conclusion of the paper that only the unmodulated neurons are involved in spatial tuning and signaling upcoming choice, implying that SWR-modulated neurons are not involved in predicting upcoming choice, as stated in the abstract. This conclusion makes a categorical distinction between two neuronal populations (that SWR-unmodulated neurons are involved and SWR-modulated neurons are not involved in predicting upcoming choice), which requires evidence that clearly shows this absolute distinction. However, in the analyses showing non-local population decoding in PFC for predicting upcoming choice, the results show that SWR-unmodulated neurons have higher firing rates than SWR-modulated neurons, which is not a categorical distinction. Higher firing rates do not imply that only SWR-unmodulated neurons are contributing to the non-local decoding. They may contribute more than SWR-modulated neurons, but there are no follow-up analyses to assess the contribution of the two sub-populations to non-local decoding.

      We agree with the reviewer that this is indeed not a categorical distinction, and do not wish to claim that the SWR-modulated neurons have absolutely no role in non-local decoding and signaling upcoming choice. We have adjusted this in the title, abstract and text to clarify this for the reader. Furthermore, we have performed additional analyses to elucidate the role of SWR-modulated neurons in non-local decoding by creating separate decoding models for SWR-modulated and unmodulated PFC neurons respectively. These analyses show that the SWR-unmodulated neurons are indeed encoding representations of the upcoming choice more often than the alternative choice, whereas the SWR-modulated neurons do not reliably differentiate the upcoming and alternative choices in non-local decoding at the choice point (see new Fig 4d).

      (2) Further, the results show that during non-local representations of the hippocampus of the upcoming options, SWR-excited PFC neurons were more active during hippocampal representations of the upcoming choice, and SWR-inhibited PFC neurons were less active during hippocampal representations of the alternative choice. This clearly suggests that SWR-modulated neurons are involved in signaling upcoming choice, at least during hippocampal non-local representations, which contradicts the main conclusion of the paper.

      This does not contradict the main conclusion of the paper, but in fact strengthens the hypothesis we are putting forward: that the SWR-modulated neurons are more linked to the hippocampal non-local representations, whereas the SWR-unmodulated neurons seem to have their own encoding of upcoming choice which is not linked to the signatures in the hippocampus (almost no time overlap with hippocampal representations, no phase locking to hippocampal theta, no time locking to hippocampal SWRs, no increased firing during hippocampal representations of upcoming choice).

      (3) Similarly, one of the analyses shows that PFC non-local representations show no preference for hippocampal SWRs or hippocampal theta phase. However, the examples shown for non-local representations clearly show that these decodes occur prior to the start of the trajectory, or during running on the central zone or start arm. The period of immobility prior to running on the start arm will have a higher prevalence of SWRs, and the period of running will have a higher prevalence of theta oscillations and theta sequences, so non-local decoded representations have to be sub-divided according to these known local field potential phenomena for this analysis, which was not done.

      These analyses are in fact separated based on proximity to SWRs (only segments that occurred within 2 seconds of SWR onset were included, see Methods) and theta periods respectively (selected based on a running speed of more than 5 cm/s and the absence of SWRs in the hippocampus, see Methods). We have clarified this in the main text.

      (4) The primary phenomenon that the manuscript relies on is the modulation of PFC neurons by hippocampal SWRs, so it is necessary to perform the PFC population decoding analyses during SWRs (or examine non-local decoding that occurs specifically during SWRs), as reported in previous studies of PFC reactivation during SWRs, to see if there is any distinction between modulated and unmodulated neurons in this reactivation. Even in the case of independent PFC reactivation as reported by one study, this PFC reactivation was still reported to occur during hippocampal SWRs, therefore decoding during SWRs has to be examined. Similarly, the phenomenon of theta cycle skipping is related to theta sequence representations, so decoding during PFC and hippocampal theta sequences has to be examined before coming to any conclusions.

      The histograms shown in Figure 5a (see updated Fig 5a, where we look at the closest SWR in time and compare the occurrence with shuffled data) show that there is no increased prevalence of decoding upcoming and alternative choices in the PFC during hippocampal SWRs. The lack of overlap of non-local decoding between the hippocampus and PFC further shows that these non-local representations occur at different timepoints in the PFC and hippocampus (see updated Fig 4e, where we added a statistical test to show the likelihood of the overlap between the decoded segments in the PFC and hippocampus). Based on the reviewer's suggestion, we have additionally decoded the information in the PFC during hippocampal SWRs exclusively, and found that the direction on the maze could not be predicted based on the decoding of SWR time points in the PFC. See figure below. Similarly, we can see from the histograms in Figure 5c that there is no phase locking to the hippocampal theta phase for non-local representations in the PFC, and in contrast there is phase locking of the hippocampal encoding of upcoming choice to the rising phase of the theta cycle (Fig 6c), further highlighting the separation between these two regions in the non-local decoding.

      Reviewer #2 (Public review):

      Summary:

      This work by den Bakker and Kloosterman contributes to the vast body of research exploring the dynamics governing the communication between the hippocampus (HPC) and the medial prefrontal cortex (mPFC) during spatial learning and navigation. Previous research showed that population activity of mPFC neurons is replayed during HPC sharp-wave ripple events (SWRs), which may therefore correspond to privileged windows for the transfer of learned navigation information from the HPC, where initial learning occurs, to the mPFC, which is thought to store this information long term. Indeed, it was also previously shown that the activity of mPFC neurons contains task-related information that can inform about the location of an animal in a maze, which can predict the animals' navigational choices. Here, the authors aim to show that the mPFC neurons that are modulated by HPC activity (SWRs and theta rhythms) are distinct from those "encoding" spatial information. This result could suggest that the integration of spatial information originating from the HPC within the mPFC may require the cooperation of separate sets of neurons.

      This observation may be useful to further extend our understanding of the dynamics regulating the exchange of information between the HPC and mPFC during learning. However, my understanding is that this finding is mainly based upon a negative result, which cannot be statistically proven by the failure to reject the null hypothesis. Moreover, in my reading, the rest of the paper mainly replicates phenomena that have already been described, with the original reports not correctly cited. My opinion is that the novel elements should be precisely identified and discussed, while the current phrasing in the manuscript, in most cases, leads readers to think that these results are new. Detailed comments are provided below.

      Major concerns:

      (1) The main claim of the manuscript is that the neurons involved in predicting upcoming choices are not the neurons modulated by the HPC. This is based upon the evidence provided in Figure 5, which is a negative result that the authors employ to claim that predictive non-local representations in the mPFC are not linked to hippocampal SWRs and theta phase. However, it is important to remember that in a statistical test, the failure to reject the null hypothesis does not prove that the null hypothesis is true. Since this claim is so central in this work, the authors should use appropriate statistics to demonstrate that the null hypothesis is true. This can be accomplished by showing that there is no effect above some size that is so small that it would make the effect meaningless (see https://doi.org/10.1177/070674370304801108).
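      The equivalence approach the reviewer refers to is commonly implemented as two one-sided tests (TOST). The sketch below is purely illustrative and is not the authors' analysis: it assumes a list of per-session (or per-neuron) differences, a hypothetical equivalence bound `delta` fixed in advance, and a large-sample normal approximation in place of the t distribution so that it runs with the Python standard library only. The function name `tost_one_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_one_sample(values, delta, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of the mean to zero.

    H0: |mu| >= delta  versus  H1: |mu| < delta.
    Assumption: large-sample normal approximation, not the t distribution.
    Returns (p_value, equivalent_at_alpha).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    se = stdev(values) / sqrt(n)
    if se == 0:
        raise ValueError("standard error is zero")
    m = mean(values)
    phi = NormalDist().cdf
    # Test 1: reject mu <= -delta when the mean lies well above -delta.
    p_lower = 1.0 - phi((m + delta) / se)
    # Test 2: reject mu >= +delta when the mean lies well below +delta.
    p_upper = phi((m - delta) / se)
    # Equivalence is claimed only if both one-sided tests reject.
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha
```

      Failing to reject an ordinary null says nothing by itself; here a small p-value supports the claim that any effect is smaller than `delta`, so the conclusion depends directly on how that bound is justified.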

      We would like to highlight a few important points here. (1) We indeed do not intend to claim that the SWR-modulated neurons are not at all involved in predicting upcoming choice, just that the SWR-unmodulated neurons may play a larger role. We have rephrased the title and abstract to make this clearer. (2) The hypothesis that we put forward is based not only on a negative effect, but on the findings that the SWR-unmodulated neurons show higher spatial tuning (Fig 3b), more directional selectivity (Fig 3d), more frequent encoding of the upcoming choice at the choice point (new analysis, added in Fig 4d), and higher spike rates during the representations of the upcoming choice (Fig 5b). This is further highlighted by the fact that the representations of upcoming choice in the PFC are not time-locked to SWRs (whereas the hippocampal representations of upcoming choice are; see Fig 5a and Fig 6a), and not locked to hippocampal theta phase (whereas the hippocampal representations are; see Fig 5c and Fig 6c). Finally, the representations of upcoming and alternative choices in the PFC do not show a large overlap in time with the representations in the hippocampus (see updated Fig 4e, where we added a statistical test to show the likelihood of the overlap of decoded timepoints). All these results together lead us to hypothesize that SWR modulation is not the driving factor behind non-local decoding in the PFC. (3) Based on the reviewer's suggestion, we have added a statistical test that compares the locking of the non-local decoding to hippocampal SWRs and theta phase against shuffled posterior probabilities. Instead of looking at all SWRs in a -2 to 2 second window, we have now selected only the closest SWR in time within that window, and performed the statistical comparison in the 0-20 ms bin from SWR onset. With this new analysis we are looking more directly at the time-locking of the decoded segments to SWR onset (see updated Fig 5a and 6a).
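      The nearest-SWR selection and the 0-20 ms bin comparison described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes decoded-segment onset times and SWR onset times in seconds within one session, and it uses a circular time shift of the segment times as the null, which is a stand-in for the shuffled posterior probabilities mentioned in the response (that procedure is not detailed here). All function names are hypothetical.

```python
import bisect
import random


def closest_swr_offsets(segment_times, swr_onsets, window=2.0):
    """Signed offset (segment time minus nearest SWR onset, in s) for each
    segment whose nearest SWR lies within +/- window seconds."""
    swr = sorted(swr_onsets)
    offsets = []
    for t in segment_times:
        i = bisect.bisect_left(swr, t)
        # The nearest onset is either just before or just after t.
        candidates = [swr[j] for j in (i - 1, i) if 0 <= j < len(swr)]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda s: abs(t - s))
        d = t - nearest
        if abs(d) <= window:
            offsets.append(d)
    return offsets


def fraction_in_bin(offsets, lo=0.0, hi=0.020):
    """Fraction of offsets in [lo, hi), by default 0-20 ms after SWR onset."""
    if not offsets:
        return 0.0
    return sum(lo <= d < hi for d in offsets) / len(offsets)


def circular_shift_null(segment_times, swr_onsets, t_start, t_end,
                        n_shuffles=1000, seed=0):
    """Null distribution of fraction_in_bin after circularly shifting all
    segment times by one random amount per shuffle (assumed null model)."""
    rng = random.Random(seed)
    duration = t_end - t_start
    null = []
    for _ in range(n_shuffles):
        shift = rng.uniform(0.0, duration)
        shifted = [t_start + (t - t_start + shift) % duration
                   for t in segment_times]
        null.append(fraction_in_bin(closest_swr_offsets(shifted, swr_onsets)))
    return null


def shuffle_p_value(observed, null):
    """One-sided p-value: share of null values at least as large as observed."""
    return (1 + sum(x >= observed for x in null)) / (len(null) + 1)
```

      Keeping only the closest SWR per segment avoids counting the same segment against several nearby ripples, which would otherwise blur a peak at 0-20 ms in the peri-event histogram.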

      (2) The main claim of the work is also based on Figure 3, where the authors show that SWR-unmodulated mPFC neurons have higher spatial tuning and higher directional selectivity scores, and that a higher percentage of these neurons show theta cycle skipping. This is used to support the claim that SWR-unmodulated cells encode spatial information. However, it must be noted that in this kind of task it is not possible to disentangle space from specific task variables involving cognitive processes separate from spatial processing, such as decision-making, attention, motor control, etc., which always happen at specific locations of the maze. Therefore, the results shown in Figure 3 may relate to other specific processes rather than encoding of space, and it cannot be unequivocally claimed that mPFC neurons "encode spatial information". This limitation is presented by Mashoori et al. (2018), an article that appears to be a major inspiration for this work. Can the authors provide a control analysis/experiment that supports their claim? Otherwise, this claim should be tempered. Also, the authors say that Jadhav et al. (2016) showed that mPFC neurons unmodulated by SWRs are less tuned to space. How do they reconcile this with their results?

      The reviewer is right to urge caution when talking about claims such as spatial tuning, where other factors may also be involved. Although we agree that there may be some other factors influencing what we are seeing as spatial tuning, it is very important to note that the behavioral task is executed on a symmetrical 4-armed maze, where two of the arms are always used for the start of the trajectory, and the other two arms (North and South) function as the goal (reward) arms. Therefore, if the PFC were encoding cognitive processes such as task phases related to decision-making and reward, we would not be able to differentiate between the two start arms and the two goal arms, as these represent the same task phases. Note also that the North and South arms are illuminated in a pseudo-random order between trials, and during cue-based rule learning this is a direct indication of where the reward will be found. Even in this phase of the task, the PFC encodes where the animal will turn on a trial-to-trial basis (meaning the North and South arms are still differentiated correctly on each trial even though the illumination and associated reward are changing).

      Second, and importantly, the reviewer states that we claimed that Jadhav et al. (2016) showed that mPFC neurons unmodulated by SWRs are less tuned to space, but this is incorrect. Jadhav et al. (2016) showed that SWR-unmodulated neurons had lower spatial coverage, meaning that they are more spatially selective (congruent with our results). We have rephrased this in the text to be clearer.

      (3) My reading is that the rest of the paper mainly consists of replications or incremental observations of already known phenomena with some not necessarily surprising new observations:

      (a) Figure 2 shows that a subset of mPFC neurons is modulated by HPC SWRs and theta (already known), that vmPFC neurons are more strongly modulated by SWRs (not surprising given anatomy), and that theta phase preference is different between vmPFC and dmPFC (not surprising given the fact that theta is a travelling wave).

      The finding that vmPFC neurons are more strongly modulated by SWRs than dmPFC indeed matches what we know from anatomy, but that does not make it a trivial finding. A lot remains unknown about the mPFC subregions and their interactions with the hippocampus, and not every finding will be directly linked to the anatomy. Therefore, in our view this is a significant finding which has not been studied before due to the technical complexity of large-scale recordings along the dorsal-ventral axis of the mPFC.

      Similarly, the fact that theta is a traveling wave (which is itself still under debate) does not mean we should assume that the dorsal and ventral mPFC follow this signature and are modulated by different phases of the theta cycle. Again, in our view this is not at all trivial, but an important finding which brings us closer to understanding the intricate interactions between the hippocampus and PFC in spatial learning and decision-making.

      (b) Figure 4 shows that non-local representations in mPFC are predictive of the animal's choice. This is mostly an increment to the work of Mashoori et al. (2018). My understanding is that, in addition to what had already been shown by Mashoori et al., here it is shown how the upcoming choice can be predicted. The authors may want to emphasize this novel aspect.

      In our view, our manuscript focuses on a completely different aspect of learning and memory than the paper the reviewer is referring to (Mashoori et al. 2018). Importantly, the Mashoori et al. paper looked at choice evaluation at reward sites and showed that disappointing reinforcements are associated with reactivations in the ACC of the unselected target. This points to the role of the ACC in error detection and evaluation. Although this is an interesting result, it is in essence unrelated to what we are focusing on here, which is decision making and prediction of upcoming choices. The fact that the turning direction of the animal can be predicted on a trial-to-trial basis, and even precedes the behavioral change over the course of learning, sheds light on the role of the PFC in these important predictive cognitive processes (as opposed to post-choice reflective processes).

      (c) Figure 6 shows that prospective activity in the HPC is linked to SWRs and theta oscillations. This has been described in various forms since at least the works of Johnson and Redish (2007), Pastalkova et al. (2008), and Dragoi and Tonegawa (2011 and 2013), as well as in earlier literature on splitter cells. These foundational papers on this topic are not even cited in the current manuscript.

      We have added these citations to the introduction (line 37).

      Although some previous work is cited, the current narrative of the Results section may lead the reader to think that these results are new, which I think is unfair. Previous evidence of the same phenomena should be cited throughout the Results, and what is new and/or different from previous results should be clearly stated and discussed. Pure replications of previous work could simply be presented as supplementary figures. It is not fair that the titles of paragraphs and main figures correspond to notions that are well established in the literature (e.g., Figure 2, 2nd paragraph of the Results, etc.).

      We have changed the title of paragraph 2 and Figure 2 to highlight more clearly the novel result (the difference between the dorsal and ventral mPFC), and have improved clarity of the text throughout to highlight the novelty of our results better.

      (d) My opinion is that, overall, the paper gives the impression of being somewhat rushed and lacking attention to detail. Many figure panels are difficult to understand due to incomplete legends and visualizations with tiny, indistinguishable details. Moreover, some previous works are not correctly cited. I tried to make a list of everything I spotted below.

      We have addressed all the comments in the Recommendations for Authors.

      Reviewer #1 (Recommendations for the authors):

      (1) Expanding on the points above, one of the strengths of the study is expanding the previous result that SWR-unmodulated neurons are more spatially selective (Jadhav et al., 2016), across prefrontal sub-regions, and showing that these neurons are more directionally selective and show more theta cycle skipping. Theta cycle skipping is related to theta sequence representations and previous studies have established PFC theta sequences in parallel to hippocampal theta sequences (Tang et al., 2021; Hasz and Redish, 2020; Wang et al., 2024), and the theta cycle skipping result suggests that SWR-unmodulated neurons should show stronger participation than SWR-modulated neurons in PFC theta sequences that decode to upcoming or alternative location, which can be tested in this high-density PFC physiology data. This is still unlikely to make a categorical distinction that only SWR-unmodulated neurons participate in theta sequence decoding, but will be useful to examine.

      We thank the reviewer for their suggestion and have now included results based on separate decoding models that only use SWR-modulated or SWR-unmodulated mPFC neurons. From this analysis we see that indeed SWR-unmodulated neurons are not the only group contributing to theta sequence decoding, but they do distinguish more strongly between the upcoming and alternative arms at the choice point (see new Fig 4d).

      (2) Non-local decoding in 50 ms windows on a theta timescale is a valid analysis, but ignoring potential variability in the internal state during running vs. immobility, as indicated in the LFP by the presence of SWRs or theta oscillations, is incorrect, especially when conclusions are being made about decoding during SWRs and theta oscillation phase, and in light of previous evidence that these are distinct states during behavior. There are multiple papers on PFC theta sequences (Tang et al., 2021; Hasz and Redish, 2020; Wang et al., 2024), and on PFC reactivation during SWRs (Shin et al., 2019; Kaefer et al., 2020; Jarovi et al., 2023), and this dataset of high-density prefrontal recordings using Neuropixels 1.0 provides an opportunity to investigate these phenomena in detail. Here, it should be noted that although Kaefer et al. reported prefrontal reactivation independent of hippocampal reactivation, these PFC reactivation events still occurred during hippocampal SWRs in their data, and were linked to memory performance.

      From our data we see that the time segments that represent upcoming or alternative choice in the prefrontal cortex are in fact not time-locked to hippocampal SWRs (updated Fig 5a, where we look only at the closest SWR in time and compare this to shuffled data). In addition, these segments do not overlap much with the decoded segments in the hippocampus (see updated Fig 4e, where we added a shuffling procedure to assess the likelihood of the overlap with hippocampal decoded segments). Importantly, we are not ignoring the variability during running and immobility, as theta segments were selected based on a running speed of more than 5 cm/s and the absence of SWRs in the hippocampus (see Methods), ensuring that the theta and SWR analyses were done on the two different behavioral states respectively. We have clarified this in the main text.

      (3) The majority of rodent studies make the distinction between ACC, PrL, and IL, although as the authors noted, there are arguments that rodent mPFC is a continuum (Howland et al., 2022), or even that rodent mPFC is a unitary cingulate cortical region (van Heukelum et al., 2020). The authors choose to present the results as dorsal (ACC + dorsal PrL) vs. ventral mPFC (ventral PrL + IL), however, in my opinion, it will be more useful to the field to see results separately for ACC, PrL, and IL, given the vast literature on connectivity and functional differences in these regions.

      We appreciate the reviewer’s suggestion. Initially, we did perform all analyses separately for the ACC, PLC and ILC subregions. However, we observed that the differences between subregions (strength of SWR-modulation and the phase locking to theta) varied uniformly along the dorsal-ventral axis, i.e., the PLC showed a profile of SWR-modulation and theta phase locking that fell in between that of the ACC and the ILC. This is also highlighted in paragraph 3 of the introduction (lines 52-56). For that reason, and for the sake of reducing the number of variables, increasing statistical power, and improving readability, we focused on the dorsal-ventral distinction instead, as this is where the main differences were seen.

      (4) I suggest that the authors refrain from making categorical distinctions as in their title and abstract, such as "neurons that are involved in predicting upcoming choice are not the neurons that are modulated by hippocampal sharp-wave ripples", when the evidence presented can only support a gradation of participation of the two neuronal sub-populations, not an absolute distinction. The division into SWR-modulated and SWR-unmodulated neurons is itself determined by the statistic chosen to divide the neurons into one or two sub-classes and will vary with the statistical threshold employed. Further, previous studies have suggested that SWR-excited and SWR-inhibited neurons comprise distinct functional sub-populations based on their activity properties (Jadhav et al., 2016; Tang et al., 2017), but it is not clear to what degree SWR-modulated neurons constitute a distinct and singular functional sub-population. In the absence of connectivity information and cross-correlation measures within and across sub-populations, it is prudent to be conservative about this interpretation of SWR-unmodulated neurons.

      We agree with the reviewer that the distinction is not categorical and have changed the wording in the title and abstract. We also do not intend to claim that the SWR-modulated neurons are a distinct and singular functional sub-population, and for that reason the firing rates from the SWR-excited and SWR-inhibited groups are reported separately throughout the paper.

      Reviewer #2 (Recommendations for the authors):

      Minor detailed remarks:

      (1) The authors should provide a statistical test, perhaps against shuffled data, for Figures 5a,c and 6a,c.

      We thank the reviewer for their suggestion and have added statistical tests in Figures 5a, 5c, 6a and 6c.

      (2) The behavioral task is explained only in the legend of Figure 1c, and the explanation is quite vague. In this type of article format, readers need to have a clear understanding of the task without having to refer to the methods section. A clear understanding of the task is crucial for interpreting all subsequent analyses. In my opinion, the word 'trial' in the figure is misleading, as these are sessions composed of many trials.

      We have added a more thorough description of the behavioral task, both in the main text and the Figure legend.

      (3) Figure 1d, legend of markers missing.

      We have added a legend for the markers.

      (4) When there are multiple bars and a single p-value is presented, it is unclear which group comparisons the p-value pertains to. For instance, Figures 2c-f and 3b, d, f (right parts), and 5b...

      For all p-values we have added lines to the figures that indicate the groups that were compared and have added descriptions of the statistical test to the figure legends to indicate what each p-value represents.

      (5) In Figure 3c, the legend does not explain what the colored lines represent, and the lines themselves are very small and almost indistinguishable.

      We have changed the colored lines to quadrants on the maze to clarify what each direction represents.

      (6) Figure 4a is too small, and the elements are so tiny that it is impossible to distinguish them and their respective colors. The term 'segment' has not been unequivocally explained in the text. All the different elements of the panel should be explicitly explained in the legend to make it easily understandable. What do the pictograms of the maze on the left represent? What does the dashed vertical line indicate?

      We have added the definition of a segment in the text (lines 283-286) and have improved the clarity and readability of Figure 4a.

      (7) In Figure 5, what do the red dots on the right part relate to? The legend should explicitly explain what is shown in the left and right parts, respectively. What comparisons do the p-values relate to?

      We have adjusted the legend to explain the left and right parts of the figure and we have added the statistical test that was used to get to the p-value (in addition to the text which already explained this).

      (8) Panels b of Figures 5 and 6 should have the same y-axis scale for comparison. The position of the p-values should also be consistent. With the current arrangement in Figure 6, it is unclear what the p-values relate to.

      We have adjusted the y-scale to be the same for Figures 5 and 6, and we have added a description of the statistical test to the legend.

      (9) Multiple studies have previously shown that mPFC activity contains spatial information (e.g., refs 24-27). It is important that, throughout the paper, the authors frame their results in relation to previous findings, highlighting what is novel in this work.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have indicated more clearly which results replicate previous findings and highlighted novel results.

      (10) Please note that Peyrache et al. (2009) do not show trajectory replay, nor do they decode location. I am not familiar with all the cited literature, but this makes me think that the authors may want to double-check their citations to ensure they assign the correct claims to each past work.

      We have adjusted the reference to this work to exclude the word ‘trajectory’ and have double-checked our other citations.

      (11) The authors perform theta-skipping analysis, first described by Kay et al., but do not cite the original paper until the discussion.

      Thank you for pointing out this oversight. We have now included this citation earlier in the paper (line 231).

      (12) Additionally, some parts of the text are difficult to grasp, and there are English vocabulary and syntax errors. I am happy to provide comments on the next version of the text, but please include page and line numbers in the PDF. The authors may also consider using AI to correct English mistakes and improve the fluency and readability of their text.

      We have carefully gone through the text to correct any errors. We have now also included page and line numbers, and we will be happy to address any specific issues the reviewer may spot in the revised manuscript.

    1. eLife Assessment

      This study provides valuable insights into the mechanisms of remote memory impairment in an Alzheimer's disease mouse model. The evidence is compelling, with careful use of viral-TRAP labeling and patch-clamp electrophysiology to demonstrate altered inhibitory microcircuit function, though the mechanistic link to memory deficits remains correlative. Overall, the work advances understanding of early circuit-level changes in AD, while highlighting open questions regarding causality and broader network contributions.

    2. Reviewer #2 (Public review):

      This study presents a thorough investigation of remote memory deficits in the APP/PS1 mouse model of Alzheimer's disease, highlighting the progressive emergence of these deficits alongside selective hyperexcitability of PV interneurons in the mPFC. By combining viral-TRAP labeling and patch-clamp electrophysiology, the authors demonstrate increased inhibitory input onto engram cells in APP/PS1 mice, despite preserved engram size and reactivation. The revised manuscript successfully addresses earlier concerns by clarifying the relationship between amyloid pathology and circuit dysfunction, acknowledging the correlative nature of the findings, and integrating possible contributions of excitatory remodeling and broader network changes, including oscillatory disruptions. Although the precise mechanistic link between PV hyperexcitability, increased inhibition, and impaired remote memory remains to be empirically established, the study convincingly argues for inhibitory microcircuit alterations as an early contributor to cognitive decline in AD.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      This study presents evidence that remote memory impairment in the APP/PS1 mouse model of Alzheimer's disease (AD) is associated with PV interneuron hyperexcitability and increased inhibition of cortical engram cells. Its strength lies in the fact that it explores a neglected aspect of memory research, remote memory impairments related to AD (for which the primary research focus is usually on recent memory impairments), which has received minimal attention to date. While the findings are intriguing, the weakness of the paper lies in its purely correlational evidence and superficial data analyses, which require substantial revisions as outlined below. 

      We thank the reviewer for their feedback, and we appreciate the recognition of the study’s novelty in addressing remote memory impairments in AD. We acknowledge the reviewer’s concerns and have implemented revisions to strengthen the manuscript.

      Major concerns: 

      (1) In light of previous work, including that by the authors themselves, the data in Figure 1 should be supplemented by measurements of recent memory recall in order to assess whether remote memories are exclusively impaired or whether remote memory recall merely represents a continuation of recent memory impairments.

      We agree with the reviewer that this is an important point. In line with their suggestion in minor comment 1, we have now omitted the statement on recent memory from the results (previously on lines 109-111 and 117). Nonetheless, previous independent experiments from our group have repeatedly shown recent memory deficits in APP/PS1 mice at 12 weeks of age, including a recent article published in 2023. We refer the reviewer to figure 2c in Végh et al. (2014) and figure 2i in Kater et al. (2023). We have added a reference to the latter paper in our discussion section (lines 458-459). Therefore, we are confident that the recent memory deficit at 12 weeks of age is a stable phenotype in our APP/PS1 mice.

      With these data in mind, we argue that the remote memory recall impairment is not a continuation of recent memory impairments. Recent memory deficits emerge already at 12 weeks of age, and when remote memory is assessed at 16 weeks (4 weeks after training at 12 weeks of age), APP/PS1 mice are still capable of forming and retrieving a remote memory. This suggests that remote memory retrieval can occur even when recent memory is compromised, arguing against the idea that the remote memory deficit observed at 20 weeks is a continuation of earlier recent memory impairments. We have clarified this point in the revised manuscript by adding the following sentence to the discussion section (line 462-465): 

      ‘This suggests that a remote memory can be formed even when recent memory expression is already compromised, indicating that the remote memory deficit in 20-week-old APP/PS1 mice is not a continuation of earlier recent memory impairments.’

      (2) Figure 2 shows electrophysiological properties of PV cells in the mPFC that correlate with the behavior shown in Figure 1. However, the mice used in Figure 2 are different than the mice used in Figure 1. Thus, the data are correlative at best, and the authors need to confirm that behavioral impairments in the APP/PS1 mice crossed to PV-Cre (and SST-Cre mice) used in Figure 2 are similar to those of the APP/PS1 mice used in Figure 1. Without that, no conclusions between behavioral impairments and electrophysiological as well as engram reactivation properties can be made, and the central claims of the paper cannot be upheld. 

      We thank the reviewer for raising this concern. Indeed, the remote memory impairment and PV hyperexcitability are correlative data, and therefore we do not make causal claims based on these data. However, please note that most of our key findings, including behavioural impairments, characterization of the engram ensemble and reactivation thereof, as well as inhibitory input measurements, were acquired using the same mouse line (APP/PS1), strengthening the coherence of our conclusions. Also, our electrophysiological findings in APP/PS1 (enhanced sIPSC frequency) and APP/PS1-PV-Cre-tdTomato (enhanced PV cell excitability) mice align well. Direct comparisons between the transgenic mouse lines APP/PS1 and APP/PS1 Parv-Cre were performed in our previous studies, confirming that these lines are similar in terms of behaviour and pathology. Specifically, we demonstrated that APP/PS1 mice display spatial memory impairments at 16 weeks of age, Fig 4a-d, consistent with the deficits observed in APP/PS1 Parv-Cre mice at 16 weeks of age, Fig 5a-c (Hijazi et al., 2020a). Additionally, Hijazi et al. (2020a) showed that soluble and insoluble Aβ levels do not differ between APP/PS1 Parv-Cre and APP/PS1 mice (sFig. 1), indicating comparable levels of pathology between these lines. While we do not have a similar characterization of the APP/PS1 SST-Cre line, we should mention that we also did not observe excitability differences in SST cells. We now acknowledge the limitation in the revised discussion section (line 480-487), and stress that our electrophysiology and behavioural findings are correlative in nature:

      ‘Although the excitability measurements were performed in APP/PS1-PV-Cre-tdTomato mice, and not in the APP/PS1 parental line, we previously found that these transgenic mouse lines exhibit comparable amyloid pathology (both soluble and insoluble amyloid beta levels) as well as similar spatial memory deficits (Hijazi et al., 2020a; Kater et al., 2023). Thus, our observations indicate that the APP/PS1 PV-Cre-tdTomato and APP/PS1 lines are similar in terms of pathology and behaviour. Nonetheless, further work is needed to identify a causal link between PV cell hyperexcitability and remote memory impairment.’ 

      (3) The reactivation data starting in Figure 3 should be analysed in much more depth: 

      a) The authors restrict their analysis to intra-animal comparisons, but additional ones should be performed, such as inter-animal (WT vs APP/PS1) as well as inter-age (12-16w vs 16-20w). In doing so, reactivation data should be normalized to chance levels per animal, to account for differences in labelling efficiency - this is standard in the field (see original Tonegawa papers and for a reference). This could highlight differences in total reactivation that are already apparent, such as for instance in WT vs APP/PS1 at 20w (Figure 3o) and highlight a decrease in reactivation in AD mice at this age, contrary to what is stated in lines 213-214. 

      We would like to thank the reviewer for the valuable input on the reactivation data in Figure 3. 

      We agree with the reviewer and now depict the data as normalized to chance levels (Figure 3). The original figures are now supplemental (sFig. 5). The reactivation data normalized to chance are similar to the original results, i.e. no difference was observed in the reactivation of the mPFC engram ensemble between genotypes. The reviewer may have overlooked that we did perform inter-animal (WT vs. APP/PS1) comparisons, however these were not significantly different. We have made this clearer in the main text, lines 277, 288-289, 294-295 and 303-304. Moreover, the reviewer recommended including inter-age group comparisons, which have now been added to the supplemental figures (sFig. 6). No genotype-dependent differences were observed. While a main effect of age group did emerge, indicating that there is a potential increased overlap between Fos+ and mCherry+ in animals aged 16-20 weeks, we caution against overinterpreting this finding. These experimental groups were processed in separate cohorts, with viral injection and 4TM-induced tagging performed at different moments in time, which may have contributed to the observed differences in overlap. We have addressed this point in the revised discussion (line 612-617):

      ‘Furthermore, we also observed an increase in the amount of overlap between Fos+ and mCherry+ engram cells when comparing the 12-16w and 16-20w age groups. This finding should be interpreted with caution, as the experimental groups were processed in separate cohorts, with viral injections and 4TM-induced tagging performed at different moments in time. This may have contributed to the observed differences between ages.’
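      The chance-level normalization discussed in point 3a can be illustrated with a minimal sketch. It assumes the convention common in engram studies: labelling (mCherry+) and Fos expression are treated as independent, so the overlap expected by chance is the product of the two fractions among all counted neurons, and reactivation is expressed as observed overlap divided by that chance value. The function name and the counts are hypothetical and are not the authors' analysis code or data.

```python
def reactivation_over_chance(n_total: int, n_mcherry: int, n_fos: int, n_double: int) -> float:
    """Return observed mCherry+Fos+ overlap divided by the overlap expected by chance.

    Assumption (hypothetical helper): labelling and Fos expression are independent,
    so chance overlap = (n_mcherry / n_total) * (n_fos / n_total) of counted neurons.
    """
    if n_total <= 0 or n_mcherry <= 0 or n_fos <= 0:
        raise ValueError("total, mCherry+ and Fos+ counts must be positive")
    if n_double > min(n_mcherry, n_fos):
        raise ValueError("double-positive count cannot exceed either single count")
    observed = n_double / n_total                          # fraction of neurons that are mCherry+Fos+
    chance = (n_mcherry / n_total) * (n_fos / n_total)     # expected fraction under independence
    return observed / chance


# Hypothetical counts for a single animal (illustrative only, not study data).
ratio = reactivation_over_chance(n_total=1000, n_mcherry=100, n_fos=80, n_double=16)
print(round(ratio, 3))
```

      A ratio near 1 corresponds to chance-level overlap. Computing the ratio per animal before comparing groups is what allows it to absorb differences in labelling efficiency between animals, which is the motivation the reviewer gives for this normalization.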

      b) Comparing the proportion of mcherry+ cells in PV- and PV+ is problematic, considering that the PV- population is not "pure" like the PV+, but rather likely to represent a mix of different pyramidal neurons (probably from several layers), other inhibitory neurons like SST and maybe even glial cells. Considering this, the statement on line 218 is misleading in saying that PVs are overrepresented. If anything, the same populations should be compared across ages or groups.  

      We thank the reviewer for their insightful comment and agree that the PV- population of cells is likely more heterogenous than the PV+ population. However, we would like to clarify that all quantified cells were selected based on Nissl immunoreactivity, and to exclude non-neuronal cells, stringent thresholding was applied in the script that was used to identify Nissl+ cells. The threshold information has now been added to the methods section (line 758-760). Thus, although heterogenous, the analysed PV- population reflects a neuronal subset. In response to the reviewer’s suggestion, we have now included overlap measurements relative to chance levels (Figure 3). These analyses did not reveal differences with the original analyses, i.e., there are no genotype specific differences. We have also incorporated the suggested inter-age group comparisons (sFig. 6) and found no differences between age groups. In light of the raised concerns, we have removed the statement that PV cells were overrepresented in the engram ensemble.

      c) A similar concern applies to the mcherry- population in Figure 4, which could represent different types of neurons that were never active, compared to the relatively homogeneous engram mcherry+ population. This could be elegantly fixed by restricting the comparison to mCherry+Fos+ vs mCherry+Fos- ensembles and could indicate engram reactivation-specific differences in perisomatic inhibition by PV cells. 

      The comparison the reviewer suggests, comparing mCherry+Fos+ to mCherry+Fos- cells, is indeed conceptually interesting and could provide more insight into engram reactivation and PV input. However, there are practical limitations to performing this analysis, as neurons in close proximity need to be compared in a pairwise manner to account for local variability in staining intensity. As shown in Figure 3c+k and Figure 4a+b, d+e, PV immunostaining intensity varies to a certain extent within a given image. While pairwise comparisons of neighbouring neurons were feasible when analysing mCherry+ and mCherry- cells, they are unfortunately not feasible for the mCherry+Fos+ vs. mCherry+Fos- comparison. The occurrence of spatially adjacent mCherry+Fos+ and mCherry+Fos- neurons is too sparse for a pairwise comparison. This analysis would therefore result in substantial under-sampling and limit its reliability. Nonetheless, we agree with the reviewer that the mCherry- population may be more heterogeneous than the mCherry+ population, despite the fact that PV+ neurons and non-neuronal cells were excluded from both populations in the analyses. We have therefore added a statement to the discussion to acknowledge this limitation (lines 536-539): 

      ‘Although PV+ cells were not included in this analysis and we excluded non-neuronal cells based on the area of the Nissl stain, the mCherry- population was potentially more heterogenous than the mCherry+ population, which may have contributed to the differences we observed.’

      (4) At several instances, there are some doubts about the statistical measures having been employed: 

      a) In Figure 4f, it is unclear why a repeated measurement ANOVA was used as opposed to a regular ANOVA. 

      b) In Supplementary Figure 2b, a Mann-Whitney test was used, supposedly because the data were not normally distributed. However, when looking at the individual data points, the data do seem to be normally distributed. Thus, the authors need to provide the test details as to how they assessed the normality of the distribution. 

      a) Because the data in Figure 4f are based on pairwise comparisons of neighbouring neurons within animals, they were analysed with a repeated-measures ANOVA. 

      b) We thank the reviewer for their comment on Supplementary Figure 2b. The data are indeed normally distributed, as assessed with a D’Agostino & Pearson test. We have corrected this in the supplemental figure. 

      Minor concerns: 

      (1) Line 117: The authors cite a recent memory impairment here, as shown by another paper. However, given the notorious difficulty in replicating behavioral findings, in particular in APP/PS1 mice (number of backcrossings, housing conditions, etc., might differ between laboratories), such a statement cannot be made. The authors should either show in their own hands that recent memory is indeed affected at 12 weeks of age, or they should omit this statement. 

      We thank the reviewer for this thoughtful comment. As noted in our response to major concern (1), we have addressed this concern by providing additional information and clarification in the discussion (lines 462-465) regarding the possibility that remote memory impairments are a continuation of recent memory impairments. As mentioned in that response, we have added a reference to a more recent study from our lab (Kater et al., 2023). These findings are consistent with the earlier report from our lab (Végh et al., 2014), underscoring the reproducibility of this phenotype across independent cohorts and time. Notably, the experiments in the 2023 study and the present study were performed under the same housing and experimental conditions. Nevertheless, in light of the reviewer’s suggestion, and to avoid overstatement or speculation, we have now omitted the sentence referring to recent memory impairments at 12 weeks of age from the results section.

      (2) Pertaining to Figure 3, low-resolution images of the mPFC should be provided to assess the spread of injection and the overall degree of double-positive cells.  

      We agree with the reviewer and have added images of the mPFC as a supplemental figure (sFig. 3) that show the spread of the injection. Unfortunately, it is not possible to visualize the overall degree of double-positive cells at a lower magnification (or low-resolution). Representative examples of colocalization are presented in Figure 3.

      Reviewer #2 (Public review): 

      This study presents a comprehensive investigation of remote memory deficits in the APP/PS1 mouse model of Alzheimer's disease. The authors convincingly show that these deficits emerge progressively and are paralleled by selective hyperexcitability of PV interneurons in the mPFC. Using viral-TRAP labeling and patch-clamp electrophysiology, they demonstrate that inhibitory input onto labeled engram cells is selectively increased in APP/PS1 mice, despite unaltered engram size or reactivation. These findings support the idea that alterations in inhibitory microcircuits may contribute to cognitive decline in AD. 

      However, several aspects of the study merit further clarification. Most critically, the central paradox, i.e., increased inhibitory input without an apparent change in engram reactivation, remains unresolved. The authors propose possible mechanisms involving altered synchrony or impaired output of engram cells, but these hypotheses require further empirical support. Additionally, the study employs multiple crossed transgenic lines without reporting the progression of amyloid pathology in the mPFC, which is important for interpreting the relationship between circuit dysfunction and disease stage. Finally, the potential contribution of broader network dysfunction, such as spontaneous epileptiform activity reported in APP/PS1 mice, is also not addressed. 

      We thank the reviewer for their evaluation and appreciate the positive assessment of our study’s contribution to understanding remote memory deficits and the dysfunction of inhibitory microcircuits in AD. We also acknowledge the relevant points raised and have revised the manuscript to clarify our interpretations. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Line 68: What are "APP23xPS45" mice? This is most likely a typo.

      This line is a previously reported double transgenic amyloid beta mouse model that was obtained by crossing APP23 (overexpressing human amyloid precursor protein with the Swedish double mutation at position 670/671) with PS45 (carrying a transgene for mutant Presenilin 1, G384A mutation) (Busche et al., 2008; Grienberger et al., 2012). 

      (2) Line 148: The authors should also briefly describe in the main text that APP/PS1 x SST-Cre mice were generated and used here.  

      We thank the reviewer for their comment and have added their suggestion to the main text (line 166-168):

      ‘To do this, APP/PS1 mice were crossed with SST-Cre mice to generate APP/PS1 SST-Cre mice. Following microinjection of AAV-hSyn::DIO-mCherry into the mPFC, recordings were obtained from SST neurons.’

      (3) The discussion should be condensed because of redundancies on several occasions. For example, memory allocation is discussed starting on line 371, then again on line 392. This should be combined. Likewise, how the correlative nature of the findings about PV interneurons could be further functionally addressed is discussed on lines 413 and 454, and should be condensed into one paragraph. 

      We thank the reviewer for this suggestion and have revised the discussion to remove the redundancies as proposed.  

      Reviewer #2 (Recommendations for the authors): 

      To strengthen the manuscript, the following points should be addressed: 

      (1) Quantify amyloid pathology: It is essential to assess amyloid-β levels (soluble and insoluble) in the mPFC of APP/PS1-PV-Cre-tdTomato mice at the studied ages. This would help determine whether the observed circuit-level changes track with disease progression as seen in canonical APP/PS1 models. 

      We thank the reviewer for this valuable suggestion and agree that assessing Aβ levels in the mPFC is important to determine whether the observed circuit-level alterations in APP/PS1 mice coincide with the progression of amyloid pathology. Therefore, we assessed the amyloid plaque load in the mPFC of APP/PS1 mice at 16 and 20 weeks of age (new supplemental figure sFig. 1) and observed no difference in plaque load between these two time points. This suggests that the increased excitability in the mPFC cannot be attributed to differences in plaque load (insoluble amyloid beta).

      In line with this, we previously studied both soluble and insoluble Aβ levels in CA1 and reported that there are no differences between 12 and 16 weeks of age (Kater et al., 2023), while PV cell hyperexcitability is present at 16 weeks of age (Hijazi et al., 2020a). From 24 weeks onwards, the level of amyloid beta increases. Similarly, Végh et al. (2014) showed using immunoblotting that monomeric and low molecular weight oligomeric forms of soluble Aβ are already present as early as 6 weeks of age and become more prominent at 24 weeks of age. Although the soluble Aβ measurements were performed in the hippocampus, we think these findings can be extrapolated to cortical regions, as the APP and PS1 transgenes in APP/PS1 mice are driven by a prion promoter, which should induce consistent expression across brain regions. Data from other research groups support this hypothesis (Kim et al., 2015; Zhang et al., 2011). Thus, large regional differences in soluble Aβ are not expected. The temporal progression suggests that increasing levels of soluble amyloid beta might contribute to the emergence of PV cell hyperexcitability. We have added this point to the manuscript (lines 585-591):

      ‘Since amyloid beta plaque load in the mPFC remains comparable between 16- and 20-week-old APP/PS1 mice, the observed increased excitability is unlikely the result of changes in insoluble amyloid beta levels. Previous data from our lab show that soluble amyloid beta is already present as early as 6 weeks of age and becomes more prominent at 24 weeks of age (Kater et al., 2023; Végh et al., 2014). The progressive increase in soluble amyloid beta levels may contribute to the emergence of PV cell hyperexcitability.’

      Finally, we previously compared soluble and insoluble amyloid beta levels in APP/PS1 and APP/PS1 Parv-Cre mice and showed that these are similar (Hijazi et al., 2020a). While our current study shows the progression of amyloid beta accumulation in APP/PS1 mice, these mice also exhibit altered microcircuitry (enhanced sIPSC frequency on engram cells) at 20 weeks of age, the same age at which we observed PV cell hyperexcitability in APP/PS1 Parv-Cre tdTomato mice. This further supports the generalizability of our findings across the APP/PS1 and APP/PS1 Parv-Cre tdTomato lines. 

      (2) Examine later disease stages: Since the current effects are modest, assessing memory performance, PV cell excitability, and engram inhibition at more advanced stages could clarify whether these alterations become more pronounced with disease progression. 

      We thank the reviewer for this thoughtful suggestion. Investigating advanced disease stages could indeed provide valuable insights into whether the observed alterations in memory performance, PV cell hyperexcitability and engram inhibition become more pronounced over time. Our previous work has shown that changes in pyramidal cell excitability emerge at a later stage than in PV cells, supporting the idea of progressive circuit dysfunction (Hijazi et al., 2020a). However, at these more advanced stages, additional pathological processes, such as increased gliosis (Janota, Brites, Lemere, & Brito, 2015; Kater et al., 2023) and synaptic loss (Alonso-Nanclares, Merino-Serrais, Gonzalez, & DeFelipe, 2013; Bittner et al., 2012), will likely contribute to both electrophysiological and behavioural measurements. Furthermore, we would like to point out that the changes currently observed in memory performance, PV hyperexcitability and inhibitory input onto engram cells at 16-20 weeks of age are not modest, but already quite substantial. Our focus on these early time points in APP/PS1 mice was intentional, as it helps us understand the initial changes in Alzheimer’s disease at a circuit level and identify therapeutic targets for early intervention. What happens at later stages is certainly of interest, but it is beyond the scope of this study and should be addressed in future work. We have incorporated a discussion of this point into the revised manuscript (lines 602-606):

      ‘Moreover, it is relevant to investigate whether changes in PV and PYR cell excitability, as well as input onto engram cells in the mPFC, become more pronounced at later disease stages. Nonetheless, by focussing on early disease timepoints in the present study, we aimed to understand the initial circuit-level changes in AD and identify targets for early therapeutic intervention.’

      (3) Address network hyperexcitability: Spontaneous epileptiform activity has been reported in APP/PS1 mice from 4 months of age (Reyes-Marin & Nuñez, 2017). Including EEG data or discussing this point in relation to your findings would help contextualize the observed inhibitory remodeling within broader network dysfunction. 

      We thank the reviewer for this valuable input and for highlighting the study by Reyes-Marin and Nuñez (2017). In line with this, we recently reported longitudinal local field potential (LFP) recordings in freely behaving APP/PS1 Parv-Cre mice and wild-type control animals between 3 and 12 months of age (van Heusden et al., 2023). Weekly recordings were performed in the home cage under awake mobile conditions. These data showed no indications of epileptiform activity during wakefulness, consistent with previous findings that epileptic discharges in APP/PS1 mice predominantly occur during sleep (Gureviciene et al., 2019). Recordings were obtained from the prefrontal cortex (PFC), parietal cortex and hippocampus. In contrast, the study by Reyes-Marin and Nuñez (2017) recorded from the somatosensory cortex in anesthetized animals. Here, during spontaneous recordings, no differences were observed in delta, theta or alpha frequency bands between APP/PS1 and WT mice. Interestingly, we observed an early increase in absolute power, particularly in the hippocampus and parietal cortex, from 12 to 24 weeks of age in APP/PS1 mice. In the PFC we found a shift in relative power from lower to higher frequencies and a reduction in theta power. Connectivity analyses revealed a progressive, age-dependent decline in theta/alpha coherence between the PFC and both the parietal cortex and hippocampus. Given the well-established role of PV interneurons in network synchrony and in coordinating the theta and gamma oscillations critical for cognitive function (Sohal, Zhang, Yizhar, & Deisseroth, 2009; Xia et al., 2017), these findings support the idea of early circuit dysfunction in APP/PS1 mice. Our findings, i.e. hyperexcitability of PV cells, align with these LFP-based network-level observations. These data suggest an early shift in the E/I balance, contributing to altered oscillatory dynamics and impaired inter-regional connectivity, possibly leading to alterations in memory. 
      However, whether the observed PV hyperexcitability in our study directly contributes to alterations in power and synchrony remains to be elucidated. Furthermore, it would be interesting to determine the individual contributions of PV cell hyperexcitability in the hippocampus versus the mPFC to network changes and concurrent memory deficits. We have added a statement on network hyperexcitability to the discussion (lines 561-565). 

      ‘Interestingly, we recently found a progressive disruption of oscillatory network synchrony between the mPFC and hippocampus in APP/PS1 Parv-Cre mice (van Heusden et al., 2023). However, whether the observed PV cell hyperexcitability directly contributes to changes in inter-regional synchrony, and whether this leads to alterations at a network level, i.e. increased inhibitory input on engram cells, and consequently to memory deficits, remains to be elucidated in future studies.’ 

      (4) Mechanisms responsible for PV hyperexcitability: Related to the previous point, a discussion of the possible underlying mechanisms, e.g., direct effects of amyloid-β, inflammatory processes, or compensatory mechanisms, would strengthen the discussion. 

      We agree with the reviewer that this will strengthen the discussion. We have now added a comprehensive discussion in the revised manuscript to address potential mechanisms responsible for PV cell hyperexcitability (lines 579-594):

      ‘Prior studies have shown that neurons in the vicinity of amyloid beta plaques show increased excitability (Busche et al., 2008). We demonstrated that PV neurons in the CA1 are hyperexcitable and that treatment with a BACE1 inhibitor, i.e. reducing amyloid beta levels, rescues PV excitability (Hijazi et al., 2020a). In line with this, we also reported that addition of amyloid beta to hippocampal slices increases PV excitability without altering pyramidal cell excitability (Hijazi et al., 2020a). Finally, applying amyloid beta to an induced mouse model of PV hyperexcitability further impairs PV function (Hijazi et al., 2020b). Since amyloid beta plaque load in the mPFC remains comparable between 16- and 20-week-old APP/PS1 mice, the observed increase in excitability is unlikely to result from changes in insoluble amyloid beta levels. Previous data from our lab show that soluble amyloid beta is already present as early as 6 weeks of age and becomes more prominent at 24 weeks of age (Kater et al., 2023; Végh et al., 2014). The progressive increase in soluble amyloid beta levels may contribute to the emergence of PV cell hyperexcitability. We hypothesize that the hyperexcitability induced by amyloid beta may result from disrupted ion channel function, as PV neuron dysfunction can result from altered potassium (Olah et al., 2022) and sodium channel activity (Verret et al., 2012).’

      (5) Excitatory-inhibitory balance: While the main focus is on increased inhibition onto engram cells, the reported increase in sEPSC frequency (Figure 5g) across genotypes suggests the presence of excitatory remodelling as well. A brief discussion of how this may interact with increased inhibition would be valuable.  

      We thank the reviewer for this comment regarding the interaction between excitatory and inhibitory remodelling. We have now incorporated this discussion point into the revised manuscript (lines 528-534):

      ‘Interestingly, both WT and APP/PS1 mice showed an increase in sEPSC frequency onto engram cells, suggesting that increased excitatory input is a consequence of memory retrieval and not affected by genotype. However, only in APP/PS1 mice, the augmented excitatory input coincided with an elevation of inhibitory input onto engram cells. The resulting imbalance between excitation and inhibition could therefore potentially disrupt the precise control of engram reactivation and contribute to the observed remote memory impairment.’

      References

      Alonso-Nanclares, L., Merino-Serrais, P., Gonzalez, S., & DeFelipe, J. (2013). Synaptic changes in the dentate gyrus of APP/PS1 transgenic mice revealed by electron microscopy. J Neuropathol Exp Neurol, 72(5), 386-395. doi:10.1097/NEN.0b013e31828d41ec

      Bittner, T., Burgold, S., Dorostkar, M. M., Fuhrmann, M., Wegenast-Braun, B. M., Schmidt, B., . . . Herms, J. (2012). Amyloid plaque formation precedes dendritic spine loss. Acta Neuropathol, 124(6), 797-807. doi:10.1007/s00401-012-1047-8

      Busche, M. A., Eichhoff, G., Adelsberger, H., Abramowski, D., Wiederhold, K. H., Haass, C., . . . Garaschuk, O. (2008). Clusters of hyperactive neurons near amyloid plaques in a mouse model of Alzheimer's disease. Science, 321(5896), 1686-1689. doi:10.1126/science.1162844

      Grienberger, C., Rochefort, N. L., Adelsberger, H., Henning, H. A., Hill, D. N., Reichwald, J., . . . Konnerth, A. (2012). Staged decline of neuronal function in vivo in an animal model of Alzheimer's disease. Nat Commun, 3, 774. doi:10.1038/ncomms1783

      Gureviciene, I., Ishchenko, I., Ziyatdinova, S., Jin, N., Lipponen, A., Gurevicius, K., & Tanila, H. (2019). Characterization of Epileptic Spiking Associated With Brain Amyloidosis in APP/PS1 Mice. Front Neurol, 10, 1151. doi:10.3389/fneur.2019.01151

      Hijazi, S., Heistek, T. S., Scheltens, P., Neumann, U., Shimshek, D. R., Mansvelder, H. D., . . . van Kesteren, R. E. (2020a). Early restoration of parvalbumin interneuron activity prevents memory loss and network hyperexcitability in a mouse model of Alzheimer's disease. Mol Psychiatry, 25(12), 3380-3398. doi:10.1038/s41380-019-0483-4

      Hijazi, S., Heistek, T. S., van der Loo, R., Mansvelder, H. D., Smit, A. B., & van Kesteren, R. E. (2020b). Hyperexcitable Parvalbumin Interneurons Render Hippocampal Circuitry Vulnerable to Amyloid Beta. iScience, 23(7), 101271. doi:10.1016/j.isci.2020.101271

      Janota, C. S., Brites, D., Lemere, C. A., & Brito, M. A. (2015). Glio-vascular changes during ageing in wild-type and Alzheimer's disease-like APP/PS1 mice. Brain Res, 1620, 153-168. doi:10.1016/j.brainres.2015.04.056

      Kater, M. S. J., Huffels, C. F. M., Oshima, T., Renckens, N. S., Middeldorp, J., Boddeke, E., . . . Verheijen, M. H. G. (2023). Prevention of microgliosis halts early memory loss in a mouse model of Alzheimer's disease. Brain Behav Immun, 107, 225-241. doi:10.1016/j.bbi.2022.10.009

      Kim, H. Y., Kim, H. V., Jo, S., Lee, C. J., Choi, S. Y., Kim, D. J., & Kim, Y. (2015). EPPS rescues hippocampus-dependent cognitive deficits in APP/PS1 mice by disaggregation of amyloid-β oligomers and plaques. Nat Commun, 6(1), 8997. doi:10.1038/ncomms9997

      Olah, V. J., Goettemoeller, A. M., Rayaprolu, S., Dammer, E. B., Seyfried, N. T., Rangaraju, S., . . . Rowan, M. J. M. (2022). Biophysical Kv3 channel alterations dampen excitability of cortical PV interneurons and contribute to network hyperexcitability in early Alzheimer’s. Elife, 11, e75316. doi:10.7554/eLife.75316

      Reyes-Marin, K. E., & Nuñez, A. (2017). Seizure susceptibility in the APP/PS1 mouse model of Alzheimer's disease and relationship with amyloid β plaques. Brain Res, 1677, 93-100. doi:10.1016/j.brainres.2017.09.026

      Sohal, V. S., Zhang, F., Yizhar, O., & Deisseroth, K. (2009). Parvalbumin neurons and gamma rhythms enhance cortical circuit performance. Nature, 459(7247), 698-702. doi:10.1038/nature07991

      van Heusden, F. C., van Nifterick, A. M., Souza, B. C., França, A. S. C., Nauta, I. M., Stam, C. J., . . . van Kesteren, R. E. (2023). Neurophysiological alterations in mice and humans carrying mutations in APP and PSEN1 genes. Alzheimers Res Ther, 15(1), 142. doi:10.1186/s13195-023-01287-6

      Végh, M. J., Heldring, C. M., Kamphuis, W., Hijazi, S., Timmerman, A. J., Li, K. W., . . . van Kesteren, R. E. (2014). Reducing hippocampal extracellular matrix reverses early memory deficits in a mouse model of Alzheimer's disease. Acta Neuropathol Commun, 2, 76. doi:10.1186/s40478-014-0076-z

      Verret, L., Mann, E. O., Hang, G. B., Barth, A. M., Cobos, I., Ho, K., . . . Palop, J. J. (2012). Inhibitory interneuron deficit links altered network activity and cognitive dysfunction in Alzheimer model. Cell, 149(3), 708-721. doi:10.1016/j.cell.2012.02.046

      Xia, F., Richards, B. A., Tran, M. M., Josselyn, S. A., Takehara-Nishiuchi, K., & Frankland, P. W. (2017). Parvalbumin-positive interneurons mediate neocortical-hippocampal interactions that are necessary for memory consolidation. Elife, 6, e27868. doi:10.7554/eLife.27868

      Zhang, W., Hao, J., Liu, R., Zhang, Z., Lei, G., Su, C., . . . Li, Z. (2011). Soluble Aβ levels correlate with cognitive deficits in the 12-month-old APPswe/PS1dE9 mouse model of Alzheimer's disease. Behav Brain Res, 222(2), 342-350. doi:10.1016/j.bbr.2011.03.072

    1. eLife Assessment

      This study presents a valuable finding on a new role of glia in activity-dependent synaptic remodeling using the Drosophila NMJ as a model system. The evidence supporting the claims of the authors is convincing. The authors have addressed most of the reviewers' concerns and further clarified the claims. The work will be of interest to neuroscientists working on glia-neuron interaction and synaptic remodeling.

    2. Reviewer #2 (Public review):

      In this paper Chang et al follow up on their lab's previous findings about the secreted protein Shv and its role in activity-induced synaptic remodeling at the fly NMJ. Previously they reported that shv mutants have impaired synaptic plasticity. Normally a high stimulation paradigm should increase bouton size and GluR expression at synapses but this does not happen in shv mutants. The phenotypes relating to activity-dependent plasticity were completely recapitulated when Shv was knocked down only in neurons and could be completely rescued by incubation in exogenously applied Shv protein. The authors also showed that Shv activation of integrin signaling on both the pre- and post-synapse was the molecular mechanism underlying its function in plasticity. Here they extend their study to consider a role of Shv derived from glia in modulating synaptic features at baseline and remodeling conditions. The authors show evidence that Shv is expressed in both neurons and glia. Despite the fact that neuron-specific RNAi knockdown of Shv recapitulated the plasticity phenotypes seen in whole animal mutants, the authors asked whether glial-specific knockdown would have any effects. Surprisingly, knockdown of Shv only in glia also blocked plasticity, just like neuron-specific knockdown, supporting an important role for glial-derived Shv in plasticity. Unlike neuronal knockdown, though, glial knockdown also caused abnormally high baseline GluR expression. Restoring Shv in ONLY glia in mutant animals is sufficient to completely rescue the plasticity phenotypes and baseline GluR expression, but glial Shv does not appear to activate integrin signaling, which was shown to be the mechanism for neuronally derived Shv to control plasticity. This suggests a different or indirect mechanism of action for glial-derived Shv. This led the authors to hypothesize that glial Shv might work via controlling the levels of neuronal Shv and/or extracellular glutamate. 
To test these hypotheses, they provide evidence that in the absence of glial Shv, synaptic levels of Shv go up overall, suggesting that glial Shv could somehow have a suppressive effect on release of neuronal Shv. This would indirectly modulate integrin signaling to control plasticity. Using an extracellular glutamate sensor in presynaptic boutons, they also observe decreased signal (extracellular glutamate) from the sensor in glial Shv KD animals, and increased signal in glial Shv overexpression animals, supporting the hypothesis that glial Shv can regulate glutamate levels somehow. These results establish glia as an important source of Shv in these processes and identify some mechanisms for how this might be accomplished. Several outstanding questions remain; most importantly, how and why do glial-derived and neuronal-derived Shv have different effects when in the same space? No obvious isoform or size differences were found, yet the same rescue construct expressed either in neurons or glia could have different effects on integrin activation or glutamate levels. Answering these questions using modified rescue constructs will be an important future direction to understand Shv function specifically and how neurons and glia work together in this context, and potentially in many other contexts.

      Comments on revisions:

      The authors addressed my and the other reviewers' concerns from the original review adequately and this has strengthened the paper substantially.

      One small omission to correct: In Figures 4 and 6, the graphs in the figures do not have a legend for the colored bars.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang and colleagues provides compelling evidence that glia-derived Shriveled (Shv) modulates activity-dependent synaptic plasticity at the Drosophila neuromuscular junction (NMJ). This mechanism differs from the previously reported function of neuronally released Shv, which activates integrin signaling. They further show that this requirement of Shv is acute and that glial Shv supports synaptic plasticity by modulating neuronal Shv release and the ambient glutamate levels. However, there are a number of conceptual and technical issues that need to be addressed.

      Major comments

      (1) From the images provided for Fig 2B +RU486, the bouton size appears to be bigger in shv RNAi + stimulation, especially judging from the outline of GluR clusters.

      (2) The shv result needs to be replicated with a separate RNAi.

      (3) The phenotype of shv mutant resembles that of neuronal shv RNAi - no increased GluR baseline. Any insights why that is the case?

      (4) In Fig 3B, SPG shv RNAi has elevated GluR baseline, while PG shv RNAi has a lower baseline. In both cases, there is no activity induced GluR increase. What could explain the different phenotypes?

      (5) In Fig 4C, the rescue of PTP is only partial. Does that suggest neuronal shv is also needed to fully rescue the deficit of PTP in shv mutants?

      (6) The observation in Fig 5D is interesting. While there is a reduction in Shv release from glia after stimulation, it is unclear what the mechanism could be. Is there a change in glial shv transcription, translation or the releasing machinery? It will be helpful to look at the full shv pool vs the released ones.

      (7) In Fig 5E, what will happen after stimulation? Will the elevated glial Shv after neuronal shv RNAi be retained in the glia?

      (8) It would be interesting to see if the localization of shv differs based on if it is released by neuron or glia, which might be able to explain the difference in GluR baseline. For example, by using glia-Gal4>UAS-shv-HA and neuronal-QF>QUAS-shv-FLAG. It seems important to determine whether they mix together after release. It is unclear if the two shv pools are processed differently.

      (9) Alternatively, do neurons and glia express and release different Shv isoforms, which would bind different receptors?

      (10) It is claimed that Sup Fig 2 shows no observable change in gross glial morphology, further bolstering support that glial Shv does not activate integrin. This seems quite an overinterpretation. There is only one image for each condition without quantification. It is hard to judge if glia, which is labeled by GFP (presumably by UAS-eGFP?), is altered or not.

      (11) The hypothesis that glutamate regulates GluR level as a homeostatic mechanism makes sense. What is the explanation of the increased bouton size in the control after glutamate application in Fig 6?

      (12) What could be a mechanism that prevents elevated glial-released Shv from activating integrin signaling after neuronal shv RNAi, as seen in Fig 5E?

      (13) Any speculation on how the released Shv pool is sensed?

      Comments on revisions:

      The authors have addressed most of my previous comments and questions in their revision.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In this manuscript, Chang et al. investigated the cell type-specific role of the integrin activator Shv in activity-dependent synaptic remodeling. Using the Drosophila larval neuromuscular junction as a model, they show that glial-secreted Shv modulates synaptic plasticity by maintaining the extracellular balance of neuronal Shv proteins and regulating ambient extracellular glutamate concentrations, which in turn affects postsynaptic glutamate receptor abundance. Furthermore, they report that genetic perturbation of glial morphogenesis phenocopies the defects observed with the loss of glial Shv. Altogether, their findings propose a role for glia in activity-induced synaptic remodeling through Shv secretion. While the conclusions are intriguing, several issues related to experimental design and data interpretation merit further discussion.

      We appreciate the insightful and constructive comments. We have added new data and modified the text to address your concerns.  In doing so, the manuscript has been substantially strengthened.  Please see our detailed point-by-point response below. 

      Reviewer #2 (Public review):

      In this paper Chang et al follow up on their lab's previous findings about the secreted protein Shv and its role in activity-induced synaptic remodeling at the fly NMJ. Previously they reported that shv mutants have impaired synaptic plasticity. Normally a high stimulation paradigm should increase bouton size and GluR expression at synapses but this does not happen in shv mutants. The phenotypes relating to activity-dependent plasticity were completely recapitulated when Shv was knocked down only in neurons and could be completely rescued by incubation in exogenously applied Shv protein. The authors also showed that Shv activation of integrin signaling on both the pre- and post-synapse was the molecular mechanism underlying its function. Here they extend their study to consider the role of Shv derived from glia in modulating synaptic features at baseline and remodeling conditions. This study is important to understand if and how glia contribute to these processes. Cell-type-specific knockdown of Shv only in glia causes abnormally high baseline GluR expression and prevents activity-dependent increases in bouton size or GluR expression post-stimulation. This does not appear to be a developmental defect, as the authors show that knocking down Shv in glia after basic development has the same effects as lifelong knockdown, so Shv is acting in real time. Restoring Shv in ONLY glia in mutant animals is sufficient to completely rescue the plasticity phenotypes and baseline GluR expression, but glial Shv does not appear to activate integrin signaling, which was shown to be the mechanism for neuronally derived Shv to control plasticity. This led the authors to hypothesize that glial Shv works by controlling the levels of neuronal Shv and extracellular glutamate. They provide evidence that in the absence of glial Shv, synaptic levels of Shv go up overall, presumably indicating that neurons secrete more Shv, which could then work via integrin signaling as described to control plasticity. They use a glutamate sensor and observe decreased signal (extracellular glutamate) from the sensor in glial Shv KD animals; however, this background has extremely high GluR levels at the synapse, which may account for some or all of the decrease in sensor signal. Additional controls to test whether increased GluR density alone affects sensor readouts, and/or independently modulating GluR levels in the glial KD background, would help strengthen these data. In fact, glial-specific shv KD animals have baseline levels of GluR that are potentially high enough to have hit a ceiling of expression or detection, which could account for the inability of these levels to increase further after strong stimulation; such a ceiling effect should be considered when interpreting the data and conclusions of this paper. Several outstanding questions remain: why can't glial-derived Shv activate integrin pathways when exogenously applied recombinant Shv protein can? The effects of neuron-specific rescue of shv in a shv mutant on GluR levels and bouton size are not provided for comparison with the glial-only rescue. Inclusion of these data might provide more insight into outstanding questions of how and why the source of Shv seems to matter for some aspects of the phenotypes but not others, and why exogenous Shv can rescue in some experimental paradigms but not others.

      We appreciate your insightful comments. We have added new data and modified the text to address your concerns.  In doing so, the manuscript has been substantially strengthened.  Please also see the enclosed point-by-point response.

      To address the question of whether altered GluR density alone affects sensor readouts, we expressed GluR using a mhc promoter-driven GluRIIA fusion line, which increases total GluRIIA expression in muscle independently of the Gal4/UAS system. As shown in Figure 6 – figure supplement 1, mhc-GluRIIA animals exhibited elevated levels of not only GluRIIA but also the obligatory GluRIIC subunit. Despite this increase in GluR expression, we did not observe any change in extracellular glutamate levels, as measured by live imaging using the neuronal iGluSnFR sensor (updated Figure 6A). These results suggest that elevated GluR density alone does not alter iGluSnFR sensor dynamics and further support our conclusions.

      In regard to the question about a ceiling effect, we do not think that the lack of GluR enhancement in repo>shv-RNAi is due to a saturated postsynaptic state. This is based on the results in Figure 6, which show that GluR levels can increase up to fourfold upon stimulation in the presence of glutamate, whereas repo>shv-RNAi results in only a ~2-fold increase in baseline GluR concentration. These results suggest that the synapse retains the capacity for further upregulation. 

      To address the question of why exogenously applied Shv activates integrin while glial-derived Shv does not, we tested whether glia and neurons could differentially modify Shv. Based on Western blot analyses of adult heads and larval brains showing that Shv is present as a single band (Fig. 1A and Figure 2 – figure supplement 1B), the functional differences between neuronal and glial Shv are not likely due to the presence of different isoforms. Consistent with this, FlyBase also suggests that shv encodes a single isoform. However, while we did not detect obvious post-translational modifications when Shv protein was expressed in neurons or glia (Figure 5 – figure supplement 1A), we cannot exclude the possibility that different cell types process Shv differently through post-transcriptional or post-translational mechanisms. Notably, shv is predicted to undergo A-to-I RNA editing, including an editing site in the coding region, which will result in a single amino acid change (St Laurent et al., 2013). Given that ADAR, the editing enzyme, is enriched in neurons and absent from glia (Jepson et al., 2011), such cell-specific editing could contribute to functional differences. It will be interesting to investigate this in the future. We have now included this in the Discussion section.

      Additionally, we have now included new data on neuronal Shv rescue of shv<sup>1</sup> mutants as suggested in the updated Figure 4. Consistent with previous findings that neuronal Shv rescues integrin signaling and electrophysiological phenotypes (Lee et al., 2017), we found that it also restores bouton size, GluR levels, and activity-induced synaptic remodeling. These results support the functional contribution of neuronal Shv. 

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang and colleagues provides compelling evidence that glia-derived Shriveled (Shv) modulates activity-dependent synaptic plasticity at the Drosophila neuromuscular junction (NMJ). This mechanism differs from the previously reported function of neuronally released Shv, which activates integrin signaling. They further show that this requirement of Shv is acute and that glial Shv supports synaptic plasticity by modulating neuronal Shv release and the ambient glutamate levels. However, there are a number of conceptual and technical issues that need to be addressed.

      We appreciate the insightful and constructive comments. We have added new data and modified the text to address your concerns.  In doing so, the manuscript has been substantially strengthened.  Please see our detailed point-by-point response below.

      Major comments:

      (1) From the images provided for Fig 2B +RU486, the bouton size appears to be bigger in shv RNAi + stimulation, especially judging from the outline of GluR clusters.

      Thank you for pointing this out. We have selected another image to better represent the data.

      (2) The shv result needs to be replicated with a separate RNAi.

      We have used another independent RNAi line targeting shv to confirm our findings (BDSC 37507). This shv-RNAi<sup>37507</sup> line also showed the same phenotype, including increased GluR levels and impaired activity-induced synaptic remodeling (new Figure 2 – figure supplement 1A).

      (3) The phenotype of shv mutant resembles that of neuronal shv RNAi - no increased GluR baseline. Any insights why that is the case?

      This is an interesting question. We speculate that neuronal Shv normally has a dominant role in maintaining GluR levels during development, mainly through its ability to activate integrin signaling. Consistent with this, we have shown that mutations in integrin lead to a drastic reduction in GluR levels at the NMJ (Lee et al., 2017). While we have shown that neuronal knockdown of shv elevates Shv from glia (Fig. 5E), glial Shv cannot activate integrin signaling (Fig. 5B, 5C). Additionally, high levels of glial Shv will elevate ambient glutamate concentrations (Figure 6A), which will likely reduce GluR abundance and impair synaptic remodeling (Augustin et al., 2007; Chen et al., 2009; Figure 6B). Therefore, neuronal knockdown of Shv resulted in the same phenotype as the shv<sup>1</sup> mutant. 

      (4) In Fig 3B, SPG shv RNAi has elevated GluR baseline, while PG shv RNAi has a lower baseline. In both cases, there is no activity induced GluR increase. What could explain the different phenotypes?

      SPG is the middle glial cell layer in the fly peripheral nervous system and may also influence the PG layer through signaling mechanisms (Lavery et al., 2007), therefore having a stronger effect. We have now mentioned this in the text. 

      (5) In Fig 4C, the rescue of PTP is only partial. Does that suggest neuronal shv is also needed to fully rescue the deficit of PTP in shv mutants?

      This is indeed a possibility. We have shown that neuronal and glial Shv each contribute to activity-induced synaptic remodeling through different mechanisms. It will be interesting to test this in the future.

      (6) The observation in Fig 5D is interesting. While there is a reduction in Shv release from glia after stimulation, it is unclear what the mechanism could be. Is there a change in glial shv transcription, translation or the releasing machinery? It will be helpful to look at the full shv pool vs the released ones. 

      Thank you for the suggestion. To address this, we monitored the levels of intracellular Shv using a permeabilized preparation (we found that the addition of detergent to permeabilize the sample strips away extracellular Shv). Combined with the extracellular staining results, we can get an idea about the total amount of Shv. As shown in the updated Figure 5D, intracellular Shv levels (permeabilized) remained unchanged following stimulation, indicating that there is no intracellular accumulation and that the observed decrease in extracellular Shv is unlikely due to impaired release machinery.

      (7) In Fig 5E, what will happen after stimulation? Will the elevated glial Shv after neuronal shv RNAi be retained in the glia? 

      Thank you for the interesting question. We agree that examining Shv distribution following neuronal activity would be highly informative. While we plan to perform time-lapse experiments in future studies to address this, we feel that such analyses are beyond the scope of the current manuscript.

      (8) It would be interesting to see if the localization of shv differs based on if it is released by neuron or glia, which might be able to explain the difference in GluR baseline. For example, by using glia-Gal4>UAS-shv-HA and neuronal-QF>QUAS-shv-FLAG. It seems important to determine whether they mix together after release. It is unclear if the two shv pools are processed differently.

      We agree that investigating whether neuronal and glial Shv pools colocalize or are differentially processed is an important future direction. We hope to examine how each pool responds to stimulation in the shv<sup>1</sup> mutant background using the LexA and Gal4 systems in the future.

      (9) Alternatively, do neurons and glia express and release different Shv isoforms, which would bind different receptors?

      Thank you for the questions. We have now addressed this in the discussion and also enclosed below:

      Based on Western blot analyses of adult heads and larval brains showing that Shv is present as a single band (Fig. 1A and Figure 2 – figure supplement 1B), the functional differences between neuronal and glial Shv are not likely due to the presence of different isoforms. Consistent with this, FlyBase also suggests that shv encodes a single isoform (Ozturk-Colak et al., 2024). However, while we did not detect obvious post-translational modifications when Shv protein was expressed in neurons or glia (Figure 5 – figure supplement 1A), we cannot exclude the possibility that different cell types process Shv differently through post-transcriptional or post-translational mechanisms. Notably, shv is predicted to undergo A-to-I RNA editing, including an editing site in the coding region, which could result in a single amino acid change (St Laurent et al., 2013). Given that ADAR, the editing enzyme, is enriched in neurons and absent from glia (Jepson et al., 2011), such cell-specific editing could contribute to functional differences. It will be interesting to investigate this in the future.

      (10) It is claimed that Sup Fig 2 shows no observable change in gross glial morphology, further bolstering support that glial Shv does not activate integrin. This seems quite an overinterpretation. There is only one image for each condition without quantification. It is hard to judge if glia, which is labeled by GFP (presumably by UAS-eGFP?), is altered or not.

      Thank you for raising this concern. To strengthen our claim, we now include additional images (Figure 5, figure supplement 2). No obvious change in overall glial morphology was observed, with glia continuing to wrap the segmental nerves and extend processes that closely associate with proximal synaptic boutons (Figure 5, figure supplement 2). These observations suggest that glial Shv is not essential for maintaining normal glial structure or survival, and are consistent with the idea that glial Shv does not activate integrin, as integrin signaling is required to maintain the integrity of peripheral glial layers. 

      (11) The hypothesis that glutamate regulates GluR level as a homeostatic mechanism makes sense. What is the explanation of the increased bouton size in the control after glutamate application in Fig 6?

      We speculate that it could be due to a retrograde signaling mechanism activated by elevated extracellular glutamate, allowing neurons to modulate bouton morphology in response to synaptic demand. It will be interesting to investigate this possibility in the future.  

      (12) What could be a mechanism that prevents elevated glial-released Shv from activating integrin signaling after neuronal shv RNAi, as seen in Fig 5E?

      One potential mechanism is post-translational or post-transcriptional processing of Shv. Although our Western blots did not reveal differences in the molecular weight of glial vs. neuronal Shv, we cannot exclude the possibility that modifications not readily detectable by this method are responsible. Additionally, as mentioned in the Discussion section, post-transcriptional processing such as A-to-I RNA editing could introduce changes in the Shv protein, potentially altering its ability to interact with or activate integrin. 

      (13) Any speculation on how the released Shv pool is sensed?

      The same RNA editing modification mentioned earlier or post-translational modifications in Shv may also influence how it is sensed by target cells. 

      Reviewer #1 (Recommendations for the authors):

      Issues Regarding Cell Type-Specific Secretion and the Role of Shv:

      Extracellular Secretion of Shv:

      (1) The data in Figure 1 suggest that Shv is not secreted under resting conditions, challenging the proposed extracellular role of Shv. It remains unclear whether Shv secretion can be confirmed using Shv-eGFP (knock-in) following high K+ stimulation.

      We apologize for not being clear. In Figure 1, the Shv signals shown are from a permeabilized preparation, which preferentially labels intracellular Shv. We do observe secreted Shv-eGFP following stimulation (Figure 5E), consistent with our hypothesis. However, the endogenous extracellular Shv-eGFP signal is very weak and was therefore detected using a GFP antibody and amplified with a fluorescent secondary antibody. We have now also included additional controls in Figure 5E to demonstrate the specificity of the staining.

      (2) In Figure 5D, total Shv staining should be included to evaluate potential presynaptic accumulation of intracellular Shv, which may lead to extracellular secretion upon stimulation. Additionally, the representative images of glial rescue do not seem to align with the quantification data; more extracellular Shv signals were observed after stimulation.

      Thank you for the comments. We monitored the levels of intracellular Shv using a permeabilized preparation (detergent treatment stripped away the extracellular Shv signal). When combined with non-permeabilized extracellular staining, this approach provides insights into total Shv levels. We found no intracellular accumulation of Shv, and intracellular levels remained unchanged following stimulation (updated Figure 5D), suggesting that reduced extracellular Shv is not likely due to impaired release. Additionally, we have selected another image for glial rescue that avoids the trachea region and better represents the quantification data.

      (3) In Figure 5E, "extracellular" Shv staining in repo>shv-RNAi samples appears localized within synaptic boutons. This raises concerns about the staining protocol potentially labeling intracellular proteins. Control experiments using presynaptic cytosolic markers are needed to confirm staining specificity.

      Thank you for the thoughtful suggestion. To validate that our staining protocol is selective for extracellular proteins, we also stained for cysteine string protein (CSP), an intracellular synaptic vesicle protein predominantly located in the presynaptic terminals (Zinsmaier et al., 1990; Umbach et al., 1994), under the same conditions. CSP was detected only in the permeabilized condition (updated Figure 5E), suggesting that the non-permeabilizing protocol is selective for extracellular proteins. 

      (4) The study does not clarify why Shv knockdown in either perineurial glia or subperineurial glia abolishes stimulus-dependent synaptic remodeling. Does Shv secretion occur from PG, SPG, or both toward the synaptic bouton?

      Thank you for raising this point. SPG is the middle glial cell layer in the fly peripheral nervous system and may also influence the PG layer through signaling mechanisms (Lavery et al., 2007). Consistent with this, we observed a stronger effect on GluR levels when SPG was disrupted compared to PG. It will be interesting to distinguish whether Shv is released by PG or SPG in the future.

      (5) The possibility of an inter-glial role for Shv via integrin signaling in regulating glial morphogenesis is underexplored. The rough morphological characterization in Supplemental Figure 2 requires more detailed quantification and the use of sub-glial type-specific GAL4 drivers.

      We now include additional images (Figure 5, figure supplement 2) to examine the overall glial morphology. There was no obvious change in gross glial morphology, with glia continuing to wrap the segmental nerves and extend processes that closely associate with proximal synaptic boutons when shv is knocked down in glia (Figure 5, figure supplement 2). These observations suggest that glial Shv is not essential for maintaining normal glial structure or survival, and is consistent with the idea that glial Shv does not activate integrin, as integrin signaling is required to maintain the integrity of peripheral glial layers (Xie and Auld, 2011; Hunter et al., 2020).

      (6) While repo>shv rescues stimulus-dependent bouton size and GluR increases in the shv mutant (Figure 5), the interaction between neuronal and glial Shv remains unclear. Does neuronal Shv influence the expression or distribution of glial Shv?

      We agree that investigating whether neuronal and glial shv pools influence each other’s expression or distribution is an important future direction. We hope to investigate this in more detail in the future using LexA-LexOp and GAL4/UAS dual expression systems.

      Issues Regarding the Regulation of GluR and Perisynaptic Glutamate by Glial Shv:

      (7) The methodology for iGluSnFR measurement (Figure 6A) is inadequately described. If anti-HRP staining was used to normalize signals, it suggests the experiment may have involved fixed tissue. However, iGluSnFR typically measures glutamate levels in live cells, raising concerns about the validity of this approach in fixed samples.

      We apologize for not being clear about the method used to measure iGluSnFR. The original figure was generated from imaging iGluSnFR signals immediately following fixation. To address the reviewer’s concern and validate these results, we have now performed live imaging experiments using a water-dipping objective to measure iGluSnFR intensity in unfixed preparations (new Figure 6A). To label synaptic boutons, we co-expressed mtdTomato using the neuronal driver nSyb-GAL4. The results from the live imaging experiments confirmed our original observation that glial Shv is required to control ambient extracellular glutamate levels (see updated Fig. 6A and text). Additionally, to ascertain that the decrease in iGluSnFR signal reflects a decrease in ambient extracellular glutamate levels rather than glutamate depletion caused by high levels of GluR, we upregulated GluR levels using mhc-GluRIIA, which drives GluRIIA expression in muscles (Petersen et al., 1997). We found that mhc-GluRIIA animals exhibited elevated levels of not only GluRIIA but also the obligatory GluRIIC subunit. However, iGluSnFR signals at the synapse remained unchanged (Figure 6A), suggesting that elevated GluR density alone does not reduce iGluSnFR signals. Taken together, these results suggest that glial Shv plays a critical role in controlling ambient extracellular glutamate levels.

      (8) As shown in Figure 2, repo>shv-RNAi increases GluR levels before high K+ stimulation, potentially saturating postsynaptic GluR expression and precluding further increases upon stimulation.

      Our data in Figure 6 show that GluR levels can increase up to four-fold upon stimulation in the presence of glutamate, whereas repo>shv-RNAi results in only a ~2-fold increase in baseline GluR concentration. These results suggest that the synapse retains the capacity for further upregulation. Thus, we do not think that the lack of GluR enhancement in repo>shv-RNAi is due to a saturated postsynaptic state, but rather reflects a requirement for glial Shv in activity-dependent modulation.

      (9) Despite glial shv knockdown lowering extracellular glutamate levels, GluR levels unexpectedly increase (Figure 6B). This contradicts the known requirement for high ambient glutamate concentrations to promote GluR clustering and membrane expression (Chen et al., 2009). Furthermore, adding 2 mM glutamate reverses these increases, suggesting additional complexity in the regulation of Shv synaptic remodeling.

      Thank you for the comment and the opportunity to clarify this point. While it may seem counterintuitive at first glance, our observations are in line with previous reports showing that low ambient glutamate levels significantly elevate GluR intensity at the Drosophila NMJ (Chen et al., 2009) and that this increase can be reversed by glutamate supplementation (Augustin et al., 2007; Chen et al., 2009). We have revised the text to reflect this connection more clearly.

      (10) If glial Shv promotes GluR expression, why does the increased extracellular Shv from neuronal shv knockdown (elav>shv-RNAi, Figure 5E) fail to elicit stimulus-dependent GluR elevation?

      We speculate that this is because glial Shv does not activate integrin signaling (Figure 5B, C), and elevated glial Shv increases ambient glutamate concentration (Figure 6A), thereby reducing GluR expression (Augustin et al., 2007; Chen et al., 2009). This is indeed what we observed when shv is knocked down in neurons. 

      Additional Issues:

      (11) The type of bouton used for quantification (e.g., Ib or Is boutons) is not specified, which is critical for interpreting the results.

      We apologize for not being clear. We analyzed type Ib boutons as done previously (Lee et al., 2017 and Chang et al., 2024), and have now included this information in the Methods section.  

      (12) The extent of Shv protein depletion in the repo-GeneSwitch system needs validation to confirm the efficacy of the knockdown.

      Thank you for the suggestion. We confirmed the efficiency of acute shv knockdown by the repo-GeneSwitch system by performing Western blot analysis of dissected larval brains (Figure 2 – figure supplement 1B). Acute glial knockdown using the repo-GeneSwitch driver resulted in a 30% reduction in Shv levels, similar to the decrease observed with the repo-GAL4 driver, suggesting that the GeneSwitch driver is functional. Furthermore, knockdown of shv by the ubiquitous tubulin-GAL4 driver completely eliminated Shv protein, indicating that the RNAi construct is effective.  

      Reviewer #2 (Recommendations for the authors):

      (1) General comment on statistics/data presentation: The authors employ an unusual method of using both one-way ANOVA and multiple t-test stats for the same data. Would a 2-way ANOVA be the more appropriate solution to this problem (to analyze across genotype and stimulation condition)? Also a chart in the supplementals showing all comparisons rather than just the fraction explicitly reported in the graphs would be helpful (it is not clear if no indication on significance indicates no difference or just not reported between some of the baseline levels, especially since everything is presented as ratios and in some cases this could help with data interpretation of which baseline levels are different and how they compare to other baselines and other post-stim levels). Further, there are no sample sizes given for any experiment, nor are any values of means, SD, etc ever explicitly given.

      We appreciate the thoughtful suggestion. While a two-way ANOVA could be used to examine interaction effects between genotype and stimulation condition, our analysis was designed to address a specific biological question: whether each genotype, independent of baseline levels, is capable of undergoing activity-dependent synaptic remodeling. To this end, we used t-tests to directly compare unstimulated vs. stimulated conditions within each genotype, allowing us to determine whether stimulation produces a significant effect in an all-or-none manner. In parallel, we applied one-way ANOVA with post hoc tests to analyze differences among baseline (unstimulated) conditions across genotypes. This approach is justified by the fact that stimulation was applied acutely and separately, and therefore the baseline values should not be influenced by the stimulated condition. Because we were not aiming to compare the extent of synaptic remodeling between genotypes, we did not use a two-way ANOVA to analyze interaction effects across all conditions.

      In response to the reviewer’s suggestion, we have now added the sample numbers to the graphs. Additionally, in the Methods section, we state that each sample represents a biological replicate and that data are presented as fold-change relative to unstimulated controls from the same experimental batch. This normalization is necessary because absolute GluR intensities can vary depending on microscope settings and staining conditions.

      (2) To clarify distinct roles of Shv coming from neurons vs glia it would help if the authors could include more data on the rescue of shv mutants with UAS-Shv in neurons alone. This data is never shown in the manuscript and data on what effect this rescue has on the pertinent phenotypes in this paper (bouton size and GluR staining) is not reported in the referred to 2017 paper. What this does and does not do for these phenotypes has important implications for how to interpret the glia-only rescue findings.

      Thank you for the suggestion. We have now included new data on neuronal Shv rescue in shv¹ mutants as suggested (updated Figure 4A). Consistent with previous findings that neuronal Shv rescues integrin signaling and electrophysiological phenotypes (Lee et al., 2017), we found that it also restores bouton size, GluR levels, and activity-induced synaptic remodeling. These results support the functional contribution of neuronal Shv.

      (3) Figure 1C: Where are the images in the periphery taken? The morphology of the glia is odd in that "blobs" of glial membrane seemingly unattached to anything else are floating about? Perhaps these are a thin stack projection and so the connection to the main glia "stalks" are just cut off? Could a specific individual synapse be shown? Also consider HRP shown on its own so that where the actual boutons are could be more clear. It seems like both the Tomato and HRP channels are really overexposed making visualizing the morphology quite confusing. Also why not use the antibody against Shv to directly visualize expression which is more direct than a knock-in tagged version?

      Figure 1C shows a single optical slice of the NMJ at muscle segment 2, selected to clearly highlight Shv-eGFP localization at a branch in close contact with the glial membrane. The glial stalk is not visible in this image because it lies in a different focal plane from the branch of interest. We have now specified this information in the figure legend. In the original figure, the HRP signal (405 channel) was oversaturated, which interfered with visual clarity. In the updated Figure 1C, we reduced the intensity of the overexposed channels to better reveal the weak Shv-eGFP signal and fine glial processes. While we have generated an antibody against Shv, the amount is extremely limited; hence, the Shv-eGFP fusion serves as a valuable tool for visualizing subcellular localization.

      (4) Do glutamate levels really rise in glial Shv KD? Although the iGluSnFR signal changes, could the high level of GluR at the synapse act as a sponge that sequesters glutamate so that it cannot stimulate the sensor as well? One way to test this would be to overexpress or knock down GluRs in muscle in wild type (or in the repo>Shv RNAi background) to see whether that alone can modulate iGluSnFR signals.

      Thank you for suggesting this important control. To address whether high GluR density alone could influence neuronal iGluSnFR sensor readouts, we overexpressed GluR using an mhc promoter-driven GluRIIA fusion line, which increases total GluRIIA expression in muscle independently of the Gal4/UAS system. As shown in Figure 6 – figure supplement 1, mhc-GluRIIA animals exhibited elevated levels of not only GluRIIA but also the obligatory GluRIIC subunit. Despite this increase in GluR expression, we did not observe any change in extracellular glutamate levels, as measured by live imaging using the neuronal iGluSnFR sensor (updated Figure 6A). These results suggest that elevated GluR density alone does not alter iGluSnFR sensor dynamics and further support our conclusions.

      (5) The authors have some Shv constructs that can't be secreted or can't bind to integrins. Performing cell type specific rescues with these constructs might also help distinguish how source matters for each proposed sub-function of Shv though this may be outside the scope of this study. 

      Thank you for pointing out these Shv constructs. We hope to further test the subfunctions of Shv in the future.

      (6) At one point the authors discuss experiments that measure how much Shv is released by glia during neuronal stimulation. Then state that "These data indicate that glial Shv does not directly inhibit integrin signaling." But how this experiment relates to integrin signaling is not explained and unclear.

      We apologize for the confusion. We have now updated the text to better explain our logic: “This activity-induced decrease in glial Shv levels, along with reduced integrin activation (Fig. 5B), suggest that glial Shv does not act by directly inhibiting integrin signaling.”

      Reviewer #3 (Recommendations for the authors):

      Minor comments

      (1) Readers are left wondering what causes the increased baseline of GluR after glial shv RNAi at Fig 1, which is addressed much later. It would be helpful to preemptively mention this.

      Thank you for the suggestion. To maintain a logical flow, we chose to first present the phenotypic data in Figures 1 and 2 and then return to the mechanistic explanation once we introduced ambient glutamate measurements. 

      (2) Be consistent with eGFP vs EGFP.

      Thank you, we have corrected the inconsistencies.  

      (3) Scale bar for Fig 1B is missing in the low-magnification panel.

      Thank you for pointing this out. We have added the scale bar to Figure 1B.

      (4) Fig 1C: it would be helpful to elaborate on the anatomy. For example, which NMJ/abdominal segment is this? Why are only some axons surrounded by glia?

      Figure 1C presents a single optical slice of the NMJ at muscle segment 2, chosen to highlight Shv-eGFP localization at a branch closely juxtaposed to the glial membrane. The glial stalk is not shown in this image because it resides in a different focal plane than the branch being visualized. We have now included this information in the figure legend.

      (5) For Fig 3B, while it is stated that "we observed normal synaptic remodeling using alrmGAL4," the effect size is smaller. There seems to be a decrease in the amount of synaptic remodeling occurring?

      Thank you for pointing this out. Our primary goal was to determine whether each genotype, regardless of baseline GluR levels, is capable of undergoing activity-dependent synaptic remodeling in response to stimulation. For this reason, we focused on detecting the presence or absence of remodeling rather than comparing the extent of remodeling across genotypes. While a smaller effect on activity-induced bouton size was observed with alrm-GAL4, the change was still statistically significant, indicating that remodeling does occur in this genotype. Currently, we do not have a clear biological interpretation for differences in the magnitude of remodeling, and therefore chose not to emphasize cross-genotype comparisons.

    1. eLife Assessment

      This useful study describes a mechanism of microbial modulation of anti-tumor immunity, which is of considerable interest in the field. However, the experimental support for the key mechanistic claim, the interaction between RadD and NKp46, is not robust. Multiple experimental inconsistencies, especially in vivo, weaken the conclusions, making the strength of evidence incomplete. Additional controls, direct binding assays, and clarification of in vivo mechanistic relevance would strengthen the work.

    2. Reviewer #1 (Public review):

      In this manuscript, Rishiq et al. investigate whether natural killer (NK) cells can interact with Fusobacterium nucleatum and identify the molecular mediators involved in this interaction. The authors propose that the bacterial adhesin RadD may bind to the activating NK cell receptor NKp46 (NCR1 in mice), leading to NK cell activation and tumor control. While the topic is of significant interest and the hypothesis intriguing, the manuscript lacks critical experimental evidence, contains several technical concerns, and requires substantial revisions.

      Major Concerns:

      (1) Lack of Direct Evidence for RadD-NKp46 Interaction

      The central claim that RadD interacts with NKp46 is not formally demonstrated. A direct binding assay (e.g., Biacore, ELISA, or pull-down with purified proteins) is essential to support this assertion. The absence of this fundamental experiment weakens the mechanistic conclusions of the study.

      (2) Figure 2: Binding Specificity and Bacterial Strains

      A CEACAM1-Ig control should be included in all binding experiments to distinguish between specific and non-specific Ig interactions. There is differential Ig binding between strains ATCC 23726 and 10953. The authors should quantify RadD expression in each strain to determine if the difference in binding is due to variation in RadD levels.

      (3) Figure 3: Flow Cytometry Inconsistencies and Missing Controls

      What do the FITC-negative, Ig-negative events represent? The authors should clarify whether these are background signals, bacterial aggregates, or debris.

      Panel B, CEACAM1-Ig binding appears markedly increased compared to WT bacteria. The reason for this enhancement should be discussed: does it reflect upregulation of the bacterial ligand or an artifact of overexpression? Fluorescence compensation should be carefully reviewed for the NKp46/NCR1-Ig binding assays to ensure that the signals are not due to spectral overlap or nonspecific binding. Importantly, binding experiments using the FadI/RadD double knockout strain are missing and should be included. This control is essential.

      In Panel E, the basis for calculating fold-change in MFI is unclear. Please indicate the reference condition to which the change is normalized.

      (4) Figure 4: Binding Inhibition and Receptor Sensitivity

      Panel A lacks representative FACS plots and is currently difficult to interpret. Differences in the sensitivity of human vs. mouse NKp46 to arginine inhibition should be discussed, given species differences in receptor-ligand interactions. What are the inhibition results using F. nucleatum strains deficient in FadI?

      In Panel B, CEACAM1-Ig and RadD-deficient bacteria must be included as negative controls for binding specificity upon anti-NKp46 blocking.

      (5) Figure 5: Functional NK Activation and Tumor Killing

      In Panels B and C, the key control condition (NK cells + anti-NKp46, without bacteria) is missing. This is needed to evaluate if NKp46 recognition is involved in tumor killing. The authors should explicitly test whether pre-incubation of NK cells with bacteria enhances their anti-tumor activity. Alternatively, could bacteria induce stress signals in tumor cells that sensitize them to NK killing? This distinction is critical.

      (6) Figure 5D: Mechanism of Peripheral Activation

      It is suggested that contact between bacteria and NK cells in the periphery leads to their activation. Can the authors confirm whether this pre-activation leads to enhanced killing of tumor targets, or if bacteria-tumor co-localization is required? The literature indicates that F. nucleatum localizes intracellularly within tumor cells. If so, how is RadD accessible to NKp46 on infiltrating NK cells?

      (8) Figure 5E and In Vivo Relevance

      Surprisingly, F. nucleatum infection is associated with increased tumor burden. Does this reflect an immunosuppressive effect? Are NK cells inhibited or exhausted in infected mice (TIGIT, SIGLEC7...)? If NK cell activation leads to reduced tumor control in the infected context, the role of RadD-induced activation needs further explanation. RadD-deficient bacteria, which do not activate NK cells, result in even poorer tumor control. This paradox needs to be addressed: how can NK activation impair tumor control while its absence also reduces tumor control?

      (9) NKp46-Deficient Mice: Inconsistencies

      In Ncr1⁻/⁻ mice, infection with WT or RadD-deficient F. nucleatum has no impact on tumor burden. This suggests that NKp46 is dispensable in this context and casts doubt on the physiological relevance of the proposed mechanism. This contradiction should be discussed more thoroughly.

    3. Reviewer #2 (Public review):

      Summary:

      In the present study, Rishiq et al. investigated whether the RadD protein expressed by Fusobacterium nucleatum subsp. nucleatum serves as a natural ligand for the NK-activating receptor NKp46, and whether the RadD-NKp46 interaction enhances NK cell cytotoxicity against tumor cells. To address this, the authors first performed an association analysis of F. nucleatum abundance and NKp46 expression in head and neck squamous cell carcinoma (HNSC) and colorectal cancer (CRC) using the TCMA and TCGA databases, respectively. While a positive association between NKp46⁺ and F. nucleatum⁺ status and improved overall survival was observed in HNSC patients, no such correlation was found in CRC.

      Next, they examined the binding of NKp46-Ig to various F. nucleatum strains. To confirm that this interaction was mediated specifically by RadD, they employed a RadD-deficient mutant strain. Finally, to establish the functional relevance of the RadD-NKp46 interaction in promoting NK cell cytotoxicity and anti-tumor responses, they utilized a syngeneic mouse breast cancer model. In this setup, AT3 cells were orthotopically implanted into the mammary fat pad of C57BL/6 wild-type (WT) or Ncr1-deficient (NCR1⁻/⁻; murine orthologue of human NKp46) mice, followed by intravenous inoculation with either WT F. nucleatum or the ∆RadD mutant strain.

      Strengths:

      A notable strength of the work is that it identifies a previously unrecognized activating interaction between F. nucleatum RadD and the NK cell receptor NKp46, demonstrating that the same bacterial protein can engage distinct NK cell receptors (activating or inhibitory) to exert context-dependent effects on anti-tumor immunity. This dual-receptor insight adds depth to our understanding of F. nucleatum-immune interactions and highlights the complexity of microbial modulation of the tumor microenvironment.

      Weaknesses:

      (1) A previous study by this group (PMID: 38952680) demonstrated that RadD of F. nucleatum binds to NK cells via Siglec-7, thereby diminishing their cytotoxic potential. They further proposed that the RadD-Siglec-7 interaction could act as an immune evasion mechanism exploited by tumor cells. In contrast, the present study reports that RadD of F. nucleatum can also bind to the activating receptor NKp46 on NK cells, thereby enhancing their cytotoxic function.

      While F. nucleatum-mediated tumor progression has been documented in breast and colon cancers, the current study proposes an NK-activating role for F. nucleatum in HNSC. However, it remains unclear whether tumor-infiltrating NK cells in HNSC exhibit differential expression of NKp46 compared to Siglec-7. Furthermore, heterogeneity within the NK cell compartment, particularly in the relative abundance of NKp46⁺ versus Siglec-7⁺ subsets, may differ substantially among breast, colon, and HNSC tumors. Such differences could have been readily investigated using publicly available single-cell datasets. A deeper understanding of this subset heterogeneity in NK cells would better explain why F. nucleatum is positively associated with a favorable prognosis in HNSC but correlates with poor outcomes in breast and colon cancers.

      (2) The in vivo tumor data (Figure 5D-F) appear to contradict the authors' claims. Specifically, Figure 5E suggests that WT mice engrafted with AT3 breast tumors and inoculated with WT F. nucleatum exhibited an even greater tumor burden compared to mice not inoculated with F. nucleatum, indicating a tumor-promoting effect. This finding conflicts with the interpretation presented in both the results and discussion sections.

      (3) Although the authors acknowledge that F. nucleatum may have tumor context-specific roles in regulating NK cell responses, it is unclear why they chose a breast cancer model in which F. nucleatum has been reported to promote tumor growth. A more appropriate choice would have been the well-established preclinical oral cancer model, such as the 4-nitroquinoline 1-oxide (4NQO)-induced oral cancer model in C57BL/6 mice, which would more directly relate to HNSC biology.

      (4) Since RadD of F. nucleatum can bind to both Siglec-7 and NKp46 on NK cells, exerting opposing functional effects, the expression profiles of both receptors on intratumoral NK cells should be evaluated. This would clarify the balance between activating and inhibitory signals in the tumor microenvironment and provide a more mechanistic explanation for the observed tumor context-dependent outcomes.

    4. Author response:

      Reviewer #1 (Public review):

      Major Concerns:

      (1) Lack of Direct Evidence for RadD-NKp46 Interaction

      The central claim that RadD interacts with NKp46 is not formally demonstrated. A direct binding assay (e.g., Biacore, ELISA, or pull-down with purified proteins) is essential to support this assertion. The absence of this fundamental experiment weakens the mechanistic conclusions of the study.

      The reviewer is correct. Direct binding assays are currently not feasible because RadD is a very large protein, and purifying it would take years. Instead, we performed immunoprecipitation assays using NKp46-Ig (Author response images 1 and 2). Fusobacteria were lysed using RIPA buffer, and the lysates were centrifuged twice to separate the supernatant from the pellet (which contains the bacterial membranes). The resulting lysates were incubated overnight with 2.5 µg of purified NKp46 and protein G beads. After thorough washing, the bound proteins were placed in sample buffer and heated at 95 °C for 8 minutes. The eluates were run on a 10% acrylamide gel and visualized by Coomassie blue staining. As shown, NKp46-Ig precipitated a protein band of approximately 350 kDa from both F. polymorphum ATCC 10953 (Author response image 1) and F. nucleatum ATCC 23726 (Author response image 2).

      Author response image 1. NKp46 immunoprecipitation with Fusobacterium polymorphum (ATCC 10953) lysates. The supernatant and pellet lysates of Fusobacterium were immunoprecipitated (IP) with 2.5 μg of control fusion protein (RBD-Ig) or NKp46-Ig. 2.5 μg of each purified fusion protein was also run on the gel.

      Author response image 2. NKp46 immunoprecipitation with Fusobacterium nucleatum (ATCC 23726) lysates. The supernatant and pellet lysates of Fusobacterium were immunoprecipitated (IP) with 2.5 μg of control fusion protein (RBD-Ig) or NKp46-Ig. 2.5 μg of each purified fusion protein was also run on the gel.

      (2) Figure 2: Binding Specificity and Bacterial Strains

      A CEACAM1-Ig control should be included in all binding experiments to distinguish between specific and non-specific Ig interactions. There is differential Ig binding between strains ATCC 23726 and 10953. The authors should quantify RadD expression in each strain to determine if the difference in binding is due to variation in RadD levels.

      No significant difference in mCEACAM-1-Ig binding was observed across multiple independent experiments. Author response image 3 shows a representative histogram showing mCEACAM-1-Ig binding to F. nucleatum ATCC 23726 and F. polymorphum ATCC 10953. Comparable binding levels were detected in both bacterial species (upper histogram). Similarly, NKp46-Ig and Ncr1-Ig fusion proteins exhibited comparable binding patterns (lower histogram). It is currently not possible to quantify RadD expression directly, as no anti-RadD antibody is available.

      Author response image 3. CEACAM-1 Ig binding to Fusobacterium ATCC 23726 and ATCC 10953. Upper histograms show staining with secondary antibody alone (gray) compared to CEACAM-1 Ig (black line). Lower histograms show binding of NKp46 and Ncr1 fusion proteins to the two Fusobacterium strains. Gray represents secondary antibody controls.

      (3) Figure 3: Flow Cytometry Inconsistencies and Missing Controls

      What do the FITC-negative, Ig-negative events represent? The authors should clarify whether these are background signals, bacterial aggregates, or debris.

      We now present the gating strategy used in these experiments (Author response image 4). Ig-negative samples were bacteria stained only with the APC secondary antibody (anti-human AF647). FITC-negative events represent unlabeled bacteria.

      Author response image 4. Gating strategy for FITC-labeled Fusobacterium stained with fusion proteins. Bacteria were first gated as shown in the left panel. The gated population was then further analyzed in the right plot: the lower-left quadrant represents bacterial debris, the upper-left quadrant corresponds to FITC-stained bacteria only, and the upper-right quadrant shows bacteria double-positive for FITC and APC, indicating binding of the fusion proteins.

      Panel B, CEACAM1-Ig binding appears markedly increased compared to WT bacteria. The reason for this enhancement should be discussed: does it reflect upregulation of the bacterial ligand or an artifact of overexpression? Fluorescence compensation should be carefully reviewed for the NKp46/NCR1-Ig binding assays to ensure that the signals are not due to spectral overlap or nonspecific binding. Importantly, binding experiments using the FadI/RadD double knockout strain are missing and should be included. This control is essential.

      We do not know why CEACAM1-Ig binding is increased. We agree that a FadI/RadD double knockout strain would be a valuable control; however, we do not currently have this strain.

      In Panel E, the basis for calculating fold-change in MFI is unclear. Please indicate the reference condition to which the change is normalized.

      The mean fluorescence intensity (MFI) fold change was calculated by dividing the MFI obtained from staining with the fusion proteins by the MFI of the corresponding secondary antibody control (bacteria incubated without fusion proteins).
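      As an illustration of this normalization only, the short sketch below computes the fold change as fusion-protein MFI divided by the MFI of the matching secondary-antibody-only control. The function name and the MFI values are hypothetical and are not taken from the experiments.

```python
def mfi_fold_change(fusion_mfi: float, secondary_only_mfi: float) -> float:
    """Return the MFI fold change: MFI with fusion protein divided by the MFI
    of the corresponding secondary-antibody-only control (no fusion protein)."""
    if secondary_only_mfi <= 0:
        raise ValueError("control MFI must be positive")
    return fusion_mfi / secondary_only_mfi


# Hypothetical values, for illustration only
print(mfi_fold_change(1200.0, 300.0))  # 4.0
```

      A value of 1.0 would indicate no binding above the secondary-antibody background.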

      (4) Figure 4: Binding Inhibition and Receptor Sensitivity

      Panel A lacks representative FACS plots and is currently difficult to interpret.

      Fusobacteria binding to CEACAM-1, NKp46, and NCR1 fusion proteins was tested in the presence of 5 and 10 mM L-arginine (Author response image 5). L-arginine inhibited the binding of NKp46-Ig and NCR1-Ig, whereas no effect was observed on CEACAM-1-Ig binding.

      Author response image 5. Fusobacterium binding inhibition by L-arginine. The figure shows the binding of CEACAM1-Ig (left panel), NKp46-Ig (middle panel), and Ncr1-Ig (right panel) in the presence of 0 mM (black), 5 mM (red), and 10 mM (blue) L-arginine.

      Differences in the sensitivity of human vs. mouse NKp46 to arginine inhibition should be discussed, given species differences in receptor-ligand interactions.

      Ncr1, the murine orthologue of human NKp46, shares approximately 58% sequence identity with its human counterpart (1). The observed differences in arginine-mediated inhibition of bacterial binding between mouse and human NKp46 might stem from structural differences or distinct posttranslational modifications, such as glycosylation. Indeed, prediction algorithms combined with high-performance liquid chromatography analysis revealed that Ncr1 possesses two putative novel O-glycosylation sites, of which only one is conserved in humans (2).

      References

      (1) Biassoni R., Pessino A., Bottino C., Pende D., Moretta L., Moretta A. The murine homologue of the human NKp46, a triggering receptor involved in the induction of natural cytotoxicity. Eur J Immunol. 1999 Mar; 29(3).

      (2) Glasner A., Roth Z., Varvak A., Miletic A., Isaacson B., Bar-On Y., Jonjić S., Khalaila I., Mandelboim O. Identification of putative novel O-glycosylations in the NK killer receptor Ncr1 essential for its activity. Cell Discov. 2015 Dec 22; 1:15036.

      What are the inhibition results using F. nucleatum strains deficient in FadI?

      The inhibition pattern observed in the F. nucleatum ΔFadI mutant was comparable to that of the wild-type strain (Author response image 6). When cultured under identical conditions and exposed to increasing concentrations of arginine (0, 5, and 10 mM), the F. nucleatum ΔFadI strain also demonstrated a dose-dependent reduction in binding to NKp46 and Ncr1.

      Author response image 6. Arginine inhibition of NKp46-Ig and Ncr1-Ig binding in F. nucleatum ΔFadI. Histograms show NKp46-Ig (A, C) and Ncr1-Ig (B, D) binding to F. nucleatum ATCC10953 ΔFadI (A and B) and to F. nucleatum ATCC23726 ΔFadI (C and D) following exposure to 5 mM and 10 mM L-arginine. Panels (E) and (F) display the mean fluorescence intensity (MFI) quantification corresponding to (A and B) and (C and D), respectively.

      In Panel B, CEACAM1-Ig and RadD-deficient bacteria must be included as negative controls for binding specificity upon anti-NKp46 blocking.

      We appreciate the request to include CEACAM1-Ig and RadD-deficient bacteria as negative controls for specificity under anti-NKp46 blocking. However, we do not think these controls are necessary. The O2 antibody is specific for NKp46; other anti-NKp46 antibodies, as well as an irrelevant antibody, did not block the interaction; and arginine produced a dose-dependent reduction in NKp46/Ncr1 binding, consistent with the arginine-inhibitable RadD interaction already shown in our manuscript (Fig. 4A). The ΔRadD strains we used already demonstrate loss of NKp46/Ncr1 binding and loss of NK-boosting activity (Figs. 3, 5). Collectively, these data establish that NKp46/Ncr1 recognition of a high-molecular-weight ligand consistent with RadD is specific and functionally relevant.

      (5) Figure 5: Functional NK Activation and Tumor Killing

      In Panels B and C, the key control condition (NK cells + anti-NKp46, without bacteria) is missing. This is needed to evaluate if NKp46 recognition is involved in tumor killing. The authors should explicitly test whether pre-incubation of NK cells with bacteria enhances their anti-tumor activity.

      No significant difference in NK cell cytotoxicity was observed between untreated NK cells and NK cells incubated with anti-NKp46 antibody in the absence of bacteria. Therefore, the NK + anti-NKp46 (O2) group was included as an additional control alongside the other experimental conditions shown in Figures 5b and 5c, and is presented in Author response image 7 below.

      Author response image 7. NK cytotoxicity against breast cancer cell lines. NK cell cytotoxicity against T47D (left) and MCF7 (right) breast cancer cell lines. This experiment follows the format of Figure 5b and 5c, with the addition of the NK cells + O2 antibody group. No significant differences were observed when values were normalized to NK cells alone.

      Could bacteria induce stress signals in tumor cells that sensitize them to NK killing? This distinction is critical.

      It remains unclear whether the bacteria induce stress-related signals in tumor cells that render them more susceptible to NK cell–mediated cytotoxicity.

      (6) Figure 5D: Mechanism of Peripheral Activation

      It is suggested that contact between bacteria and NK cells in the periphery leads to their activation. Can the authors confirm whether this pre-activation leads to enhanced killing of tumor targets, or if bacteria-tumor co-localization is required? The literature indicates that F. nucleatum localizes intracellularly within tumor cells. If so, how is RadD accessible to NKp46 on infiltrating NK cells?

      We do not expect that pre-activation of NK cells with bacteria would enhance their tumor-killing capacity. In fact, when NK cells were co-incubated with bacteria, we occasionally observed NK cell death. Although F. nucleatum can reside intracellularly, bacterial entry requires prior adhesion to tumor cells. At this stage—before internalization—the bacteria are accessible for recognition and binding by NK cells.

      (8) Figure 5E and In Vivo Relevance

      Surprisingly, F. nucleatum infection is associated with increased tumor burden. Does this reflect an immunosuppressive effect? Are NK cells inhibited or exhausted in infected mice (TIGIT, SIGLEC7...)? If NK cell activation leads to reduced tumor control in the infected context, the role of RadD-induced activation needs further explanation. RadD-deficient bacteria, which do not activate NK cells, result in even poorer tumor control. This paradox needs to be addressed: how can NK activation impair tumor control while its absence also reduces tumor control?

      Siglec-7 lacks a direct orthologue in mice, and neither mouse TIGIT nor CEACAM1 bind F. nucleatum. The increased tumor burden observed in infected mice may therefore result from bacterial interference with immune cell infiltration and accumulation within the tumor microenvironment (Parhi, L., Alon-Maimon, T., Sol, A. et al. Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat Commun 11, 3259 (2020)). Nevertheless, the NK cells that do reach the tumor site can recognize and kill F. nucleatum–bearing tumor cells through RadD–NKp46 interactions. In the absence of RadD, this recognition is impaired, leading to reduced NK-mediated cytotoxicity and increased tumor growth.

      (9) NKp46-Deficient Mice: Inconsistencies

      In Ncr1⁻/⁻ mice, infection with WT or RadD-deficient F. nucleatum has no impact on tumor burden. This suggests that NKp46 is dispensable in this context and casts doubt on the physiological relevance of the proposed mechanism. This contradiction should be discussed more thoroughly.

      Ncr1 is also directly involved in mediating NK cell–dependent killing of tumor cells, even in the absence of bacterial infection. Therefore, in Ncr1-deficient mice, F. nucleatum has no additional effect on tumor progression (Glasner, A., Ghadially, H., Gur, C., Stanietsky, N., Tsukerman, P., Enk, J., Mandelboim, O. Recognition and prevention of tumor metastasis by the NK receptor NKp46/NCR1. J Immunol. 2012).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) A previous study by this group (PMID: 38952680) demonstrated that RadD of F. nucleatum binds to NK cells via Siglec-7, thereby diminishing their cytotoxic potential. They further proposed that the RadD-Siglec-7 interaction could act as an immune evasion mechanism exploited by tumor cells. In contrast, the present study reports that RadD of F. nucleatum can also bind to the activating receptor NKp46 on NK cells, thereby enhancing their cytotoxic function.

      Siglec-7 lacks a direct orthologue in mice, and neither mouse TIGIT nor CEACAM1 bind F. nucleatum. In contrast, NKp46 and its murine homologue, Ncr1, both recognize and bind the bacterium.

      While F. nucleatum-mediated tumor progression has been documented in breast and colon cancers, the current study proposes an NK-activating role for F. nucleatum in HNSC. However, it remains unclear whether tumor-infiltrating NK cells in HNSC exhibit differential expression of NKp46 compared to Siglec-7. Furthermore, heterogeneity within the NK cell compartment, particularly in the relative abundance of NKp46⁺ versus Siglec-7⁺ subsets, may differ substantially among breast, colon, and HNSC tumors. Such differences could have been readily investigated using publicly available single-cell datasets. A deeper understanding of this subset heterogeneity in NK cells would better explain why F. nucleatum is positively associated with a favorable prognosis in HNSC but correlates with poor outcomes in breast and colon cancers.

      Currently, there are no publicly available single-cell datasets suitable for characterizing NK cell heterogeneity in the context of F. nucleatum infection—particularly regarding the expression of Siglec-7, NKp46, or CEACAM1 and their potential association with poor clinical outcomes in breast, head and neck squamous cell carcinoma (HNSC), or colorectal cancer (CRC). Furthermore, no RNA-seq datasets are available for breast cancer cases specifically associated with F. nucleatum infection and poor prognosis. Therefore, we analyzed bulk RNA expression datasets for Siglec-7 and CEACAM1 and evaluated their associations with HNSC and CRC using the same patient databases utilized in our manuscript (Author response image 8). No significant differences in Siglec-7 expression were detected between HNSC and CRC samples (Author response image 8A). Although CEACAM1 mRNA levels did not differ between F. nucleatum–positive and –negative cases within either cancer type, its overall expression was higher in CRC compared to HNSC (Author response image 8B).

      Author response image 8. Siglec7 and Ceacam1 expression and the prognostic effect of F. nucleatum in a tumor-type-specific manner. Comparison of Siglec7 (A) and Ceacam1 (B) expression across HNSC and CRC tumors. Log₂ expression levels of NKp46 mRNA were compared across HNSC and CRC cohorts, stratified by F. nucleatum positive and negative. Results were analyzed by one-way ANOVA with Bonferroni post hoc correction.

      (2) The in vivo tumor data (Figure 5D-F) appear to contradict the authors' claims. Specifically, Figure 5E suggests that WT mice engrafted with AT3 breast tumors and inoculated with WT F. nucleatum exhibited an even greater tumor burden compared to mice not inoculated with F. nucleatum, indicating a tumor-promoting effect. This finding conflicts with the interpretation presented in both the results and discussion sections.

      Siglec-7 lacks a direct orthologue in mice, and neither mouse TIGIT nor CEACAM1 bind F. nucleatum. The increased tumor burden observed in infected mice may therefore result from bacterial interference with immune cell infiltration and accumulation within the tumor microenvironment (Parhi, L., Alon-Maimon, T., Sol, A. et al. Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat Commun 11, 3259 (2020)). Nevertheless, the NK cells that do reach the tumor site can recognize and kill F. nucleatum–bearing tumor cells through RadD–NKp46 interactions. In the absence of RadD, this recognition is impaired, leading to reduced NK-mediated cytotoxicity and increased tumor growth.

      (3) Although the authors acknowledge that F. nucleatum may have tumor context-specific roles in regulating NK cell responses, it is unclear why they chose a breast cancer model in which F. nucleatum has been reported to promote tumor growth. A more appropriate choice would have been the well-established preclinical oral cancer model, such as the 4-nitroquinoline 1-oxide (4NQO)-induced oral cancer model in C57BL/6 mice, which would more directly relate to HNSC biology.

      The tumor model we employed is, to date, the only model in which F. nucleatum has been shown to exert a measurable effect, which is why we selected it for our study (Parhi, L., Alon-Maimon, T., Sol, A. et al. Breast cancer colonization by Fusobacterium nucleatum accelerates tumor growth and metastatic progression. Nat Commun. 2020; 11: 3259). We have not tested the 4-nitroquinoline-1-oxide (4NQO)–induced oral cancer model, and we are uncertain whether its use would be ethically justified.

      (4) Since RadD of F. nucleatum can bind to both Siglec-7 and NKp46 on NK cells, exerting opposing functional effects, the expression profiles of both receptors on intratumoral NK cells should be evaluated. This would clarify the balance between activating and inhibitory signals in the tumor microenvironment and provide a more mechanistic explanation for the observed tumor context-dependent outcomes.

      This question was answered in Author response image 8 above.

    1. eLife Assessment

      This work is an important contribution to understanding the role of FGF signaling in the induction of primitive-like cells in a 2D system of human gastrulation. The authors provide compelling evidence showing that endogenous FGF ligands, acting through FGF receptors localized basolaterally, are determinant in the acquisition of specific cell fates. These observations will be of broad relevance to the FGF field.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study on the role of FGF signaling in the induction of primitive streak-like cells (PS-LC) in human 2D-gastruloids. The authors use a previously characterized standard culture that generates a ring of PS-LCs (TBXT+) and correlate this with pERK staining. A requirement for FGF signaling in TBXT induction is demonstrated via pharmacological inhibition of MEK and FGFR activity. A second set of culture conditions (with no exogenous FGFs) suggests that endogenous FGFs are required for pERK and TBXT induction. The authors then characterize, via scRNA-seq, various components of the FGF pathway (genes for ligands, receptors, ERK regulators, and HSPG regulation). They go on to characterize the pFGFR1, receptor isoforms, and polarized localization of this receptor. Finally, they perform FGF4 inhibition and use a cell line with a limited FGF17 inactivation (heterozygous null) and show that loss of these FGFs reduces PS-LC and derivative cell types.

      Strengths:

      (1) As the authors point out, the role of FGF signaling in gastrulation is less well understood than other signaling pathways. Hence this is a valuable contribution to that field.

      (2) The FGF4 and FGF17 loss-of-function experiments in Figure 5 are very intriguing. This is especially so given the observation that these FGFs appear to dominate in this model of human gastrulation, in contrast to the FGFs that dominate in mice, chicks, and frogs.

      (3) In general, this paper is valuable as a further development of the human gastruloid system and of the role of FGF signaling in the induction of PS-LCs. The wide net that the authors cast in characterizing FGF ligand genes, receptor isoforms, and downstream components provides a foundation for future work. As the authors write near the beginning of the Discussion, "Many questions remain."

      Weaknesses:

      (1) FGFs are cell survival factors in various aspects of development. The authors fail to address cell death due to loss of FGF signaling in any of their experiments. For example, in Figure 1E (which requires statistical analysis) and 1G (the bottom FGFRi row), there appears to be a significant amount of cell loss. Is this due to cell death? The authors should address the question of whether the role of FGF/ERK signaling is to keep the cells alive.

      (2) Regarding the sparse cells in 1G, is there a reduction in cell number only with FGFRi and not MEKi? Is this reproducible? Gattiglio et al (Development, 2023, PMID: 37530863) present data supporting a "community effect" in the FGF-induced mesoderm differentiation of mouse embryonic stem cells. Could a community effect be at play in this human system (especially given the images in the bottom row of 1G)? If the authors don't address this experimentally, they should at least address the ideas in Gattiglio et al.

      (3) Do the FGF4 and FGF17 LOF experiments in Figure 5 affect cell number like FGFRi in Figure 1? Why examine PS-LC induction only in FGF17 heterozygous cells and not homozygous FGF17 nulls?

      (4) The idea that FGF8 plays a dominant role during gastrulation of other species but not humans is so intriguing it warrants deeper testing. The authors dismiss FGF8 because its mRNA "...levels always remained low." (line 363) as well as the data published in Zhai et al (PMID: 36517595) and Tyser et al (PMID: 34789876). But there are cases in mouse development where a gene was expressed at levels so low that it might be dismissed, and yet LOF experiments revealed it played a role or even was required in a developmental process. The authors should consider FGF8 inhibition or inactivation to explore its potential role, despite its low levels of expression.

      (5) Redundancy is a common feature in FGF genetics. What is the effect of inhibiting FGF4 in FGF17 LOF cells?

      (6) I suggest that the authors take more care in describing FGF gradients. For example, in one Results heading they write "Endogenous FGF4 and FGF17 gradients underly the ERK activity pattern.", implying an FGF protein gradient. However, they only present data for FGF mRNA, not protein. This issue would be clarified if they used proper nomenclature for gene, mRNA (italics), and protein (no italics) throughout the paper.

      Comments on revisions:

      The authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The role of FGFs in embryonic development and stem cell differentiation has remained unclear due to its complexity. In this study, the authors utilized a 2D human stem cell-based gastrulation model to investigate the functions of FGFs. They discovered that FGF-dependent ERK activity is closely linked to the emergence of primitive streak cells. Importantly, this 2D model effectively illustrates the spatial distribution of key signaling effectors and receptors by correlating these markers with cell fate markers, such as T and ISL1. Through inhibition and loss-of-function studies, they further corroborated the requirement for FGF ligands. Their data show that FGFR1 is the primary receptor, and FGF2/4/17 are the key ligands for primitive streak development, which aligns with observations in primate embryos. Additional experiments revealed that the reduction of FGF4 and FGF17 decreases ERK activity.

      Strengths:

      This study provides comprehensive data and improves our understanding of the role of FGF signaling in primate primitive streak formation. The authors provide new insights related to the spatial localization of the key components of FGF signaling and attempt to reveal the temporal dynamics of the signal propagation and cell fate decision, which has been challenging.

    4. Reviewer #3 (Public review):

      Jo and colleagues set out to investigate the origins and functions of localized FGF/ERK signaling for the differentiation and spatial patterning of primitive streak fates of human embryonic stem cells in a well-established micropattern system. They demonstrate that endogenous FGF signaling is required for ERK activation in a ring-domain in the micropatterns, and that this localized signaling is directly required for differentiation and spatial patterning of specific cell types. Through high-resolution microscopy and transwell assays, they show that cells receive FGF signals through basally localized receptors. Finally, the authors find that there is a requirement for exogenous FGF2 to initiate primitive streak-like differentiation, but endogenous FGFs, especially FGF4 and FGF17, fully take over at later stages.

      Even though some of the authors' findings - such as the localized expression of FGF ligands during gastrulation and the importance of FGF/ERK signaling for cell differentiation in the primitive streak - have been reported in model organisms before, this is one of the first studies to investigate the role of FGF signaling during primitive streak-like differentiation of human cells. In doing so, the paper reports a number of interesting and valuable observations, namely the basal localization of FGF receptors which mirrors that of BMP and Nodal receptors, as well as the existence of a positive feedback loop centered on FGF signaling that drives primitive-streak differentiation. In the revised version of their work, the authors have furthermore dissected the role of different FGFs through knockdown approaches. These experiments reveal discrete functions for different FGF genes in their system, as well as interesting differences between the role of specific FGFs in human compared to model systems.

      Comments on revisions:

      The authors have appropriately addressed all comments and suggestions from the previous round of review. The only textual change that I would still like to suggest is to state explicitly in the main text corresponding to Fig. 1 that the mTeSR1 medium used for these initial experiments already contains FGF. This is something that is probably known to experts in the field, but not necessarily to a broader readership.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This is an interesting study on the role of FGF signaling in the induction of primitive streak-like cells (PS-LC) in human 2D-gastruloids. The authors use a previously characterized standard culture that generates a ring of PS-LCs (TBXT+) and correlate this with pERK staining. A requirement for FGF signaling in TBXT induction is demonstrated via pharmacological inhibition of MEK and FGFR activity. A second set of culture conditions (with no exogenous FGFs) suggests that endogenous FGFs are required for pERK and TBXT induction. The authors then characterize, via scRNA-seq, various components of the FGF pathway (genes for ligands, receptors, ERK regulators, and HSPG regulation). They go on to characterize the pFGFR1, receptor isoforms, and polarized localization of this receptor. Finally, they perform FGF4 inhibition and use a cell line with a limited FGF17 inactivation (heterozygous null) and show that loss of these FGFs reduces PS-LC and derivative cell types. 

      Strengths: 

      (1) As the authors point out, the role of FGF signaling in gastrulation is less well understood than other signaling pathways. Hence this is a valuable contribution to that field. 

      (2) The FGF4 and FGF17 loss-of-function experiments in Figure 5 are very intriguing. This is especially so given the observation that these FGFs appear to dominate in this model of human gastrulation, in contrast to the FGFs that dominate in mice, chicks, and frogs. 

      (3) In general, this paper is valuable as a further development of the human gastruloid system and of the role of FGF signaling in the induction of PS-LCs. The wide net that the authors cast in characterizing FGF ligand genes, receptor isoforms, and downstream components provides a foundation for future work. As the authors write near the beginning of the Discussion, "Many questions remain." 

      We thank the reviewer for these positive comments.

      Weaknesses: 

      (1) FGFs are cell survival factors in various aspects of development. The authors fail to address cell death due to loss of FGF signaling in their experiments. For example, in Figure 1E (which requires statistical analysis) and 1G (the bottom FGFRi row), there appears to be a significant amount of cell loss. Is this due to cell death? The authors should address the question of whether the role of FGF/ERK signaling is to keep the cells alive. 

      Indeed, FGF also strongly affects cell survival, and it is an interesting question to what extent this depends on ERK. Our manuscript, however, focuses on the role of FGF/ERK signaling in cell fate patterning. As mentioned in our Discussion, Figure 1d,e shows that doxycycline-induced pERK leads to more TBXT+ cells than in the control without restoring cell number, suggesting that the role of FGF in controlling cell number is independent of the requirement for FGF/ERK in PS-LC differentiation. To further support this, we have added data showing that low doses of MEKi are sufficient to inhibit differentiation without affecting cell number (Supp. Fig. 1i).

      To address the reviewer's question regarding the cause of cell loss, we have now stained for BrdU and cleaved caspase-3 to assess proliferation and apoptosis in the presence and absence of MEK and FGFR inhibition (new Supp. Fig. 1e,f). This shows that the effect of these inhibitors on cell number is primarily due to reduced proliferation. We have also included statistical analysis in Fig. 1e. 

      (2) Regarding the sparse cells in 1G, is there a reduction in cell number only with FGFRi and not MEKi? Is this reproducible? Gattiglio et al (Development, 2023, PMID: 37530863) present data supporting a "community effect" in the FGF-induced mesoderm differentiation of mouse embryonic stem cells. Could a community effect be at play in this human system (especially given the images in the bottom row of 1G)? If the authors don't address this experimentally, they should at least address the ideas in Gattiglio et al. 

      Indeed, FGFRi reproducibly affects cell number more than MEKi, in line with the fact that pathways downstream of FGF other than MAPK/ERK (e.g. PI3K) play important roles in cell survival and growth. However, we do not think the lack of differentiation with MEKi and FGFRi in Fig. 1g can be attributed to a loss of cells combined with a community effect. Without FGFRi or MEKi, cells differentiate efficiently to primitive streak at much lower densities than those originally shown, consistent with the data discussed in our response to (1), which argue against a primarily indirect effect of FGF on PS-LC differentiation through cell density. In the context of directed differentiation (rather than 2D gastruloids), we have now shown in a controlled manner that the effect of MEKi and FGFRi does not depend on a community effect, by repeating the experiment in Fig. 1g while adjusting seeding densities to obtain similar final cell densities in all three conditions (new Fig. 1g, new Supp. Fig. 1g). Furthermore, we have included new data showing that extremely sparse cells without MEKi or FGFRi still differentiate efficiently (new Supp. Fig. 1h). We have also included Gattiglio et al. in our revised Discussion.

      (3) Do the FGF4 and FGF17 LOF experiments in Figure 5 affect cell numbers like FGFRi in Figure 1? 

      We did not observe major changes in cell number in the FGF4 and FGF17 loss-of-function experiments. This is in line with our observation that low levels of ERK signaling are sufficient to maintain proliferation (new Supp. Fig. 1i), and with the fact that low levels of ERK signaling are maintained in the absence of FGF4 and FGF17 (Fig. 5), likely by FGF2 (Fig. 2). In contrast, FGFRi treatment in Fig. 1 leads to a nearly complete loss of FGF signaling (ERK and other pathways) that has a dramatic effect on cell number.

      Why examine PS-LC induction only in FGF17 heterozygous cells and not homozygous FGF17 nulls? 

      We were unable to obtain homozygous FGF17 nulls; the reason for this is unclear. In the absence of homozygous nulls, we have now further corroborated our findings with additional knockdown data (described in response to other comments below).

      (4) The idea that FGF8 plays a dominant role during gastrulation of other species but not humans is so intriguing it warrants deeper testing. The authors dismiss FGF8 because its mRNA "...levels always remained low." (line 363) as well as the data published in Zhai et al (PMID: 36517595) and Tyser et al (PMID: 34789876). But there are cases in mouse development where a gene was expressed at levels so low that it might be dismissed, and yet LOF experiments revealed it played a role or even was required in a developmental process. The authors should consider FGF8 inhibition or inactivation to explore its potential role, despite its low levels of expression. 

      We thank the reviewer for this suggestion. We have now analyzed the role of FGF8 using FISH to visualize its expression and siRNA to understand its function (Fig.5d,f,h; Supp.Fig.5e,g,6e). We found that FGF8 expression is higher earlier in differentiation, preceding most expression of TBXT. Our scRNA-seq only analyzed samples at 42h so did not capture this. Furthermore, FGF8 expression localized inside the PS-like ring rather than coinciding with it like FGF4. Surprisingly, FGF8 knockdown led to an increase in primitive streak-like differentiation, suggesting it may counteract FGF4. The results are shown in the revised Fig. 5 and Supplemental Fig. 5. While this certainly merits further investigation, understanding the role of FGF8 in more detail is beyond the scope of the current work. 

      (5) Redundancy is a common feature in FGF genetics. What is the effect of inhibiting FGF4 in FGF17 LOF cells? 

      Further siRNA and shRNA experiments showed that FGF17 knockdown had a much smaller effect than FGF4 knockdown on expression of primitive streak markers (Fig.5i, Supp.Fig.6f-i), but that FGF17 knockdown did lead to a complete loss of the mesoderm marker TBX6 (Fig.5j, Supp.Fig.6j). A double knockdown of FGF4+FGF17 looked similar to FGF4 knockdown alone (Supp.Fig.6k). Thus, we now think it more likely that FGF17 acts downstream of FGF4-dependent PS-like differentiation. Although FGF17 may then feed back positively to enhance further PS-like differentiation (which we previously interpreted as partial redundancy), its primary role may come later, in mesoderm differentiation.

      (6) I suggest that the authors take more caution in describing FGF gradients. For example, in one Results heading they write "Endogenous FGF4 and FGF17 gradients underly the ERK activity pattern.", implying an FGF protein gradient. However, they only present data for FGF mRNA, not protein. This issue would be clarified if they used proper nomenclature for gene, mRNA (italics), and protein (no italics) throughout the paper. 

      Thank you for the suggestion. We have edited the paper to more clearly distinguish protein and mRNA. We do think our data provide substantial indirect evidence for a protein gradient, which is what the Results heading is meant to convey. Receptor activation is high where ERK activity is high (Fig.3), and receptor activation is limited by ligands, since creating a scratch that lets exogenous FGF reach the basal side of cells in the center leads to receptor activation (Fig.4). This strongly suggests that ERK activity reflects an FGF protein gradient. 

      Reviewer #2 (Public review): 

      Summary: 

      The role of FGFs in embryonic development and stem cell differentiation has remained unclear due to the complexity of FGF signaling. In this study, the authors utilized a 2D human stem cell-based gastrulation model to investigate the functions of FGFs. They discovered that FGF-dependent ERK activity is closely linked to the emergence of primitive streak cells. Importantly, this 2D model effectively illustrates the spatial distribution of key signaling effectors and receptors by correlating these markers with cell fate markers, such as T and ISL1. Through inhibition and loss-of-function studies, they further corroborated the requirement for FGF ligands. Their data show that FGFR1 is the primary receptor and that FGF2/4/17 are the key ligands for primitive streak development, which aligns with observations in primate embryos. Additional experiments revealed that the reduction of FGF4 and FGF17 decreases ERK activity. 

      Strengths: 

      This study provides comprehensive data and improves our understanding of the role of FGF signaling in primate primitive streak formation. The authors provide new insights related to the spatial localization of the key components of FGF signaling and attempt to reveal the temporal dynamics of the signal propagation and cell fate decision, which has been challenging. 

      Weaknesses: 

      Although the data are solid, the work only partially clarifies the complex picture of FGF signaling, and some details remain elusive. The findings lack a strong punchline, which may limit their broader impact. 

      We thank this reviewer for their valuable feedback and compliment on the solidity of our data. The punchline of our work is that FGF4- and FGF17-dependent ERK signaling plays a key role in differentiation of human PS-like cells and mesoderm, and that these are different FGFs than those thought to drive mouse gastrulation. A second key point is that, like BMP and TGFβ signaling, FGF signaling is restricted to the basolateral sides of pluripotent stem cell colonies due to polarized receptor expression, which is crucial for understanding the response to exogenous ligands added to the cell medium. Indeed, many facets of FGF signaling remain to be investigated, such as how FGF regulates and is regulated by other signals, which we will address in a separate manuscript. 

      Reviewer #3 (Public review): 

      Jo and colleagues set out to investigate the origins and functions of localized FGF/ERK signaling for the differentiation and spatial patterning of primitive streak fates of human embryonic stem cells in a well-established micropattern system. They demonstrate that endogenous FGF signaling is required for ERK activation in a ring domain in the micropatterns, and that this localized signaling is directly required for differentiation and spatial patterning of specific cell types. Through high-resolution microscopy and transwell assays, they show that cells receive FGF signals through basally localized receptors. Finally, the authors find that there is a requirement for exogenous FGF2 to initiate primitive streak-like differentiation, but endogenous FGFs, especially FGF4 and FGF17, fully take over at later stages. 

      Even though some of the authors' findings - such as the localized expression of FGF ligands during gastrulation and the importance of FGF/ERK signaling for cell differentiation in the primitive streak - have been reported in model organisms before, this is one of the first studies to investigate the role of FGF signaling during primitive streak-like differentiation of human cells. In doing so, the paper reports a number of interesting and valuable observations, namely the basal localization of FGF receptors which mirrors that of BMP and Nodal receptors, as well as the existence of a positive feedback loop centered on FGF signaling that drives primitive-streak differentiation. The authors also perform a comparison of the role of different FGFs across species and try to assign specific functions to individual FGFs. In the absence of clean genetic loss-of-function cell lines, this part of the work remains less strong. 

      We thank the reviewer for emphasizing the value of our findings in a human model for gastrulation. We agree that more loss-of-function experiments would provide further insight into the role of different FGFs. While we did not manage to create knockout cell lines, we have now performed both siRNA and shRNA knockdown of FGF4 and FGF17 in two different hPSC lines, performed siRNA knockdown of FGF8, and also made an FGF4+FGF17 shRNA double-knockdown cell line to more completely test the functions of the individual FGFs (Fig.5, Supp.Fig.5,6). Our data suggest that FGF17 may be downstream of FGF4 and primarily required for mesoderm differentiation, while FGF8 appears to counteract FGF4. In doing this, we have added a large amount of new data to the manuscript, and we have removed the heterozygous knockout data from the first version of the manuscript, which we felt added little to the new data. Further experiments are still needed to solidify our interpretation, but those are beyond the scope of the current work.

      Reviewer #1 (Recommendations for the authors): 

      (1) FGF2 is added to culture experiments (e.g. Figure 4), but the commercial source is not mentioned in Methods. For example, it could be added to "Supplementary Table 1: Cell signaling reagents." 

      We apologize for this oversight and have now added the information to Supplementary Table 1.

      (2) Line 117-118: "For example, by controlling the expression of Wnt or Nodal which are both required for PS-like differentiation". It is clear what the authors mean, but this is not a complete sentence. 

      We edited this for clarity, it now reads: “First, is FGF/ERK signaling required directly for PS-like differentiation, or does it act indirectly? These possibilities are not mutually exclusive. For example, FGF/ERK could be required directly but also act indirectly by controlling Wnt or Nodal expression, as both Wnt and Nodal signaling are required for PS-like differentiation.”

      (3) Line 246 "...found its spatial pattern to strongly resembles that of pERK..." either remove "to" or change "resembles" to "resemble" 

      Thank you for catching this. We removed “to”.

      (4) Lines 391- 393 seem to be missing a word in the last phrase: "...with FGF17 more important continued differentiation to mesoderm and endoderm." Maybe "during" after the word "important"? 

      Thank you for catching this, indeed the word “during” was missing and we have now added it.

      (5) Please define acronyms in Figure 3D (PS-LC was defined previously, but not others). 

      We apologize for the oversight; we have now defined the acronyms.

      (6) The three blue lines in Figure 5B (right) are hard to discern (and I'm not colorblind). I suggest also using a variety of dotted lines in a subset of these FGFs. 

      Thank you for the suggestion. We have now given all the FGFs colors that are more clearly distinct and made the TBXT and TBX6 lines dashed.

      Reviewer #2 (Recommendations for the authors): 

      (1) The reviewer acknowledges that FGF signaling is complex, particularly when dynamics and its correlation with cell fates are considered. To improve the clarity of the findings, the authors are encouraged to provide an additional schematic figure that clearly delineates the main findings of this study.  

      Thank you for the suggestion. We have now added a summary figure (Fig.6) to our discussion, which we hope helps present our findings more clearly.

      (2) The data suggest that FGF signaling may function differently in mice compared to primates, and their stem cell model aligns more closely with the latter. While the authors discuss this in the text based only on sequencing data, it would be valuable to conduct some experiments with mouse embryos to validate the key differences. 

      It is unclear to us which experiments the reviewer has in mind. There are ample data on FGF expression in the mouse literature, as well as many knockout phenotypes. Furthermore, verifying loss-of-function phenotypes (e.g. FGF17 knockout) in mouse is beyond our expertise.

      (3) Heparan sulfate proteoglycan (HSPG) is mentioned as an important component of FGF signaling; however, the only data related to HSPG is single-cell sequencing results. The authors should consider performing immunostaining or other assays to validate HSPG expression and spatial distribution, similar to the approach they used for other signaling components. 

      Our scratch experiments in Fig. 4 strongly argue against HSPGs being responsible for the spatial pattern of FGF receptor activation: after a scratch across the colony, the response is strong all along the scratch, as expected if the presence of FGF (an FGF gradient) controls the level of activity. If HSPGs were limiting, FGF flowing in from the medium should not be able to uniformly activate receptors around the scratch.

      In addition, we have now included an immunostain for HS in a newly added Supp. Fig. 4; the HS distribution does not explain the observed pattern of ERK signaling.

      (4) In the scratch experiment, particularly high pERK expression is observed at the edge of the scratch. The authors should provide an explanation for why this expression is significantly higher compared to the edges of the colony. Additionally, it would be interesting to investigate the fate of the cells with very high pERK expression.

      We have now determined that an adaptive response to FGF is the reason the response around the scratch is initially much higher than in the ERK activity ring that overlaps with the primitive streak-like cells. We have added figures showing that although the initial response to FGF exposure after scratching is very high, the response around the scratch adapts over the course of 6 hours to levels similar to those in the ERK ring (Fig.4i,j). 

      (5) For some of the key experiments, multiple cell lines should be used to ensure that the findings are reproducible and applicable across different human stem cell lines.

      We have now checked FISH stainings and knockdown phenotypes for different FGFs in two different cell lines: ESI17 (hESC, XX) and PGP1 (hiPSC, XY). These results are shown in Supplementary Figure 6. We found all results to be consistent.

      (6) Where applicable, the meaning of error bars needs to be more clearly presented, including details on the number of independent experiments or samples used. 

      Thank you for pointing this out. Where error bar definitions were missing we have now added them to the figure captions.

      Reviewer #3 (Recommendations for the authors): 

      (1) The authors only analyze the ppERK ring in micropatterns of a single size. What was the motivation for the choice of this size? Can the authors discuss how the ppERK ring is expected to depend on colony size? 

      Much smaller patterns lose the interior pluripotent region, while much larger patterns have a much larger pluripotent region, which requires larger tilings to image without providing additional insight. The colony-size dependence of cell fate patterning was described in the paper that established the 2D gastruloid model (Warmflash Nat Methods 2014), and we later showed that this is due to a fixed length scale of the BMP and Nodal signaling gradients from the colony edge (Jo et al Elife 2022). We have now included data showing that the ERK pattern behaves similarly, with a fixed length scale, implying that in smaller colonies the ERK ring becomes a disc and the entire center of the colony has high ERK signaling (Supp Fig 1a).

      (2) The scRNAseq is somewhat confusing - why do the two datasets not overlap in the PHATE representation? This is unexpected, because the two samples have been treated similarly, and the authors have integrated their data to iron out possible batch effects. This discrepancy should be discussed. The authors should also specify from which reference exactly the first dataset comes from.  

      The two datasets do overlap well: the same fates are well mixed in the same place, and the gene expression profiles for the integrated data (e.g., Fig.2e) look smooth, so we believe the integration is good. However, different cell fates are represented to different degrees. In particular, sample 2 shows much more mesoderm differentiation, making the mesoderm branch mostly orange. Occasionally samples differentiate faster or slower than average, which is what we see here, and these samples were collected far apart in time. We do not believe this affects our conclusions; if anything, performing the analysis on two samples that differ this much should make the conclusions more robust.

      (3) I find it intriguing that exogenous FGF2 is important early on for primitive streak-like differentiation, although the authors show that it does not reach the center of the colony. The authors may want to discuss this conundrum. Does the FGF2 effect propagate from the outside to the inside, or does it act at an early stage when the cells have not yet formed a tight epithelium on the micropattern? 

      The cells in the experiment in Fig. 5a were given 24h to epithelialize, so we do believe it acts from the edge. We believe this may be due to FGF2 modulating the early BMP response at the edge, and we are working on a manuscript that further explores this pathway crosstalk.

      (4) The authors' statement that FGF4 and FGF17 have partially redundant functions is not very strong, mainly because the study lacks a full FGF17 loss-of-function cell line. If the authors wanted to improve on this point, they could knock down FGF4 in the FGF17 heterozygous line, or produce a homozygous FGF17 KO line. If there are specific reasons why FGF17 homozygous lines cannot be produced, this could be interesting to discuss, too. Finally, I noticed that the methods list experiments with an FGF17 siRNA, but these are not shown in the manuscript. 

      We agree that our evidence was previously not as strong as it could be. While there is no reason we know of why homozygous knockout lines cannot be produced, we failed to produce one. To strengthen our evidence, we have therefore included substantial new knockdown data. We have now performed both siRNA and shRNA knockdown of FGF4 and FGF17 in two different hPSC lines, performed siRNA knockdown of FGF8, and also made an FGF4+FGF17 shRNA double-knockdown cell line to more completely test the functions of the individual FGFs (Fig.5, Supp.Fig.5,6). These experiments showed that FGF17 knockdown had a much smaller effect than FGF4 knockdown on expression of primitive streak markers (Fig.5i, Supp.Fig.6f-i), but that FGF17 knockdown did lead to a complete loss of the mesoderm marker TBX6 (Fig.5j, Supp.Fig.6j). A double knockdown of FGF4+FGF17 looked similar to FGF4 knockdown alone (Supp.Fig.6k). Thus, we now think it more likely that FGF17 acts downstream of FGF4-dependent PS-like differentiation. Although FGF17 may then feed back positively to enhance further PS-like differentiation (which we previously interpreted as partial redundancy), its primary role may come later, in mesoderm differentiation. Furthermore, our new data suggest that FGF8 may counteract FGF4 and limit PS-like differentiation. 

      Minor 

      (5) Line 63: Reference(s) appear to be missing. 

      This whole paragraph summarizes the results of the references given on line 55, we have now repeated the relevant references where the reviewer indicated.

      (6) Supplementary Figure 1a,b does not show ppERK, unlike stated in lines 102 - 104. 

      Indeed, the data described in lines 102-104 are shown in Fig.1a, and we have removed the original Supplementary Figure 1a,b since it did not provide relevant information.

      (7) Line 201: It is not clear whether this is a new sequencing dataset, or if existing datasets have been reanalyzed. 

      We agree our description was unclear. We have edited the text, which now explicitly states that our analysis is based on one dataset we collected previously and a replicate that was newly collected and deposited on GEO for this manuscript.

      (8) Figure 2f; Supplementary Figure 2b, c: The colors need to be explained in scale bars. How has this data been normalized to allow for comparison between very different sample types? 

      We have now added color bars indicating the scale for each of these figure panels. As the caption stated, the interspecies comparison was normalized within each species, so the highest FGF level for any FGF at any time within each species is normalized to one. We are thus comparing between species the relative expression of different FGFs within each species. Indeed there is no good way to compare absolute expression between species. For extra clarity we have expanded our description of the interspecies comparison analysis and normalization in the methods section.
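      As an illustration of the within-species normalization described above, here is a minimal sketch. It is not the authors' code: the function name and the data layout (a per-species mapping from (FGF gene, time point) to an expression value) are hypothetical assumptions. Each species is scaled by its own maximum, so the highest value for any FGF at any time becomes 1, and only relative expression within a species is compared across species.

```python
# Hypothetical sketch of within-species max normalization (not the authors' code).
# Assumed layout: expr[species][(fgf_gene, timepoint)] = expression value.

def normalize_within_species(expr):
    """Scale each species so its single highest FGF value at any time equals 1."""
    normalized = {}
    for species, table in expr.items():
        # Peak over all FGFs and all time points within this species only.
        peak = max(table.values())
        if peak == 0:
            # Nothing expressed: leave all values at zero instead of dividing by zero.
            normalized[species] = {key: 0.0 for key in table}
        else:
            normalized[species] = {key: value / peak for key, value in table.items()}
    return normalized
```

      Because every species is divided by its own peak, absolute levels are never compared between species, which matches the point that there is no good way to make such a comparison.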

      (9) Line 232: Where is the expression of SEF shown? 

      It is shown in Fig. 2i, under the official gene name IL17RD.

      (10) Supplementary Figure 4 seems to be missing. 

      Thank you for pointing this out. We have now added a supplementary Fig.4.

      (11) Line 437: Citation needed. 

      We have included citations now.

      (12) Line 439: A similar feedback loop has been proposed to operate during mesoderm differentiation in mouse ESC (pmid: 37530863 ). The authors may consider citing this work. 

      Thank you for the suggestion, we have now included this work in the discussion. The feedback loop proposed in that work involves FGF8, while we were trying to explain why FGF4 and not FGF8 appears to be conserved across species by invoking an FGF4 feedback loop. Thus, it becomes even harder to explain differences in FGF4 and FGF8 expression between human and mouse gastrulation.

      (13) Supplementary Figure 6 is not described in the main text. 

      We have removed the original Supplementary Figure 6 and the corresponding heterozygous knockout data in the main figure, which we felt added little to the extensive knockdown data we now present. We did create a new Supplementary Figure 6 showing additional knockdown data, which is described in the main text.

      (14) Submission of sequencing data to GEO needs to be updated. 

      We have now made the GEO data public.

    1. eLife Assessment

      This fundamental study substantially advances our current understanding of mechanotransduction within endothelial cells. The evidence provided by the authors in the revised manuscript is compelling, which taken together, provides strong support for the authors' major findings. The work will be of broad interest to cell biologists and vascular biologists.

    2. Reviewer #1 (Public review):

      This manuscript puts forward the concept that there is a specific time window during which YAP/TAZ-driven transcription provides feedback for optimal endothelial cell adhesion, cytoskeletal organization and migration. The study follows up on previous elegant findings from this group and others, which established the importance of YAP/TAZ-mediated transcription for persistent endothelial cell migration. The data presented here extend the concept at two levels: first, the data may explain why there are differences between experimental setups where YAP/TAZ activity is inhibited for prolonged times (e.g. cultures of YAP knockdown cells), versus experiments in which the transient inhibition of YAP/TAZ and (global) transcription affects endothelial cell dynamics prior to their equilibrium.

      All experiments are convincing, clearly visualized and quantified.

      The strength of the paper is that it clearly indicates that there are temporal controlled feedback systems, which is important knowledge for understanding the mechanisms that drive endothelial collective cell behavior.

      A potential limitation of the in vivo experiments is that the inhibitors may include off-target effects as well. To solve this caveat in future research endeavours, which is beyond the scope of the current study, it would be interesting to study this process in knockout models, combined with optogenetics and transgenic zebrafish lines that visualize endothelial cell functional properties such as proliferation and migration.

    3. Reviewer #2 (Public review):

      Summary:

      Here the effect of overall transcription blockade, and then specifically depletion of YAP/TAZ transcription factors was tested on cytoskeletal responses, starting from a previous paper showing YAP/TAZ-mediated effects on the cytoskeleton and cell behaviors. Here, primary endothelial cells were assessed on substrates of different stiffness and parameters such as migration, cell spreading, and focal adhesion number/length were tested upon transcriptional manipulation. Zebrafish subjected to similar manipulations were also assessed during the phase of intersegmental vessel elongation. The conclusion was that there is a feedback loop of 4 hours that is important for the effects of mechanical changes to be translated into transcriptional changes that then permanently affect the cytoskeleton.

      The idea is intriguing and a previous paper contains data supporting the overall model. The fish washout data is quite interesting and supports the kinetics conclusions. New transcriptional profiling in this version supports that cytoskeletal genes are differentially regulated with YAP/TAZ manipulations.

      Major strengths:

      The combination of in vitro and in vivo assessment provides evidence for timing in physiologically relevant contexts, and rigorous quantification of outputs is provided. The idea of defining temporal aspects of the system is quite interesting. New RNA profiling supports the model.

      Weaknesses:

      Actinomycin D blocks most transcription so exposure for hours likely leads to secondary and tertiary effects and perhaps effects on viability.

      Comments on latest version:

      I read the author response to previous reviews, and it seems they agree with the weaknesses stated in the reviews but did not provide any text or data revisions.

    4. Reviewer #4 (Public review):

      Summary:

      Mason DE et al. have extended their previous study on continuous cell migration regulated by a feedback loop that controls gene expression through YAP and TAZ. The time scale of the negative feedback loop is derived from the authors' adhesion-spreading-polarization-migration (ASPM) assay. Involvement of transcription and translation in the negative feedback loop is evidenced by experiments using Actinomycin D. The time scale of mechanotransduction-dependent feedback, demonstrated by cytoskeletal alterations in actinomycin D-treated endothelial colony forming cells (ECFCs), is similar to that observed in ECFCs depleted of YAP/TAZ by siRNA. The authors examine the time scale when ECFCs are attached to MeHA matrices (soft, moderate, and stiff substrates) and show that the time scale is conserved among the conditions they use, although instantaneous migration, cell area, and circularity vary. Finally, they try to confirm the time scale of feedback loop-dependent endothelial migration by examining the effect of washout of Actinomycin D (transcriptional inhibitor), Puromycin (translational inhibitor), and Verteporfin (YAP/TAZ inhibitor) on ISV extension during sprouting angiogenesis. They conclude that endothelial motility required for vascular morphogenesis is regulated by a mechanotransduction-mediated feedback loop that depends on YAP/TAZ-dependent transcriptional regulation.

      Strengths:

      The authors conduct the ASPM assay to find the time scale of feedback when ECFCs attach to three different matrices. They observe a common time scale of feedback. Thus, under the very specific conditions they use, reproducibility is validated by their ASPM assay. The feedback loop revealed by inhibiting gene expression with Actinomycin D is similar to that obtained from YAP/TAZ-depleted cells, suggesting that mechanotransduction might be involved in the feedback loop. The time scale represented by the inflection point might be interesting when considering continuous motility in cultured endothelial cells, although it might not account for the migration of endothelial cells that is controlled by a wide variety of extracellular cues. In vivo, the stiffness of the extracellular matrix is merely one of these cues.

      Weaknesses:

      ASPM assay is based on attachment-dependent phenomenon. The time scale, including the inflection point determined by ASPM experiments using cultured cells and the mechanotransduction-based theory, do not seem to fit in vivo ISV elongation. Although it is challenging to find the conserved theory of continuous cell motility of endothelial cells, the data is preliminary and does not support the authors' claim. There is no evidence that mechanotransduction solely determines the feedback loop during elongation of ISVs.

      Comments on revisions:

      The authors' ASPM assay might suggest the feedback loop in their in vitro culture system. They still need to confirm the loop in vivo using zebrafish intersegmental vessels. The time course of the feedback loop is supported by the ASPM assay. However, the feedback loop is not confirmed in vivo, although it might be suggested by the phenotypes of the ISVs treated with drugs. Thus, they should revise the interpretation in the abstract and the results section, because they have not yet confirmed the feedback loop in vivo.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      All experiments are convincing, clearly visualized and quantified. 

      The strength of the paper is that it clearly indicates that there are temporal controlled feedback systems which is important for endothelial collective cell behavior. 

      A limitation of the study is that the inhibitory studies in vivo may include off-target effects as well. Future endeavors, including specific knockout models, optogenetics and/or transgenic zebrafish lines that visualize endothelial cell properties (proliferation and migration) will be informative to track individual endothelial cell responses upon feedback signals.

      We agree with the reviewer and are currently conducting experiments with optogenetic tools, knockout models, and transgenic zebrafish lines to dissect the feedback loop dynamics at the cellular scale.    

      Reviewer #2 (Public review):

      Major strengths: The combination of in vitro and in vivo assessment provides evidence for timing in physiologically relevant contexts, and rigorous quantification of outputs is provided. The idea of defining temporal aspects of the system is quite interesting. New RNA profiling supports the model. 

      Weaknesses: Actinomycin D blocks most transcription so exposure for hours likely leads to secondary and tertiary effects and perhaps effects on viability.

      We agree with the reviewer that “off-target” effects are a limitation of the pharmacologic approach. We have also previously shown that long-term treatment with actinomycin D reduces ECFC survival (Mason et al., 2019). 

      Reviewer #3 (Public review):

      Strengths: The authors conduct the ASPM assay to find the time scale of feedback when ECFCs attach to three different matrices. They observe a common time scale of feedback. Thus, under the very specific conditions they use, reproducibility is validated by their ASPM assay. The feedback loop revealed by inhibiting gene expression with Actinomycin D is similar to that obtained from YAP/TAZ-depleted cells, suggesting that mechanotransduction might be involved in the feedback loop. The time scale represented by the inflection point might be interesting when considering continuous motility in cultured endothelial cells, although it might not account for the migration of endothelial cells that is controlled by a wide variety of extracellular cues. In vivo, the stiffness of the extracellular matrix is merely one of these cues. 

      Weaknesses: ASPM assay is based on attachment-dependent phenomenon. The time scale including the inflection point determined by ASPM experiments using cultured cells and the mechanotransduction-based theory do not seem to fit in vivo ISV elongation. Although it is challenging to find the conserved theory of continuous cell motility of endothelial cells, the data is preliminary and does not support the authors' claim. There is no evidence that mechanotransduction solely determines the feedback loop during elongation of ISVs. The points to be addressed are listed in recommendations for the authors.

      The ASPM assay enabled us to define temporal dynamics of YAP/TAZ mechanotransduction. We then used those insights to design ISV washout experiments that tested if the characteristic time scales were conserved in vivo. However, we agree with the limitations identified by the reviewer. Cells behave and respond to mechanical cues differently in 2D vs 3D environments, and the microenvironment in vivo is much more complex. Future work with optogenetic tools will be useful to dissect the temporal kinetics in vivo during ISV elongation.

    1. eLife Assessment

      These valuable studies explore the consequences of exposure to the toxin hydrogen sulfide (H2S) on the behavior and physiology of C. elegans. The work finds that behavioral changes evoked by H2S exposure are modulated by several regulatory pathways known to influence chemosensory-evoked locomotor behavior, but there is incomplete data to support the authors' claim of comprehensive mechanistic insight into the consequences of H2S exposure. Nevertheless, the findings may be informative for those studying organismal stress responses and the effects of mitochondrial ROS on behavior and physiology.

    2. Reviewer #3 (Public review):

      Summary:

      The manuscript explores behavioral responses of C. elegans to hydrogen sulfide, which is known to exert remarkable effects on animal physiology in a range of contexts. The possibility of genetic and precise neuronal dissection of responses to H2S motivates the study of responses in C. elegans.

      The authors have followed up observations in the initial version of the manuscript, and their data do not support the direct sensing of H2S by the ASJ neurons or other sensory neurons. Genetic and parallel analysis of O2 and CO2 responsive pathways do not reveal further insights regarding potential mechanisms underlying H2S sensing. Gene expression analysis extends prior work. Finally, the authors have examined how H2S-evoked locomotory behavioral responses are affected in mutants with altered stress and detoxification responses to H2S, most notably hif-1 and egl-9. These data, while examining locomotion, more strongly suggest that the observed effects on animal locomotion are secondary to altered organismal toxicity than that they reflect specific behavioral responses.

      Overall, the manuscript provides a wide range of preliminary observations of genetic interactions that may influence locomotory responses to H2S, but mechanistic insight or a synthesis of disparate data is lacking.

    3. Reviewer #4 (Public review):

      Summary:

      The authors establish a behavioral paradigm for avoidance of H2S and conduct a large candidate screen to identify genetic requirements. They follow up by genetically dissecting a large number of implicated pathways - insulin, TGF-beta, oxygen/HIF-1, and mitochondrial ROS, which have varied effects on H2S avoidance. They additionally assay whole-animal gene expression changes induced by varying concentrations and durations of H2S exposure.

      Strengths:

      The implicated pathways are tested extensively through mutants of multiple pathway molecules. The authors address previous reviewer concerns by directly testing the ability of ASJ to respond to H2S via calcium imaging. This allows the authors to revise their previous conclusion and determine that ASJ does not directly respond to H2S and likely does not initiate the behavioral response. Extensive experiments manipulating the mitochondrial ETC and ROS support the authors' revised model that mitochondrial toxicity is the major driver of H2S avoidance.

      It seems possible that HIF-1 and SKN-1 signaling directly modulate ROS toxicity while ASJ neurons and the oxygen sensing circuit could modulate the avoidance behavior. How this neuronal interaction happens remains unknown.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Public review): 

      Summary: 

      The manuscript explores behavioral responses of C. elegans to hydrogen sulfide, which is known to exert remarkable effects on animal physiology in a range of contexts. The possibility of genetic and precise neuronal dissection of responses to H2S motivates the study of responses in C. elegans. The revised manuscript does not seem to have significantly addressed what was lacking in the initial version. 

      The authors have added further characterization of possible ASJ sensing of H2S by calcium imaging, but ASJ does not appear to be directly involved. Genetic and parallel analysis of O2 and CO2 responsive pathways do not reveal further insights regarding potential mechanisms underlying H2S sensing. Gene expression analysis extends prior work. Finally, the authors have examined how H2S-evoked locomotory behavioral responses are affected in mutants with altered stress and detoxification responses to H2S, most notably hif-1 and egl-9. These data, while examining locomotion, more strongly suggest that the observed effects on animal locomotion are secondary to altered organismal toxicity than that they reflect specific behavioral responses.

      Overall, the manuscript provides a wide range of intriguing observations, but mechanistic insight or a synthesis of disparate data is lacking. 

      We thank the reviewer for the valuable feedback. We agree that while our investigation provides broad coverage, it does not fully resolve the mechanisms of H<sub>2</sub>S perception. As both reviewers noted, the avoidance response to high levels of H<sub>2</sub>S is most likely driven by its toxicity, particularly at the level of mitochondria, rather than by direct perception of H<sub>2</sub>S. We also favor this model and have revised the results and discussion to highlight this interpretation, while acknowledging that other mechanisms cannot be excluded (main changes lines 387-402 and 535-547).

      Building on this view, our observations point toward mitochondrial ROS transients as the trigger for H<sub>2</sub>S avoidance. First, toxic levels of H<sub>2</sub>S are known to promote ROS production (1). Second, similar to acute H<sub>2</sub>S, brief exposure to rotenone, an ETC complex I inhibitor that rapidly generates mitochondrial ROS, triggers locomotory responses (Figure 7E) (Lines 393-396). Third, regardless of duration, rotenone exposure inhibits H<sub>2</sub>S-evoked avoidance (Figure 7E) (Lines 389-391), likely by preventing or dampening H<sub>2</sub>S-evoked mitochondrial ROS bursts when ETC function is impaired and ROS is already high. Notably, animals subjected to prolonged rotenone exposure, ETC mutants, and quintuple sod mutants, each experiencing chronically high ROS levels, fail to respond to H<sub>2</sub>S and display reduced locomotory activity, presumably due to ROS toxicity and/or activation of stress-adaptive mechanisms (Figure 7).

      Consistent with the activation of stress-responsive pathways, H<sub>2</sub>S exposure alters expression of genes controlled by SKN-1 and HIF-1 signaling. Both pathways are ROS-sensitive and promote adaptation to chronic ROS production (2-4). Activation of these pathways, as in egl-9 mutants, renders animals insensitive to H<sub>2</sub>S-evoked ROS transients (Figure 5B) (Lines 303-305). Conversely, mutants defective in these adaptive pathways, such as hif-1, still show initial locomotory responses to H<sub>2</sub>S, but rapidly lose activity during prolonged H<sub>2</sub>S exposure (Figure 5D) (Lines 318-319). These observations suggest that the HIF-1 pathway is dispensable for initiating the response to H<sub>2</sub>S-evoked ROS transients, but essential for protecting against ROS toxicity.

      In this context, the neurons we examined, such as ASJ, are not directly involved in H<sub>2</sub>S perception (Lines 165-169 and 448-457). Instead, they likely modulate a circuit that is responsive to ROS toxicity. This circuit is also influenced by ambient O<sub>2</sub> levels, the state of the O<sub>2</sub>-sensing circuit, and nutrient status, in a manner reminiscent of CO<sub>2</sub> responses (5, 6).

      Reviewer #4 (Public review): 

      Summary: 

      The authors establish a behavioral paradigm for avoidance of H2S and conduct a large candidate screen to identify genetic requirements. They follow up by genetically dissecting a large number of implicated pathways - insulin, TGF-beta, oxygen/HIF-1, and mitochondrial ROS, which have varied effects on H2S avoidance. They additionally assay whole-animal gene expression changes induced by varying concentrations and durations of H2S exposure. 

      Strengths: 

      The implicated pathways are tested extensively through mutants of multiple pathway molecules. The authors address previous reviewer concerns by directly testing the ability of ASJ to respond to H2S via calcium imaging. This allows the authors to revise their previous conclusion and determine that ASJ does not directly respond to H2S and likely does not initiate the behavioral response. 

      We thank the reviewer for the supportive comments.

      Weaknesses: 

      Despite the authors' focus on acute perception of H2S, I don't think the experiments tell us much about perception. I think they indicate pathways that modulate the behavior when disrupted, especially because most manipulations used broadly affect physiology on long timescales. For instance, genetic manipulation of ASJ signaling, oxygen sensing, HIF-1 signaling, mitochondrial function, as well as starvation are all expected to constitutively alter animal physiology, which could indirectly modulate responses to H2S. The authors rule out effects on general locomotion in some cases, but other physiological changes could relatively specifically modulate the H2S response without being involved in its perception. 

      I am actually not convinced that H2S is directly perceived by the C. elegans nervous system at all. As far as I can tell, the avoidance behavior could be a response to H2S-induced tissue damage rather than the gas itself. 

      We thank the reviewer for the valuable insights and fully agree that H<sub>2</sub>S may not be directly perceived by C. elegans. Please see detailed responses below.

      Reviewer #4 (Recommendations for the authors): 

      The clarity of the paper is improved in this version. My main issue has to do with "perception" of H2S. At times the authors suggest that hydrogen sulfide should be perceived by a neural circuit ("we did not specifically identify the neural circuit mediating H2S signaling"), while at other times they discuss the possibility that it is not directly perceived neuronally ("Supporting the idea that acute mitochondrial ROS generation initiates avoidance of high H2S levels,"). The authors should clearly state their model for H2S perception. Do they think there is a receptor and sensory neuron for H2S (not identified in this paper)? If not, what does it mean for there to be a neural circuit mediating the response? To me, it looks more like what is being "perceived" by a neural circuit is ROS-induced toxicity, not H2S itself. 

      To drill down on direct modulation of acute perception, are any of the pathway manipulations used in this paper performed on the timescale of perception? Rotenone for 10 mins is close to that timescale, and in fact it increases speed independently of H2S, consistent with ROS-induced toxicity, not H2S, being the signal that induces the behavior. Optogenetic activation of RMG could also be on the acute timescale. Can the authors clarify for how long blue light was on the worms before the start of the assay? Or was it turned on at the same time as video acquisition commenced? This could be evidence that RMG acutely modulates this behavioral response. 

      I feel that the ASJ calcium imaging data should be in the main figure given its importance in revising the original model. 

      We thank the reviewer for the valuable advice.

      As suggested, ASJ calcium imaging data are displayed in the main figure (Figure 2I) (Line 167).

      As both reviewers noted, our initial presentation was not sufficiently clear regarding the mechanism underlying H<sub>2</sub>S avoidance. We agree with the reviewer that H<sub>2</sub>S avoidance is unlikely to be mediated by direct perception via an H<sub>2</sub>S-specific receptor, and more likely arises from acute mitochondrial dysfunction and ROS generation. 

      ROS

      In line with the reviewer’s perspective, our observations point toward mitochondrial ROS transients as the trigger for H<sub>2</sub>S avoidance. First, toxic levels of H<sub>2</sub>S are known to promote ROS production (1). Second, similar to acute H<sub>2</sub>S, brief exposure to rotenone, an ETC complex I inhibitor that rapidly generates mitochondrial ROS, triggers locomotory responses (Figure 7E) (Lines 393-396). Third, regardless of duration, rotenone exposure inhibits H<sub>2</sub>S-evoked avoidance (Figure 7E) (Lines 389-391), likely by preventing or dampening H<sub>2</sub>S-evoked mitochondrial ROS bursts when ETC function is impaired and ROS is already high. Notably, animals subjected to prolonged rotenone exposure, ETC mutants, and quintuple sod mutants, each experiencing chronically high ROS levels, fail to respond to H<sub>2</sub>S and display reduced locomotory activity, presumably due to ROS toxicity and/or activation of stress-adaptive mechanisms (Figure 7). We revised the Results and Discussion to present the model more consistently (main changes lines 387-402 and 535-547).

      Consistent with the activation of stress-responsive pathways, H<sub>2</sub>S exposure alters expression of genes controlled by SKN-1 and HIF-1 signaling. Both pathways are ROS-sensitive and promote adaptation to chronic ROS production (2-4). Activation of these pathways, as in egl-9 mutants, renders animals insensitive to H<sub>2</sub>S-evoked ROS transients (Figure 5B) (Lines 303-305). Conversely, mutants defective in these adaptive pathways, such as hif-1, still show initial locomotory responses to H<sub>2</sub>S, but rapidly lose activity during prolonged H<sub>2</sub>S exposure (Figure 5D) (Lines 318-319). These observations suggest that the HIF-1 pathway is dispensable for initiating the response to H<sub>2</sub>S-evoked ROS transients, but essential for protecting against ROS toxicity.

      ASJ neurons

      ASJ neurons and DAF-11 signaling are required for H<sub>2</sub>S-evoked behavioral responses. However, ASJ does not exhibit an H<sub>2</sub>S-evoked calcium transient. This suggests that ASJ neurons do not directly detect H<sub>2</sub>S (Lines 165-169 and 448-457), but likely modulate the circuit responsive to ROS toxicity. This circuit can also be modulated by ambient O<sub>2</sub> levels, the state of the O<sub>2</sub>-sensing circuit, and nutrient status, in a manner reminiscent of CO<sub>2</sub> responses (5, 6). 

      O<sub>2</sub> sensing circuit

      Consistent with the reviewer’s view, we favor the model that H<sub>2</sub>S avoidance is induced by ROS transients. We believe that the state of the O<sub>2</sub>-sensing circuit, similar to ASJ neurons, modulates the neural circuit that is responsive to H<sub>2</sub>S-evoked ROS toxicity. This circuit is inhibited as long as the O<sub>2</sub>-sensing circuit is active. In the RMG optogenetic experiment, channelrhodopsin was photostimulated as soon as the assay was initiated at 7% O<sub>2</sub> (Methods Lines 633-634 and Figure legend Lines 1177-1178); therefore, RMG remained active throughout the assay, including at 7% O<sub>2</sub>. Our interpretation is that RMG activation inhibits this ROS-responsive circuit and thereby H<sub>2</sub>S avoidance. However, these observations do not resolve whether H<sub>2</sub>S is acutely and directly perceived. The modulation of the H<sub>2</sub>S response by the O<sub>2</sub> circuit is discussed in Lines 437-447.

      References

      (1) J. Jia et al., SQR mediates therapeutic effects of H(2)S by targeting mitochondrial electron transport to induce mitochondrial uncoupling. Sci Adv 6, eaaz5752 (2020).

      (2) S. J. Lee, A. B. Hwang, C. Kenyon, Inhibition of Respiration Extends C. elegans Life Span via Reactive Oxygen Species that Increase HIF-1 Activity. Current Biology 20, 2131-2136 (2010).

      (3) C. Lennicke, H. M. Cocheme, Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell 81, 3691-3707 (2021).

      (4) D. A. Patten, M. Germain, M. A. Kelly, R. S. Slack, Reactive oxygen species: stuck in the middle of neurodegeneration. J Alzheimers Dis 20 Suppl 2, S357-367 (2010).

      (5) A. J. Bretscher, K. E. Busch, M. de Bono, A carbon dioxide avoidance behavior is integrated with responses to ambient oxygen and food in Caenorhabditis elegans. Proc Natl Acad Sci U S A 105, 8044-8049 (2008).

      (6) E. A. Hallem, P. W. Sternberg, Acute carbon dioxide avoidance in Caenorhabditis elegans. Proc Natl Acad Sci U S A 105, 8038-8043 (2008).

    1. eLife Assessment

      This valuable study uses EEG and computational modeling to investigate hemispheric oscillatory asymmetries in unilateral spatial neglect. The work benefits from rare patient data and a careful multimethod approach. However, the evidence is incomplete because key assumptions about alpha‑band entrainment and methodological confounds such as lesion variability and eye‑movement artifacts remain insufficiently addressed.

    2. Reviewer #1 (Public review):

      Summary:

      Okazaki et al. showed flickering stimuli to patients with unilateral spatial neglect (USN) and measured EEG responses. They compared this with another patient group (post-stroke, but no USN) and healthy controls. The authors' rationale was to entrain intrinsic brain rhythms using the flicker of different frequencies (3-30 Hz). Effects found unique to the 9-Hz stimulation condition differentiate USN patients from the other groups, leading them to conclude that USN can be characterized by increased hemispheric alpha asymmetry, driven by a relatively increased response in the intact hemisphere.

      Strengths:

      This study is principled empirical work that benefits from access to special patient groups of considerable size (about 60 stroke patients in total, and 20 USN). The authors use state-of-the-art established methods to (1) deliver and (2) quantify the responses to the flicker stimulation in the EEG recordings. In addition, they use phase-coupling measures to investigate cross-frequency coupling (here: alpha-gamma) and a measure of directed connectivity between brain areas, transfer entropy. The results are supported by means of simulations using a coupled-oscillators model.

      Weaknesses:

      In my eyes, the major conceptual weakness of the study is that the authors make the a priori assumption that the flicker stimulation entrains intrinsic brain rhythms, especially alpha (9 Hz). To date, there is no direct (and only equivocal indirect) evidence that alpha rhythms can be entrained with periodic visual stimulation. In the present study, the assumption of alpha entrainment permeates some analytical decisions - where it would be possible to separate stimulus-driven from intrinsic rhythms more strongly than is currently the case, potentially yielding deeper insights into the oscillopathy of USN - and, ultimately, the interpretation of the results. Another potential issue to consider here is the analysis of gamma rhythms in EEG data, absent a control of miniature eye movements, a known problem (Yuval-Greenberg et al., 2008, https://doi.org/10.1016/j.neuron.2008.03.027) that may be exacerbated here, given that USN patients could show different auxiliary gaze behaviour.

    3. Reviewer #2 (Public review):

      This study investigates how altered neural oscillations may contribute to unilateral spatial neglect (USN) following right-hemisphere stroke. By combining steady-state visual evoked potentials (SSVEPs), phase-amplitude coupling (PAC), transfer entropy (TE), and computational modeling, the authors aim to show that USN arises from disrupted hemispheric synchronization dynamics rather than simply from lesion extent. The integration of empirical EEG data with a mechanistic model is a major strength and offers a valuable new perspective on how frequency-specific neural dynamics relate to clinical symptoms.

      The work has several notable strengths. The combination of experimental and modeling approaches is innovative and powerful, and the findings provide a coherent mechanistic framework linking abnormal neural entrainment to attentional deficits. The study also provides concrete evidence to support the potential for frequency-specific neuromodulatory interventions, which could have translational relevance.

      At the same time, there are areas where the evidence could be clarified or contextualized further. The manuscript would benefit from more detailed characterization of lesions, since differences in lesion topography (white vs. gray matter, occipital vs. parietal areas) could greatly improve our understanding of the physiopathology causing unilateral spatial neglect and the altered neural oscillations reported. Methodological choices, such as focusing analyses on occipital electrodes rather than parietal sites, and the potential influence of volume conduction in transfer entropy analyses, also need clearer justification/elaboration. In addition, while the authors report several neural metrics, it is not always clear why SSVEP power was chosen as the primary correlate of clinical severity over other measures. More broadly, the manuscript would be strengthened by clearer definitions of dependent variables and reporting of software and toolboxes used.

      Overall, the study makes a significant contribution by demonstrating that USN can be conceptualized as a disorder of disrupted oscillatory dynamics. With some clarifications and expansions, the paper will provide readers with a clearer understanding of both the strengths and the limitations of the evidence, and it will stand as a valuable reference for future work on oscillatory mechanisms in stroke and attention.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      Okazaki et al. showed flickering stimuli to patients with unilateral spatial neglect (USN) and measured EEG responses. They compared this with another patient group (post-stroke, but no USN) and healthy controls. The authors' rationale was to entrain intrinsic brain rhythms using the flicker of different frequencies (3-30 Hz). Effects found unique to the 9-Hz stimulation condition differentiate USN patients from the other groups, leading them to conclude that USN can be characterized by increased hemispheric alpha asymmetry, driven by a relatively increased response in the intact hemisphere.

      Strengths:

      This study is principled empirical work that benefits from access to special patient groups of considerable size (about 60 stroke patients in total, and 20 USN). The authors use state-of-the-art established methods to (1) deliver and (2) quantify the responses to the flicker stimulation in the EEG recordings. In addition, they use phase-coupling measures to investigate cross-frequency coupling (here: alpha-gamma) and a measure of directed connectivity between brain areas, transfer entropy. The results are supported by means of simulations using a coupled-oscillators model.

      Weaknesses:

      In my eyes, the major conceptual weakness of the study is that the authors make the a priori assumption that the flicker stimulation entrains intrinsic brain rhythms, especially alpha (9 Hz). To date, there is no direct (and only equivocal indirect) evidence that alpha rhythms can be entrained with periodic visual stimulation. In the present study, the assumption of alpha entrainment permeates some analytical decisions - where it would be possible to separate stimulus-driven from intrinsic rhythms more strongly than is currently the case, potentially yielding deeper insights into the oscillopathy of USN - and, ultimately, the interpretation of the results. Another potential issue to consider here is the analysis of gamma rhythms in EEG data, absent a control of miniature eye movements, a known problem (Yuval-Greenberg et al., 2008, https://doi.org/10.1016/j.neuron.2008.03.027) that may be exacerbated here, given that USN patients could show different auxiliary gaze behaviour.

      Reviewer #1 expressed concern that alpha entrainment is assumed a priori; however, our interpretation is based on the empirical observation of frequency-specific (9 Hz) hemispheric asymmetry, not on a prior assumption. This 9 Hz specificity is difficult to explain by a simple summation of stimulus-evoked responses and is more appropriately interpreted as a resonance phenomenon in the alpha band, which is close to the intrinsic resonance frequency of the visual system [1, 2]. In the revision, we will strengthen the conceptual distinction between stimulus-driven and intrinsic components and clarify that entrainment is a conclusion supported by our data and modeling.

      Gamma contamination by eye movements is a valid theoretical concern. However, it is unlikely that saccadic spike potentials explain our α-γ coupling findings, due to several factors including timing constraints and spectral properties. In the revision, we will add explicit discussion of this limitation while explaining why our coupling patterns are more consistent with physiological neural coupling than with artifacts.

      Reviewer #2 (Public review):

      This study investigates how altered neural oscillations may contribute to unilateral spatial neglect (USN) following right-hemisphere stroke. By combining steady-state visual evoked potentials (SSVEPs), phase-amplitude coupling (PAC), transfer entropy (TE), and computational modeling, the authors aim to show that USN arises from disrupted hemispheric synchronization dynamics rather than simply from lesion extent. The integration of empirical EEG data with a mechanistic model is a major strength and offers a valuable new perspective on how frequency-specific neural dynamics relate to clinical symptoms.

      The work has several notable strengths. The combination of experimental and modeling approaches is innovative and powerful, and the findings provide a coherent mechanistic framework linking abnormal neural entrainment to attentional deficits. The study also provides concrete evidence to support the potential for frequency-specific neuromodulatory interventions, which could have translational relevance.

      At the same time, there are areas where the evidence could be clarified or contextualized further. The manuscript would benefit from more detailed characterization of lesions, since differences in lesion topography (white vs. gray matter, occipital vs. parietal areas) could greatly improve our understanding of the physiopathology causing unilateral spatial neglect and the altered neural oscillations reported. Methodological choices, such as focusing analyses on occipital electrodes rather than parietal sites, and the potential influence of volume conduction in transfer entropy analyses, also need clearer justification/elaboration. In addition, while the authors report several neural metrics, it is not always clear why SSVEP power was chosen as the primary correlate of clinical severity over other measures. More broadly, the manuscript would be strengthened by clearer definitions of dependent variables and reporting of software and toolboxes used.

      Overall, the study makes a significant contribution by demonstrating that USN can be conceptualized as a disorder of disrupted oscillatory dynamics. With some clarifications and expansions, the paper will provide readers with a clearer understanding of both the strengths and the limitations of the evidence, and it will stand as a valuable reference for future work on oscillatory mechanisms in stroke and attention.

      We agree that further lesion characterization would be generally useful. However, as shown in Supplementary Figure 1, lesions in our USN cohort involved both cortical and subcortical regions, and cortical damage often extended into adjacent white matter. Therefore, a strict gray-versus-white-matter classification was not feasible. This anatomical diversity suggests that the frequency-specific hemispheric asymmetry observed here cannot be fully explained by lesion location or size alone, but rather may reflect altered network dynamics following right-hemisphere damage. We will clarify this point in the revised Discussion.

      Regarding transfer entropy (TE) and volume conduction, TE is theoretically insensitive to zero-lag correlations and quantifies temporally directed information transfer. Furthermore, we used amplitude envelopes rather than raw oscillations as input, which should greatly reduce the risk of spurious causal estimation due to sinusoidal autocorrelation structure. Moreover, if such spurious connectivity due to autocorrelation had occurred, it would have been expected to appear equally in both feedforward and feedback directions. Therefore, the feedforward-limited (visual→frontal) asymmetry observed in our study cannot be explained by volume conduction or autocorrelation effects. We will state this position clearly in the revision.

      Regarding other methodological points: we focused on occipital electrodes (O1/O2) because visual stimuli primarily drive the visual system (we also analyzed parietal sites but found no significant hemispheric differences; Figure 4). We chose SSVEP power for clinical correlation because it was the primary phenomenon distinguishing USN from non-USN patients. In the revision, we will clarify these points and include software and toolbox information.

      We believe these revisions will substantially strengthen the manuscript and clarify the conceptual and methodological contributions of our study.

      References

      (1) Rosanova, M., Casali, A., Bellina, V., Resta, F., Mariotti, M., and Massimini, M. (2009). Natural frequencies of human corticothalamic circuits. J Neurosci 29, 7679-7685.

      (2) Okazaki, Y.O., Nakagawa, Y., Mizuno, Y., Hanakawa, T., and Kitajo, K. (2021). Frequency- and Area-Specific Phase Entrainment of Intrinsic Cortical Oscillations by Repetitive Transcranial Magnetic Stimulation. Front Hum Neurosci 15, 608947.

    1. eLife Assessment

      This study presents an important toolkit for visualising the endogenous expression of four classes of neurotransmitter vesicular transporters. Using their toolkit, the authors find that there is co-transmission of neurotransmitters in over 10% of neurons tested. Although the evidence presented in the manuscript is solid, one weakness of this study is the failure of the authors to compare and contrast their results with available single-cell sequencing datasets and with well-established synaptic reporter lines (i.e., co-localization experiments). This toolkit will be of great use to multiple labs, and the authors should indicate their plan to disseminate the reagents and the associated information that is part of this kit.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a novel toolkit for visualizing and manipulating neurotransmitter-specific vesicles in C. elegans neurons, addressing the challenge of tracking neurotransmitter dynamics at the level of individual synapses. The authors engineered endogenously tagged vesicular transporters for glutamate, GABA, acetylcholine, and monoamines, enabling cell-specific labeling while maintaining physiological function. Additionally, they developed conditional knockout strains to disrupt neurotransmitter synthesis in single neurons. The study reveals that over 10% of neurons in C. elegans exhibit co-transmission, with a detailed case study on the ADF sensory neuron, where serotonin and acetylcholine are trafficked in distinct vesicle pools. The approach provides a powerful platform for studying neurotransmitter identity, synaptic architecture, and co-transmission.

      Strengths:

      (1) This toolkit offers a generalizable framework that can be applied to other model organisms, advancing the ability to investigate synaptic plasticity and neural circuit logic with molecular precision.

      (2) Through the use of this toolkit, the authors uncover molecular heterogeneity at individual synapses, revealing co-transmission in over 10% of neurons, and offer new insights into neurotransmitter trafficking and synaptic plasticity, advancing our understanding of synaptic organization.

      Weaknesses:

      (1) While the article introduces valuable tools for visualizing neurotransmitter vesicles in vivo, the core techniques are based on previously established methods. The study does not present significant technological breakthroughs, limiting the novelty of the methodological advancements.

      (2) While the discovery of co-transmission in over 10% of neurons is an intriguing finding, the article does not fully explore its potential implications or the underlying mechanisms governing this process. A deeper investigation into the functional uniqueness and interactions of neurotransmitters released from individual co-transmitting neurons - perhaps through case study examples - would strengthen the study's impact.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors developed fluorescent reporters to visualize the subcellular localization of vesicular transporters for glutamate, GABA, acetylcholine, and monoamines in vivo. They also developed cell-specific knockout methods for these vesicular transporters. To my knowledge, this is the first comprehensive toolkit to label and ablate vesicular transporters in C. elegans. They carefully and strategically designed the reporters and clearly explained the rationale behind their construct designs. Meanwhile, they used previously established functional assays to confirm that the reporters are functional. They also tested and confirmed the effect of cell-specific and pan-neuronal knockout of several of these transporters.

      Strengths:

      The tools developed are versatile: they generated both green and red fluorescent reporters for easy combination with other reporters; they established the method for cell-type-specific KO to analyze the function of the neurotransmitter in different cell types. The reagents allow visualization of specific synapses among other processes and cell bodies. In addition, they also developed a binary expression method to detect co-transmission "We reasoned that if two neurotransmitters were co-expressed in the same neuron, driving Flippase under the promoter of one transmitter would activate the conditional reporter - resulting in fluorescence - only in cells also expressing a second neurotransmitter identity". Overall, this is a versatile and valuable toolkit with well-designed and carefully validated reagents. This toolkit will likely be widely used by the C. elegans community.

      Weaknesses:

      The authors evaluated the positions of fluorescent puncta by visually comparing their positions with the positions of synapses indicated by EM reconstruction. It would provide stronger supportive evidence if the authors also examined co-localization of these reporters with well-established synaptic reporters previously published by their lab, such as reporters that label presynaptic sites of AIY interneurons.

      This toolkit will likely be widely used by the C. elegans community. To facilitate the adoption of the approach and method by worm labs, the authors should include their plan for the dissemination of all of the reagents included in the kit, along with all of the associated information, including construct sequences and the protocols for their use.

    4. Reviewer #3 (Public review):

      Summary:

      Cuentas-Condori et al. generate cell-specific tools for visualizing the endogenous expression of, as well as knocking out, four different classes of neurotransmitter vesicular transporters (glutamatergic, cholinergic, GABAergic, and monoaminergic) in C. elegans. They then use these tools in an intersectional strategy to provide evidence for the co-expression of these transporters in individual neurons, suggesting co-transmission of the associated neurotransmitters.

      Strengths:

      A major strength of the work is the generation of several endogenous tools that will be of use to the community. Additionally, this adds to accumulating evidence of co-transmission of different classes of neurotransmitters in the nervous system.

      Weaknesses:

      A weakness of the study is a lack of comparison to previously published single-cell sequencing data. These tools are alternatively described in the manuscript as superior to the sequencing data and as validation of the sequencing data, but neither claim can be assessed without knowing how they compare and contrast to that data. It is thus not clear to what extent the conclusions of this paper are an advance over what could be determined from the sequencing data on its own. Finally, some technical considerations should be discussed as potential caveats to the robustness of their intersectional strategy for concluding that certain genes are indeed co-expressed. Overall, claims about co-transmission should be tempered by the caveats presented in the discussion, suggesting that co-expression of these transporters is not in and of itself sufficient for neurotransmitter release.

    1. eLife Assessment

      This study introduces Megabouts, a transformer-based classifier for larval zebrafish movement bouts. This useful tool is thoughtfully implemented and has clear potential to unify analyses across labs. However, the evidence supporting its robustness is incomplete. How the method generalizes across datasets, how sensitive it is to noise, and the specific sources of misclassification are unclear. The method would also be strengthened by providing options for users to fine-tune the clusters under different experimental conditions, which would further enhance reliability and flexibility.

    2. Reviewer #1 (Public review):

      Jouary et al. present Megabouts, a Transformer-based classifier and Python toolbox for automated categorization of zebrafish movement bouts into 13 bout types. This is potentially a very useful tool for the zebrafish community. It is broadly applicable to a wide variety of behavioral paradigms and could help to unify behavioral quantification across labs. The overall implementation is technically sound and thoughtfully engineered. The choice of a standard Transformer architecture is well justified (e.g., it can handle long-term tracking data and process missing data, integrates posture and trajectory information over time, and shows robustness to variable frame rates and partial occlusion). The data augmentation strategies (e.g., downsampling, tail masking, and temporal jitter) are well designed to enhance cross-condition generalization. Thus, I very much support this work.

      For the benefit of the end users of this tool, several clarifications and additional analyses would be helpful:

      (1) What is the source and nature of the classification errors? The reported accuracy is <80% with trajectory data and still <90% with trajectory + tail data.

      (1a) Is this due to model failure (is overfitting a concern? How unbiased were the test sets?), imperfections of the preprocessing step (how sensitive is this to noise in the input data?), or underlying ambiguity in the biological data (e.g., do some "errors" reflect intermediate patterns that don't map neatly onto the 13 discrete classes)?

      (1b) A systematic error analysis would be helpful. Which classes are most often confused? Are errors systematic (e.g., slow swims vs. routine turns) or random?

      (1c) Can a classification confidence be provided for each bout in the data? How would the authors recommend that the end user deal with misclassifications (e.g., by manual correction)?

      Overall, the end user would benefit greatly from more information on potential failure modes and their root causes.

      (2) How well does the trained network generalize across labs and setups? To what extent have the authors tested this on datasets from other labs to determine how well the pretrained model transfers across datasets? When I tested the code provided by the authors on a short stretch of independently obtained x-y zebrafish trajectory data, the pipeline generated phantom movement annotations. The underlying cause is unclear.

      (2a) One possibility is that preprocessing steps may be highly sensitive to slight noise in the x-y positional data, which leads to noise in the speed data. The neural net, in turn, classifies noise into movement annotations. It would be helpful if the authors could add Gaussian noise to the x-y trajectory data and then determine the extent to which the computational pipeline is robust to noise.

      (2b) When testing the pipeline, some stationary periods are classified as movements. Which step of the pipeline gave rise to the issue is unclear. Thus, explicit cross-lab validation and robustness tests (e.g., adding Gaussian noise to trajectories) would strengthen the claims of this paper.

      (2c) Lastly, given the potential issue of generalization across labs, it would be helpful to provide/outline the steps for users in different labs to retrain and fine-tune the model.

    3. Reviewer #2 (Public review):

      Summary:

      Overall, the manuscript is well organized and clearly written. However, in this reviewer's opinion, the manuscript suffers from multiple major weaknesses.

      Strengths:

      The strengths of the paper are unclear; they have not been articulated well by the authors.

      Weaknesses:

      The pipeline is designed to analyze larval zebrafish behaviors, which by definition is considered a highly specialized, if not niche, application. Hence, the scope of this manuscript is extremely narrow, and consequently, the overall significance and the broader impact on the field of behavioral neuroscience are rather low. Broadening the scope would significantly improve the manuscript's impact. Second, it was noted that the authors neglect to present an unbiased discussion of how their pipeline compares to well-established and time-proven pipelines used to track larval zebrafish behaviors. This reviewer also failed to detect any new biological insights presented or improvements compared to existing methods, further questioning the overall significance and impact of this manuscript. Finally, the core claim of the manuscript lacks meaningful experimental data that would allow an unbiased and more definitive evaluation of the claims made regarding the Megabouts pipeline. The critical experiment to achieve this would be to run an identical set of behavioral assays (e.g., PPI, social behaviors) on different platforms (e.g., a commercial and a non-commercial one) and then determine if Megabouts correctly analyzes and integrates the results. While this might sound to the authors like an 'outside the scope' experiment, this reviewer would argue that it is the only meaningful experiment to validate the central claim put forward in this manuscript.

    4. Reviewer #3 (Public review):

      In this manuscript, the authors introduce Megabouts, a software package designed to standardize the analysis of larval zebrafish locomotion, through clustering the 2D posture time series into canonical behavioral categories. Beyond a first, straightforward segmentation that separates glides from powered movements, Megabouts uses a Transformer neural network to classify the powered movements (bouts). This Transformer network is trained with supervised examples. The authors apply their approach to improve the quantification of sensorimotor transformations and enhance the sensitivity of drug-induced phenotype screening. Megabouts also includes a separate pipeline that employs convolutional sparse coding to analyze the less predictable tail movements in head-restrained fish.

      I presume that the software works as the authors intend, and I appreciate the focus on quantitative behavior. My primary concerns reflect an implicit oversimplification of animal behavior. Megabouts is ultimately a clustering technique, categorizing powered locomotion into distinct, labelled states which, while effective for analysis, may obscure the continuous and fluid nature of animal behavior. Certainly, Megabouts could potentially miss or misclassify complex, non-stereotypical movements that do not fit the defined categories. In fact, it appears that exactly this situation led the authors to design a new clustering for head-restrained fish. Can we anticipate even more designs for other behavioral conditions?

      Ultimately, I am not yet convinced that Megabouts provides a justifiable picture of behavioral control. And if there was a continuous "control knob", which seems very likely, wouldn't that confuse the clustering process, as many distinct clusters would correspond to, say, different amplitudes of the same control knob?

      There has been tremendous recent progress in the measurement and analysis of animal behavior, including both continuous and discrete perspectives. However, the supervised clustering approach described here feels like a throwback to an earlier era. Yes, it's more automatic and quantifiable, and the amount of data is fantastic. But ultimately, the method is conceptually bound to the human eye in conditions where we are already familiar.

    1. eLife Assessment

      This valuable work potentially advances our understanding of melody extraction in polyphonic music listening by identifying spontaneous attentional focus in uninstructed listening contexts. However, the evidence supporting the main conclusions is incomplete. The work will be of interest to psychologists and neuroscientists working on music listening, attention, and perception in ecological settings.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the interplay between spontaneous attention and melody formation during polyphonic music listening. The authors use EEG recordings during uninstructed listening to examine how attention bias influences melody processing, employing both behavioural measures and computational modelling with music transformers. The study introduces a very clever pitch-inversion manipulation design to dissociate high-voice superiority from melodic salience, and proposes a "weighted integration" model where attention dynamically modulates how multiple voices are combined into perceived melody.

      Strengths:

      (1) The attention bias findings (Figure 2) are compelling and methodologically sound, with convergent evidence from both behavioral and neural measures.

      (2) The pitch-inversion manipulation appears to very elegantly dissociate two competing factors (high-voice superiority vs melodic salience). Moreover, the authors claim that the chosen music lends itself perfectly to this PolyInv condition. This is a claim I cannot really evaluate, but if true, it would make the design even more neat.

      (3) Nice bridge between hypotheses and operationalisations.

      Weaknesses:

      The results in Figure 3 are very striking, but I have a number of questions before I can consider myself convinced.

      (1) Conceptual questions about surprisal analysis:

      The pattern of results seems backwards to me. Since the music is inherently polyphonic in PolyOrig, I'd expect the polyphonic model to fit the brain data better - after all, that's what the music actually is. These voices were composed to interact harmonically, so modeling them as independent monophonic streams seems like a misspecification. Why would the brain match this misspecified model better?

      Conversely, it would seem to me that the pitch inversion in PolyInv disrupts (at least to some extent) the harmonic coherence, so, if anything, I would a priori expect listeners in this condition to process the streams separately, making the monophonic model fit better there (or less badly), not in PolyOrig. The current pattern is exactly the opposite of what seems logical to me.

      (2) Missing computational analyses:

      If the transformer is properly trained, it should "understand" (i.e., predict/compress) the polyphonic music better, right? Can the authors demonstrate this via perplexity scores, bits-per-byte, or other prediction metrics, comparing how well each model (polyphonic vs monophonic) handles the music in both conditions? Similarly, if PolyInv truly maintains musical integrity as claimed, the polyphonic model should handle it as well as PolyOrig. But if the inversion does disrupt the music, we should see this reflected in degraded prediction scores. These metrics would validate whether the experimental manipulation works as intended. Also, how strongly are the surprisal streams correlated? There are many non-trivial modelling steps that should be reported in more detail.

      (3) Methodological inconsistencies:

      Why are the two main questions (Figures 2 and 3) answered with completely different analytical approaches? The switch from TRF to CCA with match-vs-mismatch classification seems unmotivated. I think it's very important to provide a simpler model comparison - just TRF with acoustic features plus either polyphonic or monophonic surprisal - evaluated on relevant electrodes or the full scalp. This would make the results more comparable and interpretable.

      (4) Presentation and methods:

      a) Coming from outside music/music theory, I found the paper somewhat abstract and hard to parse initially. The experimental logic becomes clearer with reflection, but you're doing yourselves a disservice with the jargon-heavy presentation. It would be useful to include example stimuli.

      b) The methods section is extremely brief - no details whatsoever are provided regarding the modelling: What specific music transformer architecture? Which implementation of this "anticipatory music transformer"? Pre-trained on what corpus - monophonic, polyphonic, Western classical only? What constituted "technical issues" for the 9 excluded participants? What were the channel rejection criteria?

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to understand the drivers of spontaneous attentional bias and melodic expectation generation during listening to short two-part classical pieces. They measured scalp EEG data in a monophonic condition and trained a model to reconstruct the audio envelope from the EEG. They then used this model to probe which of the two voices was best reflected in the neural signal during two polyphonic conditions. In one condition, the original piece was presented, in the other, the voices were switched in an attempt to distinguish between effects of (a) the pitch range of one voice compared to the other and (b) intrinsic melodic features. They also collected a behavioural measure of attentional bias for a subset of the stimuli in a separate study. Further modelling assessed whether expectations of how the melody would unfold were formed based on an integrated percept of melody across the two voices, or based on a single voice. The authors sought to relate the findings to different theories of how musical/auditory scene analysis occurs, based on divided attention, figure-ground perception, and stream integration.

      Strengths:

      (1) A clever but simple manipulation - transposing the voices such that the higher one became the lower one - allowed an assessment of different factors that might affect the allocation of attention.

      (2) State-of-the-art analytic techniques were applied to (a) build a music attention decoder (these are more commonly encountered for speech) and (b) relate the neural data to features of the stimulus at the level of acoustics and expectation.

      (3) The effects appeared robust across the group, not driven by a handful of participants.

      Weaknesses:

      (1) A key goal of the work is to establish the relative importance for the listener's attention of a voice's (a) mean pitch in the context of the two voices (high-voice superiority) and (b) intrinsic melodic statistics/motif attractiveness. The rationale of the experimental manipulation is that switching the relative height of the lines allows these to be dissociated by imparting the same high-voice benefit to the new high-voice and the same preferred intrinsic melodic statistics to the new low voice. However, previous work suggests that the high-voice superiority effect is not all-or-nothing. Electrophysiology supported by auditory nerve modelling found it to depend on the degree of voice separation in a non-monotonic way (see https://doi.org/10.1016/j.heares.2013.07.014 at p. 68). Although the authors keep the overall pitch of the lower (and upper) line fixed across conditions, systematically different contour patterns across the voices could give rise to a sub-optimal distribution of separations in the PolyInv versus PolyOrig condition. This could weaken the high-voice superiority effect in PolyInv and explain the pattern of results. One could argue that such contour differences are examples of the "intrinsic melodic statistics" put forward as the effect working in opposition to high-voice superiority, but it is their interaction across voices that matters here.

      (2) Although melody statistics are mentioned throughout, none have been calculated. It would be helpful to see the features that presumably lead to "motif attractiveness" quantified, as well as how they differ across lines. The work of David Huron, such as at https://dl.acm.org/doi/abs/10.1145/3469013.3469016, provides examples that could be calculated with ease and compared across the two lines: "the tendency for small over large pitch movements, for large leaps to ascend, for musical phrases to fall in pitch, and for phrases to begin with an initial pitch rise". The authors also mention differences in ornamentation. Such comparisons would make it more tangible for the reader as to what differs across the original "melody" and "support" line. In particular, as the authors themselves note, lines in double-counterpoint pieces can, to a degree, operate interchangeably. Bach's inventions in particular use a lot of direct repetition (up to octave invariance), which one would expect to minimise differences in the statistics mentioned. The references purporting to relate to melodic statistics (11-14 in original numbering) seem rather to relate to high-voice superiority.

      (3) The exact nature of the transposition manipulation is obscured by a confusing Figure 1B, which shows an example in which the transposed line does not keep the same note-to-note interval structure as the original line.

      (4) The transformer model is barely described in the main text. Even readers who are familiar with the variable-order Markov models (e.g., in IDyOM) previously used by some of the authors to model melodic surprise and entropy would benefit from a brief description in the main text of at least how transformer models differ. The Methods section goes a little further but does not mention what the training set was, nor the relative weight given to long- and short-term memory models.

      (5) The match-mismatch procedure should be explained in enough detail for readers to at least understand what value represents chance performance and why performance would be measured as an average over participants. Relatedly, there is no description at all of CCA or the match-mismatch procedure in the Methods.

      (6) Details of how the integration model was implemented will be critical to interpreting the results relating to melodic expectations. It is not clear how "a single melody combining the two streams" was modelled, given that at least some notes presumably overlapped in time.

      (7) The authors propose a weighted integration model, referring in the Discussion to dynamics and an integration rate. They do show that in the PolyOrig case, the top stream bias is highest and the monophonic model gives the best prediction, while in the PolyInv case, the top stream bias is weaker and the polyphonic model provides the best prediction. However, that doesn't seem to say anything about the temporal rate of integration, just the degree, which could be fixed over the whole stimulus. Relatedly, the terms "strong attention bias" and "weak attention bias" in Highlight 4 might give the impression of different attention modes for a given listener, or perhaps different types of listeners, but this seems to be shorthand for how attention is allocated for different types of stimuli (namely those that have or have not had their voices reversed).

      (8) Another aspect of the presentation relating to temporal dynamics is that in places (e.g., Highlight 1), the authors suggest they are tracking attention dynamically. However, as acknowledged in the Discussion, neither the behavioural nor the neural measure of attentional bias is temporally resolved. The measures indicate that on average participants attend more to the higher line (less so when it formed the lower line in the original composition).

      (9) It is not clear whether the sung-back data were analysed (and if not why participants were asked to sing the melody back rather than just listen to the two components and report which they thought was the melody). It is also not stated whether the order in which the high and low voices were played back was randomised. If not, response biases or memory capacity might have affected the behavioural attention data.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Winchester and colleagues investigated melodic perception in natural music listening. They highlight the central role of attentional processes in identifying one particular stream in polyphonic material, and propose to compare several theoretical accounts, namely (1) divided attention, (2) figure-ground separation, and (3) stream integration. In parallel, the authors compare the relative strength of exogenous attentional effects (i.e., salience) produced by two common traits of melodies: high-pitch (compared to other voices), and attractive statistics. To ensure the generalisability of their results to real-life listening contexts, they developed a new uninstructed listening paradigm in which participants can freely attend to any part of a musical stimulus.

      Major strengths and weaknesses of the methods and results:

      (1) Winchester and colleagues capitalized on previous attention decoding techniques and proposed an uninstructed listening paradigm. This is an important innovation for the study of music perception in ecological settings, and it is used here to investigate the spontaneous attentional focus during listening. The EEG decoding results obtained are coherent with the behavioral data, suggesting that the paradigm is robust and relevant.

      (2) The authors first evaluate the relative importance of high-pitch and statistics in producing an attentional bias (Figure 2). Behavioral results show a clear pattern, in which both effects are present, with a dominance of the high-pitch one. The only weakness inherent to this protocol is that behavioral responses are measured based on a second presentation of short samples, which may induce a different attentional focus than in the first uninstructed listening.

      (3) Then, the analyses of EEG data compare the decoding results of each melody (the high or low voice, and with "richer" or "poorer" statistics), and show a similar pattern of results. However, this report leaves open the possibility of a confounding factor. In this analysis, a TRF decoding model is first trained based on the presentation of monophonic samples, and it is later used to decode the envelope of the corresponding melodies in the polyphonic scenario. The fitting scores of the training phase are not reported. If the high-pitch or richer melodies were to produce higher decoding scores during monophonic listening (due to properties of the physiological response, or to perceptual processes), a similar difference could be expected during polyphonic listening. To capture attentional biases specifically, the decoding scores in the polyphonic conditions should be compared to the scores in the monophonic conditions, and attention could be expected to increase the decoding of the attended stream or decrease the unattended one.

      (4) Then, Winchester and colleagues investigate the processing of melodic information by evaluating the encoding of melodic surprise and uncertainty (Figure 3). They compare the surprise and uncertainty estimated from a monophonic or a polyphonic model (Anticipatory Music Transformer), and analyse the data with canonical correlation analysis (CCA). The results show a double dissociation, where the processing of melodies with a strong attentional bias (high-pitch, rich statistics) is better approximated with a monophonic model, while a polyphonic model better classifies the other melodies. While this global result is compelling, it remains a preliminary and intriguing finding, and the manuscript does not further investigate it. As it stands, the result appears more like a starting point for further exploration than a definitive finding that can support strong theoretical claims. First, it could be complemented by a comparison of the encoding of individual melodies (e.g., AMmono high-voice vs AMmono low-voice, in PolyOrig and PolyInv conditions) to highlight a more direct correspondence with the previous results (Figure 2) and allow a more precise interpretation. Second, additional analyses or experiments would be needed to unpack this result and provide greater explanatory power. Additionally, the CCA analysis is not described in the Methods. The statistical testing conducted on this analysis seems to be performed across the 250 repetitions of the evaluation rather than across the 40 participants, which may bias the resulting p-values. Moreover, the choice and working principle of the Anticipatory Music Transformer are not described in the Methods. Overall, these results seem at first glance solid, but the missing parts of the Methods do not allow for full evaluation or replication of them.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      (1) Winchester and colleagues aimed at identifying the melodic stream that attracts attention during the listening of natural polyphonic music, and the underlying attentional processes. Their behavioral results confirm that high pitch and attractive statistics increase melodic salience, with a greater effect size for the former, as stated in the Discussion. The TRF analyses of EEG data seem to show a similar pattern, but could also be explained by confounding factors. Next, the authors interpret the CCA results as reflecting stream segregation when there is high melodic salience, and stream integration when there are weaker attentional biases. These interpretations seem to be supported by the data, but unfortunately, no additional analyses or experiments have been conducted to further evaluate this hypothesis. The authors also acknowledge that their results do not show whether stream segregation occurs via divided attention or figure-ground separation. However, the lack of information about the music model used (Anticipatory Music Transformer) and the way it was set up raises some questions about its relevance and limits as a model of cognition (e.g., is this transformer a "better" model of the listeners' expectations than the well-established IDyOM model, and why?), and about the validity of those results.

      (2) Overall, the authors achieved most of the aims presented in the introduction, although they couldn't give a more precise account of the attentional processes at stake. The interpretations are sound and not overstated, with the exception of potential confounding factors that could compromise the conclusions on the neural tracking of salient melodies (EEG results, Figure 2).

      Impact of the work on the field, and the utility of the methods and data to the community:

      The new uninstructed listening paradigm introduced in this paper will likely have an important impact on psychologists and neuroscientists working on music perception and auditory attention, enabling them to conduct experiments in more ecological settings. While the attentional biases towards melodies with high-pitch and attractive statistics are already known, showing their relative effect is an important step in building precise models of auditory attention, and allows future paradigms to explore more fine-grained effects. Finally, the stream segregation and integration shown with this paradigm could be important for researchers working on music perception. Future work may be necessary to identify the models (Markov chains, deep learning) and setup (data analysis, stimuli, control variables) that do or do not replicate these results.

    1. eLife Assessment

      This study provides an important contribution by showing that whiteflies and planthoppers use salivary effectors to suppress plant immunity through the receptor-like protein RLP4, suggesting convergent evolution in these insect lineages. The topic is of clear interest for understanding plant-insect interactions and offers ideas that could stimulate further research in the field. However, the strength of evidence is incomplete, as some aspects of the data and experimental design limit the extent to which the main claims are fully supported.

    2. Reviewer #1 (Public review):

      Summary:

      This is a well-structured and interesting manuscript that investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

      Weaknesses:

      Western blot evidence for effector secretion is weak. The possibility of contamination from insect tissues during sample preparation should be ruled out.

      Below are some specific comments and suggestions to strengthen the manuscript.

      (1) Western blot evidence for effector secretion:

      The western blot evidence in Figure 1, which aims to show that the insect protein is secreted into plants, is not fully convincing. The band of the expected size (~30 kDa) in the infested tissues is very weak. Furthermore, the high and low molecular weight bands that appear in the infested tissues do not match the size of the protein in the insects themselves, and a high molecular weight band also appears in the uninfested control tissues. It is difficult to draw a definitive conclusion that this protein is secreted into the plants based on this evidence. The authors should also address the possibility of contamination from insect tissues during the sample preparation and explain how they have excluded this possibility.

      (2) Inconsistent conclusion (Line 156 and Figure 3c):

      The statement in line 156 is inconsistent with the data presented in Figure 3c. The figure clearly shows that the LRR domain of the protein is the one responsible for the interaction with BtRDP, not the region mentioned in the text. This is a critical misrepresentation of the experimental findings and must be corrected. The conclusion in the text should accurately reflect the data from the figure.

      (3) Role of SOBIR1 in the RLP4/SOBIR1 Complex:

      The authors demonstrate that the salivary effectors destabilize the RLP4 receptor, leading to a decrease in its protein levels and a reduction in the RLP4/SOBIR1 complex. A key question remains regarding the fate of SOBIR1 within this complex. The authors should clarify what happens to the SOBIR1 protein after the destabilization of RLP4. Does SOBIR1 become unbound, targeted for degradation itself, or does it simply lose its function without RLP4? This would provide further insight into the mechanism of action of the effectors.

      (4) Clarification on specificity and evolutionary claims:

      The paper's most significant claim is that the effectors from both whiteflies and planthoppers "independently evolved" to target RLP4. While the functional data are compelling, this evolutionary claim would be more convincing with stronger evidence. Showing that two different effector proteins target the same host protein is a fascinating finding, but without a robust phylogenetic analysis, the claim of independent evolution is not fully supported. It would be valuable to provide a more detailed evolutionary analysis, such as a phylogenetic tree of the effector proteins showing their relationship to other known insect proteins, to definitively rule out a shared but highly divergent common ancestor.

      (5) Role of SOBIR1 in the interaction:

      The results suggest that the effectors disrupt the RLP4/SOBIR1 complex. It is not entirely clear if the effectors are specifically targeting RLP4, SOBIR1, or both. Further experiments, such as a co-immunoprecipitation assay with just RLP4 and the effector, could clarify if the effector can bind to RLP4 in the absence of SOBIR1. This would help to definitively place RLP4 as the primary target.

      (6) Transcriptome analysis (Lines 130-143):

      The transcriptome analysis section feels disconnected from the rest of the manuscript. The findings, or lack thereof, from this analysis do not seem to be directly linked to the other major conclusions of the paper. This section could be removed to improve the manuscript's overall focus and flow. If the authors believe this data is critical, they should more clearly and explicitly connect the conclusions of the transcriptome analysis to the core findings about the effector-RLP4 interaction.

      (7) Signal peptide experiments (Lines 145 and beyond):

      The experiments conducted with the signal peptide (SP) are questionable. The SP is typically cleaved before the protein reaches its final destination. As such, conducting experiments with the SP attached to the protein may have produced biased observations and could lead to unjustified conclusions about the protein's function within the plant cell. We suggest the authors remove the experiments that include the signal peptide.

      (8) Overly strong conclusion and unclear evidence (Line 176):

      The use of the word "must" on line 176 is very strong and presents a definitive conclusion without sufficient evidence. The authors state that the proteins must interact with SOBIR1, but they do not provide a clear justification for this claim. Is SOBIR1 the only interaction partner for NtRLP4? The authors should provide a specific reason for focusing on SOBIR1 instead of demonstrating an interaction with NtRLP4 first. Additionally, do BtRDP or NlSP694 also interact with SOBIR1 directly? The authors should either tone down their language to reflect the evidence or provide a clearer justification for this strong claim.

    3. Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that whiteflies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein.

      Strengths:

      The authors used a wide range of methods to dissect the function of the whitefly protein BtRDP and identify its host target NtRLP4.

      Weaknesses:

      (1) Serious concerns about protein work.

      I did not find the indicated protein bands for anti-BtRDP in Figures 1a and 1b in the original blot pictures shown in Figure S30. In Figure 1a, I do not see the point of showing a nonspecific protein band of ~190 kDa as a loading control for a protein of ~30 kDa.

      The data discrepancy led me to check other Western blot pictures. Similarly, Figures 2d, 3b, 3d, and S15b (anti-Myc) do not correspond to the original blots shown. In addition, the anti-Myc blot in Figure 4i, all blot pictures in Figures 5b, 5h, and S19a appeared to be compressed vertically. These data raised concerns about the quality of the manuscript.

      Blots shown in Figures 3d, 4f, 4g, and 4h appeared to be imaged at a different exposure time compared to the complete blots shown in Figure S30. The discrepancy between the Western blot pictures shown in the figures and the original data might be due to the reduced quality of compressed figures during submission. Nevertheless, clarification will be necessary to support the strength of the data provided.

      (2) Misinterpretation of data.

      I am afraid the authors misunderstood pattern-triggered immunity through receptor-like proteins. It is true that several LRR-type RLPs constitutively associate with SOBIR1 and further recruit BAK1 or other SERKs upon ligand binding, but one should not take it for granted that every RLP works this way. To test the hypothesis that NtRLP4 confers resistance to B. tabaci infestation, the authors compared transcriptional profiles between an EV plant line and an RLP4 overexpression line. If I understood the methods and figure legends correctly, this was done without B. tabaci treatment. This experimental design is seriously flawed. To provide convincing genetic evidence, independent mutant lines (optionally, independent overexpression lines) in combination with different treatments will be necessary. Otherwise, one can only conclude that overexpressing the RLP4 protein generated a nervous plant. In addition, a ROS burst, rather than H2O2 accumulation, is the common immune response in pattern-triggered immunity.

      (3) Lack of logical coherence.

      The written language needs substantial improvement, which impedes the readability of the work. More importantly, the logic throughout the manuscript appears scattered. Several choices lack a clear explanation, among them testing particular protein domains for protein-protein interactions, using plants overexpressing an insect protein to study its subcellular localization, and switching back and forth between proteins with and without signal peptides.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al. investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Weaknesses:

      (1) I found the naming of BtRDP and NlSP694 somewhat confusing. The authors defined BtRDP as "B. tabaci RLP-degrading protein," whereas NlSP694 appears to have been named after the last three digits of its GenBank accession number (MF278694, presumably). Is there a standard convention for naming newly identified proteins, for example, based on functional motifs or sequence characteristics? As it stands, the inconsistency makes it difficult for readers to clearly distinguish these proteins from those reported in other studies.

      (2) Figure 2 and other figures. Transgenic experiments require at least two independent lines, because results from a single line may be confounded by position effects or unintended genomic alterations, and multiple lines provide stronger evidence for reproducibility and reliability.

      (3) Figure 3e. Quantitative analysis of NtRLP4 was required. Additionally, since only one band was observed in oeRLP, were any tags included in the construct?

      (4) Figure 4a. The RNAi effect appears to be well rescued in Line 1 but poorly in Line 2. Could the authors clarify the reason for this difference?

      (5) ROS accumulation is shown for only a single leaf. A quantitative analysis of ROS accumulation across multiple samples would be necessary to support the conclusion. The same applies to Figure 16f.

      (6) Figure 4f: NtRLP4 abundance was significantly reduced in oeBtRDP plants but not in oeBtRDP-SP. Although coexpression analysis suggests that BtRDP promotes NtRLP4 degradation in a ubiquitin-dependent manner, the reduced NtRLP4 levels may not result from a direct interaction between BtRDP and NtRLP4. It is possible that BtRDP influences other factors that indirectly affect NtRLP4 abundance. The authors should discuss this possibility.

      (7) The statement in lines 335-336 that 'Overexpression of NtRLP4 or NtSOBIR1 enhances insect feeding, while silencing of either gene exerts the opposite effect' is not supported by the results shown in Figures S16-S19. The authors should revise this description to accurately reflect the data.

      (8) BtRDP is reported to attach to the salivary sheath. Does the planthopper NlSP694 exhibit a similar secretion localization (e.g., attachment to the salivary sheath)? The authors should supplement this information or discuss the potential implications of any differences in secretion localization between BtRDP and NlSP694 for their respective modes of action.

    1. eLife Assessment

      This manuscript provides a valuable contribution by identifying stress-responsive neurons in the supramammillary nucleus and their ventral subiculum inputs and assessing the regulation of anxiety-related behaviors. The evidence is convincing that the supramammillary nucleus contains stress-responsive neurons, and activation of these neurons increases anxiety-like behaviors. However, evidence that the ventral subiculum input to the supramammillary nucleus encodes and regulates anxiety and that the supramammillary nucleus generates an anxiety engram is incomplete. This work has the potential to offer new insights into how distinct circuits encode different emotional states and will be of interest to those interested in brain systems of aversive emotional and behavioral states.

    2. Reviewer #1 (Public review):

      A summary of what the authors were trying to achieve:

      Zhang et al. examine connections between supramammillary (SuM) neurons and the subiculum in the context of stress-induced anxiety-like behaviors. They identify stress-activated neurons (SANs) in the SuM using Fos2A-iCreERT2 TRAP mice and show that reactivation of SANs increases anxiety-like behavior and corticosterone levels. Circuit mapping reveals inputs from glutamatergic neurons in both ventral and dorsal subiculum (Sub) to SANs. vSub neurons showed calcium dynamics correlated with open-arm exploration in the elevated zero maze (EZM), which is interpreted to indicate a link to anxiety. Finally, chronic inhibition of vSub→SuM neurons during chronic social defeat stress (CSDS) reduces anxiety-like behaviors.

      An account of the major strengths and weaknesses of the methods and results:

      Strengths:

      The manuscript provides compelling evidence for monosynaptic connections from the subiculum to SuM neurons activated by stress. Demonstrating that SuM neuronal activity is altered after CSDS is of particular interest, potentially linking SuM circuits to stress-related psychiatric disorders. The TRAP approach highlights a stress-responsive population of neurons, and reactivation studies suggest behavioral relevance. Together, these data contribute to an emerging literature implicating SuM in stress and anxiety regulation.

      Weaknesses:

      As presented, the manuscript has limitations that weaken support for the central conclusions drawn by the authors. Many of the findings align with prior work on this topic but do not substantially extend it.

      An overarching limitation is the lack of temporal resolution in the manipulations relative to the behavioral assays. This is particularly important for anxiety-like behaviors, as antecedent exposures can alter performance. In the open field and elevated zero maze assays, testing occurred 30 minutes after CNO injection. During much of this interval, the targeted neurons were likely active, making it difficult to determine whether observed behavioral changes were primary (resulting directly from SuM neuronal activity) or secondary (reflecting a stress-like state induced by prolonged activation of SuM and related circuits). This concern also applies to the chronic inhibition of ventral subiculum (vSub) neurons during 10 days of CSDS.

      The combination of stressors (foot shock and CSDS) and behavioral assays further complicates interpretation. The precise role of SuM neurons, including SANs, remains unclear. Both vSub and dSub neurons responded to foot shock, but only vSub neurons showed activity differences associated with open-arm transitions in the EZM.

      In light of prior studies linking SuM to locomotion (Farrell et al., Science 2021; Escobedo et al., eLife 2024), the absence of analyses connecting subpopulations to locomotor changes weakens the claim that vSub neurons selectively encode anxiety. Because open- and closed-arm transitions are inherently tied to locomotor activity, locomotion must be carefully controlled to avoid confounding interpretations.

      Another limitation is the narrow behavioral scope. Beyond open field and EZM, no additional assays were used to assess how SAN reactivation affects other behaviors. Without richer behavioral analyses, interpretations about fear engrams, freezing, or broader stress-related functions of SuM remain incomplete.

      In addition, small n values across several datasets reduce confidence in the strength of the conclusions.

      Figure level concerns:

      (1) Figure 1: In Figure 1, the acute recruitment of SuM neurons by foot shock is paired with changes in neural activity induced by social defeat stress. Although interesting, the connection between changes induced by a chronic stressor and Fos induction following acute foot shock is unclear and does not establish a baseline for the studies in Figure 3 on activation of SANs by social stressors.

      (2) Figure 2: The chemogenetic experiments using AAV-hSyn-Gq-DREADDs lack data, images, or hit maps showing viral spread across animals. This omission is critical given the small size of SuM, where viral spread directly determines which neurons are manipulated. Without this, it is difficult to interpret findings in the context of prior studies on SuM circuits involved in threats and rewards.

      (3) Figure 3: The TRAP experiments show that the number of labeled neurons following foot shock (Figure 3F) is approximately double that of baseline home-cage animals, though y-axis scaling complicates interpretation. It is unclear whether this reflects true Fos induction, low TRAP efficiency, or baseline recombination. Overlap analyses are also limited. For example, it is not shown what proportion of foot shock SANs are reactivated by subsequent foot shock. Comparisons of Fos induction after sucrose reward are also weakened by the very low Fos signal observed. If sucrose reward does not robustly induce Fos in SuM, its utility in distinguishing reward- versus stress-activated neurons is questionable. Thus, conclusions about overlap between SANs and socially stressed neurons remain uncertain due to the missing quantification of Fos+ populations.

      (4) Supplemental Figure 3: The claim that "SANs in the SuM encode anxiety but not fear memory" is not well supported. Inhibition of SANs (Gi-DREADDs) did not alter freezing behavior, but the absence of change could reflect technical issues (e.g., insufficient TRAP efficiency, low expression of Gi-DREADDs). Moreover, the manuscript does not provide a positive control showing that SuM SANs inhibition alters anxiety-like behavior, making it difficult to interpret the negative result. Prior work (Escobedo et al., eLife 2024) suggests SuM neurons drive active responses, not freezing, raising further interpretive questions.

      (5) Figure 4: The statement that corticosterone concentration is "usually used to estimate whether an individual is anxious" (line 236) is an overstatement. Corticosterone fluctuates dynamically across the day and responds to a broad range of stimuli beyond anxiety.

      (6) Figures 5-6: The conclusion that vSub neurons encode anxiety-like behavior is not firmly supported. Data from photo-activating terminals in SuM is shown for ex vivo recording, but not in vivo behavior, which would strengthen support for this conclusion. Both vSub and dSub neurons responded to foot shock. The key evidence comes from apparent differential recruitment during open-arm exploration. However, the timing appears to lag arm entry, no data are provided for closed-arm entry, and there is heterogeneity across animals. These limitations reduce confidence in the authors' central claim regarding vSub-specific encoding of anxiety.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      (1) From the data presented, the authors conclude that "the SuM is the critical brain region that regulates anxiety" (line 190). This interpretation appears overstated, as it downplays well-established contributions of other brain regions and does not place SuM's role within a broader network context. The data support that SuM neurons are recruited by foot shock and, to a lesser extent, by acute social stress. However, the alterations in activity of SuM subpopulations following chronic stress reported in Figure 1 remain largely unexplored, limiting insight into their functional relevance.

      (2) The limited temporal resolution of DREADD-based manipulations leaves alternative explanations untested. For example, if SANs encode signals of threat, generalized stress, or nociception, then prolonged activation could indirectly alter behavior in the open field and EZM assays, rather than reflecting direct anxiety regulation.

      (3) The conclusion that "SuM store information about stress but not memory" (line 240) is not fully supported, particularly with respect to possible roles in memory. The lack of a role in memory of events, as opposed to the output of threat or stress memory, may be true, but it is functionally untested in the presented experiments. The data do indicate activation of SuM neurons by foot shock, which has been previously reported (Escobedo et al., eLife 2024). The changes in SuM activity following chronic stress (Figure 1) are intriguing, but their relationship to "stress information storage" is not clearly established.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      The reported results align with prior studies on the roles of SuM and Sub areas in stress and anxiety. There are limitations due to narrowly focused behavioral assays and the limited temporal resolution of the tools used. Overall, the study further supports a role for SuM in threat and stress responses. The reported changes in SuM neuron activity following chronic stress may offer new insights into stress-induced disorders and behavioral changes.

    3. Reviewer #2 (Public review):

      This manuscript investigates the neural mechanisms of anxiety and identifies the supramammillary nucleus (SuM) as a critical hub in mediating anxiety-related behaviors. The authors describe a population of neurons in the SuM that are activated by acute and chronic stress. While their activity is not required for fear memory recall, reactivation of these neurons after chronic stress robustly increases anxiety-like behaviors as well as physiological stress markers. Circuit analysis further shows that these stress-activated neurons are driven by inputs from the ventral, but not dorsal, subiculum, and inhibition of this pathway exerts an anxiolytic effect.

      The study provides an elegant integration of techniques to link stress, neuronal ensembles, and circuit function, thereby advancing our understanding of the neural substrates of anxiety. A particularly notable point is the selective role of these stress-activated neurons in anxiety, but not in associative fear memory, which highlights functional distinctions between neural circuits underlying anxiety and fear.

      Some aspects would benefit from clarification. For example, how selective is the recruitment of this population to stress compared with other aversive states, and how should one best interpret their definition as "stress-activated neurons" given the relatively modest overlap across stress exposures? In addition, the use of the term "engram" in this context raises conceptual questions. Is it appropriate to describe a neuronal ensemble encoding an emotional state as an engram, a term usually tied to specific memory recall?

      Overall, this work makes a valuable contribution by identifying SuM stress-activated neurons and their ventral subiculum inputs as central elements of the circuitry underlying anxiety. These findings provide a valuable framework for future studies investigating anxiety circuitry and may inform the development of targeted interventions for stress-related disorders.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to investigate the mechanisms of anxiety. The paper focuses on the supramammillary nucleus (SuM) based on a Fos screen and recordings showing that footshock and social defeat stress increase activity in this region. Using activity-dependent tagging, they show that reactivation of stress-activated neurons in SuM has an anxiety-like effect, reducing open-arm exploration in the elevated zero maze. They then investigate the ventral subiculum as a potential source of anxiety-related information for SuM. They show that ventral subiculum (vSub) inputs to SuM are more strongly activated than dSub inputs when mice explore the open arms of the elevated zero maze. Finally, they show that DREADD-mediated inhibition of vSub-SuM projections alleviates stress-enhanced anxiety. Overall, the results provide good evidence that SuM contains a stress-activated neuronal population whose later activity increases anxiety-like behavior. The study further provides evidence that vSub projections to SuM are activated by stress and that their inhibition alleviates some effects of stress.

      Strengths:

      Strengths of this paper include the use of convergent methods (e.g., Fos plus electrode recordings, footshock, and social defeat) to demonstrate that the SuM is activated by different forms of stress. The activity-dependent tagging experiment showing that footshock-activated SuM neurons are reactivated by social defeat but not by sucrose is also compelling, because it provides evidence that SuM neurons are driven by some integrative aspect of stress rather than by a simple sensory stimulus.

      Weaknesses:

      The strength of some of the evidence is judged to be incomplete. The paper provides good evidence that SuM contains stress-responsive neurons, and the activity of these neurons increases some measure of anxiety-like behavior. However, the evidence that the vSub-SuM projection "encodes anxiety" and that the SuM is a key regulator of anxiety is judged to be incomplete. The claim that SuM generates an "anxiety engram" is also judged to be incompletely supported by the evidence. Namely, what is unclear is whether these cells/regions encode anxiety per se versus modulate behaviors (like exploration) that tend to correlate with anxiety. Since many brain regions respond to footshock and other stressors, the response of SuM to these stimuli is not strong evidence for a role in anxiety. I am not convinced that the identified SuM cells have a specific anxiety function. As the authors mention in the introduction, SuM regulates exploration and theta activity. Since theta potently regulates hippocampal function, there is the concern that SuM manipulations could have broad effects. As shown in Supplementary Figure 2, stimulating stress-responsive cells in SuM potently reduces general locomotor exploration. This raises concerns that the manipulation could have broader effects that go beyond just changes in anxiety-like behavior. Furthermore, the meaning of an "anxiety engram" is unclear. Would this engram encode stress, the sense of a potential threat, or the behavioral response? A more developed analysis of the behavioral correlates of SuM activity and the behavioral effects of SuM manipulations could give insight into these questions.

    1. eLife Assessment

      This valuable study characterises receptors for calcitonin-related peptides from a deuterostomian animal, the echinoderm Apostichopus japonicus, by a combination of heterologous expression, pharmacological experiments, and the quantification of gene-expression levels. The authors provide convincing evidence for a functional calcitonin-related peptide system in the sea cucumber, but further work will be needed to confirm the proposed physiological functions of the PDF receptor system in this species. This work should be of interest to scientists studying the signaling pathways, functions, and evolution of neuropeptides, and could be relevant to improving the culture conditions of this economically important species.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript characterizes a functional peptidergic system in the echinoderm Apostichopus japonicus that is related to the widely conserved family of calcitonin/diuretic hormone 31 (CT/DH31) peptides in bilaterian animals. In vitro analysis of receptor-ligand interactions, using multiple receptor activation assays, identifies three cognate receptors for two CT-like peptides in the sea cucumber, which stimulate cAMP, calcium, and ERK signaling. Only one of these receptors clusters within the family of calcitonin and calcitonin-like receptors (CTR/CLR) in bilaterian animals, whereas two other receptors cluster with invertebrate pigment dispersing factor receptors (PDFRs). In addition, this study sheds light on the expression and in vivo functions of CT-like peptides in A. japonicus, by quantitative real-time PCR, immunohistochemistry, pharmacological experiments on body wall muscle and intestine preparations, and peptide injection and RNAi knockdown experiments. This reveals a conserved function of CT-like peptides as muscle relaxants and growth regulators in A. japonicus.

      Strengths:

      This work combines both in vitro and in vivo functional assays to identify a CT-like peptidergic system in an economically relevant echinoderm species, the sea cucumber A. japonicus. A major strength of the study is that it identifies three G protein-coupled receptors for AjCT-like peptides, one related to the CTR/CLR family and two related to the PDFR family. A similar finding was previously reported for the CT-related peptide DH31 in Drosophila melanogaster that activates both CT-type and PDF-type receptors. Here, the authors expand this observation to a deuterostomian animal, which suggests that receptor promiscuity is a more general feature of the CT/DH31 peptide family and that CT/DH31-like peptides may activate both CT-type and PDF-type receptors in other animals as well.

      Besides the identification of receptor-ligand pairs, the downstream signaling pathways of AjCT receptors have been characterized, revealing broad and in some cases receptor-specific effects on cAMP, calcium, and ERK signaling.

      Functional characterization of the CT-related peptide system in heterologous cells is complemented with ex vivo and in vivo experiments. First, peptide injection and RNAi knockdown experiments establish transcriptional regulation of all three identified receptors in response to changing AjCT peptide levels. Second, ex vivo experiments reveal a conserved role for the two CT-like peptides as muscle relaxants, which have differential effects on body wall muscle and intestine preparations. Finally, peptide injection and knockdown experiments uncover a growth-promoting role for one CT-like peptide (AjCT2). Injection of AjCT2 at high concentration, or long-term knockdown of the AjCT precursor, affects diverse growth-related parameters including weight gain rate, specific growth rate, and transcript levels of growth-regulating transcription factors. The authors also reveal a growth-promoting function for the PDFR-like receptor AjPDFR2, suggesting that this receptor mediates the effects of AjCT2 on growth.

      Weaknesses:

      Expression of CT-like peptides was investigated both at transcript and protein level, but insight into the expression of the three peptide receptors is limited. This makes it difficult to understand the mechanism underlying the (different) functions of the two CT-like peptides in vivo. The authors identify differences in signal transduction cascades activated by each peptide, which might underpin distinct functions, but these differences were established only in heterologous cells.

      The authors show overlapping phenotypes for a long-term knockdown of the AjCT precursor and the AjPDFR2 receptor, suggesting that the growth-regulating functions of AjCT2 are mediated by this receptor pathway. However, it remains unclear whether this mechanism underpins the growth-regulating function of AjCT2 until further in vivo evidence for this ligand-receptor interaction is presented. For example, the authors could investigate whether knockdown of AjPDFR2 attenuates the effects of AjCT2 peptide injection. In addition, a functional PDF system in this species remains uncharacterized, and a potential role of PDF-like peptides in growth regulation has not yet been investigated in A. japonicus. Therefore, it also remains unclear whether the ability of CT-like peptides to activate PDFRs is an evolutionarily ancient property of this peptide family or an example of convergent evolution in some protostomian (Drosophila) and deuterostomian (sea cucumber) species.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that A. japonicus calcitonins (AjCT1 and AjCT2) activate not only the calcitonin/calcitonin-like receptor but also the two "PDF receptors", ex vivo. They also explore the second messenger pathways that are recruited following receptor activation. They determine the source of CT1 and CT2 using qPCR and in situ hybridization, and finally test the effects of these peptides on tissue contractions, feeding and growth. This study provides solid evidence that CT1 and CT2 act as ligands for calcitonin receptors; however, evidence supporting cross-talk between CT peptides and "PDF receptors" is weak.

      Strengths:

      This is the first study to report pharmacological characterization of CT receptors in an echinoderm. Multiple lines of evidence in cell culture (receptor internalization and second messenger pathways) support this conclusion.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript characterizes a functional peptidergic system in the echinoderm Apostichopus japonicus that is related to the widely conserved family of calcitonin/diuretic hormone 31 (CT/DH31) peptides in bilaterian animals. In vitro analysis of receptor-ligand interactions, using multiple receptor activation assays, identifies three cognate receptors for two CT-like peptides in the sea cucumber, which stimulate cAMP, calcium, and ERK signaling. Only one of these receptors clusters within the family of calcitonin and calcitonin-like receptors (CTR/CLR) in bilaterian animals, whereas two other receptors cluster with invertebrate pigment dispersing factor receptors (PDFRs). In addition, this study sheds light on the expression and in vivo functions of CT-like peptides in A. japonicus, by quantitative real-time PCR, immunohistochemistry, pharmacological experiments on body wall muscle and intestine preparations, and peptide injection and RNAi knockdown experiments. This reveals a conserved function of CT-like peptides as muscle relaxants and growth regulators in A. japonicus.

      Strengths:

      This work combines both in vitro and in vivo functional assays to identify a CT-like peptidergic system in an economically relevant echinoderm species, the sea cucumber A. japonicus. A major strength of the study is that it identifies three G protein-coupled receptors for AjCT-like peptides, one related to the CTR/CLR family and two related to the PDFR family. A similar finding was previously reported for the CT-related peptide DH31 in Drosophila melanogaster that activates both CT-type and PDF-type receptors. Here, the authors expand this observation to a deuterostomian animal, which suggests that receptor promiscuity is a more general feature of the CT/DH31 peptide family and that CT/DH31-like peptides may activate both CT-type and PDF-type receptors in other animals as well.

      Besides the identification of receptor-ligand pairs, the downstream signaling pathways of AjCT receptors have been characterized, revealing broad and in some cases receptor-specific effects on cAMP, calcium, and ERK signaling.

      Functional characterization of the CT-related peptide system in heterologous cells is complemented with ex vivo and in vivo experiments. First, peptide injection and RNAi knockdown experiments establish transcriptional regulation of all three identified receptors in response to changing AjCT peptide levels. Second, ex vivo experiments reveal a conserved role for the two CT-like peptides as muscle relaxants, which have differential effects on body wall muscle and intestine preparations. Finally, peptide injection and knockdown experiments uncover a growth-promoting role for one CT-like peptide (AjCT2). Injection of AjCT2 at high concentration, or long-term knockdown of the AjCT precursor, affects diverse growth-related parameters including weight gain rate, specific growth rate, and transcript levels of growth-regulating transcription factors. The authors also reveal a growth-promoting function for the PDFR-like receptor AjPDFR2, suggesting that this receptor mediates the effects of AjCT2 on growth.

      Weaknesses:

      The authors present a more detailed phylogenetic analysis in the revised version, including a larger number of species. But some clusters in the analysis are not well supported because they have only low bootstrap values. This makes it difficult to interpret the clustering in some parts of the tree.

      Thank you for the reviewer’s comments. In response, we have produced a new phylogenetic analysis using the maximum likelihood method. This was done by Nayeli Escudero Castelán and Kite Jones in the Elphick group at QMUL, and they have therefore been added as co-authors of this paper. The new phylogenetic tree (Figure 2, line 206) includes broad taxonomic sampling of CT-type receptors and PDF-type receptors. CRH-type receptors, which are also members of the secretin-type GPCR sub-family, have been included as an outgroup to root the tree. In the previous version, the much more distantly related vasopressin/oxytocin-type receptors, which are rhodopsin-type GPCRs, were included as an outgroup. Furthermore, VIP-type receptors were included in the previous tree but have been omitted from the new tree because VIP receptor orthologs only occur in vertebrates and are therefore not representative of a bilaterian GPCR family. The new tree shows high bootstrap support for key clades, notably a bootstrap value of 100 for a clade comprising both deuterostomian and protostomian PDF receptors. This provides important evidence that the A. japonicus PDF-type receptors characterised in this study (AjPDFR1, AjPDFR2) are co-orthologs of the PDF-type receptor that has been characterised previously in Drosophila. Similarly, there is strong bootstrap support (100) for a clade comprising CT/DH31-type receptors and, importantly, the CT-type receptor characterised in this study (AjCTR) is positioned in a branch of this clade that comprises deuterostomian CT-type receptors (with bootstrap support of 100). Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      Expression of CT-like peptides was investigated both at transcript and protein level, but insight into the expression of the three peptide receptors is limited. This makes it difficult to understand the mechanism underlying the (different) functions of the two CT-like peptides in vivo. The authors identify differences in signal transduction cascades activated by each peptide, which might underpin distinct functions, but these differences were established only in heterologous cells.

      We appreciate the reviewer's insightful comments. Regarding expression of the CT-like peptide receptors, we have quantitatively analyzed the mRNA expression levels of the three receptors in key tissues using qRT-PCR (Figure 6, line 319), and receptor expression exhibits significant tissue-specific differences. Combined with the heterologous expression assays and in vivo functional validation, we believe our findings provide clear mechanistic insights into the functional divergence of the two CT-like peptides. Investigation of the expression of the three receptor proteins in A. japonicus would require generation of specific antibodies, which was beyond the scope of this study. Furthermore, immunohistochemical visualization of neuropeptide receptor expression in other invertebrates has not been reported widely, which likely reflects technical difficulties in generating antibodies that specifically detect receptor proteins, which are typically expressed at a low level in comparison to the neuropeptides that act as their ligands.

      We acknowledge that investigating signal transduction cascades in heterologous cells (rather than native A. japonicus cells) is a limitation. However, as a non-model organism, A. japonicus currently lacks established cell lines for such research. Therefore, using heterologous cells was the most feasible approach to examine the differential signaling cascades activated by the peptides through the three receptors. Importantly, our in vivo experiments demonstrated that long-term knockdown of either the AjCT precursor or AjPDFR2 resulted in similar and significant growth defects. The phenotypic consistency strongly suggests that AjCT2 and AjPDFR2 function within the same signaling pathway, with AjPDFR2 serving as the key receptor functionally activated by AjCT2.

      The authors show overlapping phenotypes for a long-term knockdown of the AjCT precursor and the AjPDFR2 receptor, suggesting that the growth-regulating functions of AjCT2 are mediated by this receptor pathway. However, it remains unclear whether this mechanism underpins the growth-regulating function of AjCT2 until further in vivo evidence for this ligand-receptor interaction is presented. For example, the authors could investigate whether knockdown of AjPDFR2 attenuates the effects of AjCT2 peptide injection. In addition, a functional PDF system in this species remains uncharacterized, and a potential role of PDF-like peptides in growth regulation has not yet been investigated in A. japonicus. Therefore, it also remains unclear whether the ability of CT-like peptides to activate PDFRs is an evolutionarily ancient property of this peptide family or an example of convergent evolution in some protostomian (Drosophila) and deuterostomian (sea cucumber) species.

      Thank you for the reviewer’s insightful comments and constructive questions. We acknowledge the request for more direct evidence to demonstrate how AjCT2 functions in vivo through AjPDFR2. However, long-term knockdown of the AjCT precursor and of AjPDFR2 both resulted in identical and significant growth defect phenotypes. This high phenotypic consistency, combined with the activation of AjPDFR2 by AjCT2 in heterologous cells, strongly suggests that they function within the same signaling pathway, with AjPDFR2 serving as the key receptor functionally activated by AjCT2. While exogenous peptide injection combined with receptor knockdown is a classic method for verifying receptor activation, phenotypic overlap itself is widely accepted in genetic research as robust evidence for pathway association (Shafer and Taghert, 2009; Van Sinay et al., 2017). A. japonicus is a non-model organism with a 3-month aestivation period in summer followed shortly by winter hibernation, and we are unable to conduct in vivo experiments during these periods. Any single experimental suggestion from reviewers could therefore require an additional year of research, and we have already conducted an additional year of research in response to reviewer feedback since submitting the original manuscript. We therefore hope that the reviewers recognize these challenges associated with working with aquatic invertebrate non-model organisms.

      We fully agree that the functional PDF/PDFR system in A. japonicus and its potential role in growth regulation remain uncharacterized. Currently, the precursors of PDF-type neuropeptides in echinoderms remain unidentified, which precludes clear pharmacological characterization of the two receptors. While further exploration of echinoderm PDF-type neuropeptides is still needed, our phylogenetic analysis, conducted using the maximum likelihood method with optimized parameters and rigorous sequence curation, demonstrates that the deuterostomian PDFRs (including AjPDFR1 and AjPDFR2) are positioned in a clade with the well-characterized protostomian PDFRs with extremely high bootstrap support (value = 100). Therefore, these two receptors in A. japonicus clearly belong to the PDF receptor family, and our findings indicate that the ability of CT-like peptides to activate PDFRs is either an evolutionarily ancient and conserved property or has arisen independently in different lineages. Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      References:

      Bauknecht P, Jékely G. Large-Scale Combinatorial Deorphanization of Platynereis Neuropeptide GPCRs. Cell reports, 2015, 12(4), 684–693. doi: 10.1016/j.celrep.2015.06.052.

      Beets I, Zels S, Vandewyer E, Demeulemeester J, et al. System-wide mapping of peptide-GPCR interactions in C. elegans. Cell reports, 2023, 42(9), 113058. doi: 10.1016/j.celrep.2023.113058.

      Cardoso J C, Mc Shane J C, Li Z, et al. Revisiting the evolution of Family B1 GPCRs and ligands: Insights from mollusca. Molecular and cellular endocrinology, 2024, 586, 112192. doi: 10.1016/j.mce.2024.112192.

      Gorn A H, Lin H Y, Yamin M, et al. Cloning, characterization, and expression of a human calcitonin receptor from an ovarian carcinoma cell line. The Journal of clinical investigation, 1992, 90(5), 1726–1735. doi: 10.1172/JCI116046.

      Huang T, Su J, Wang X, et al. Functional Analysis and Tissue-Specific Expression of Calcitonin and CGRP with RAMP-Modulated Receptors CTR and CLR in Chickens. Animals: an open access journal from MDPI, 2024, 14(7), 1058. doi: 10.3390/ani14071058.

      Johnson E C, Shafer O T, Trigg J S, et al. A novel diuretic hormone receptor in Drosophila: evidence for conservation of CGRP signaling. Journal of Experimental Biology, 2005, 208(7): 1239-1246. doi: 10.1242/jeb.01529.

      McLatchie L M, Fraser N J, Main M J, et al. RAMPs regulate the transport and ligand specificity of the calcitonin-receptor-like receptor. Nature, 1998, 393(6683): 333-339. doi: 10.1038/30666.

      Schwartz J, Réalis-Doyelle E, Dubos M P, et al. Characterization of an evolutionarily conserved calcitonin signaling system in a lophotrochozoan, the Pacific oyster (Crassostrea gigas). Journal of Experimental Biology, 2019, 222(13): jeb201319. doi: 10.1242/jeb.201319.

      Sekiguchi T, Kuwasako K, Ogasawara M, et al. Evidence for conservation of the calcitonin superfamily and activity-regulating mechanisms in the basal chordate Branchiostoma floridae: insights into the molecular and functional evolution in chordates. Journal of Biological Chemistry, 2016, 291(5): 2345-2356. doi: 10.1074/jbc.M115.664003.

      Shafer O T, Taghert P H. RNA-interference knockdown of Drosophila pigment dispersing factor in neuronal subsets: the anatomical basis of a neuropeptide's circadian functions. PLoS One, 2009, 4(12), e8298. doi: 10.1371/journal.pone.0008298.

      Van Sinay E, Mirabeau O, Depuydt G, Van Hiel M B, Peymen K, Watteyne J, Zels S, Schoofs L, Beets I. Evolutionarily conserved TRH neuropeptide pathway regulates growth in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America, 2017, 114(20), E4065–E4074. doi: 10.1073/pnas.1617392114.

      Reviewer #2 (Public review):

      Summary:

      The authors show that A. japonicus calcitonins (AjCT1 and AjCT2) activate not only the calcitonin/calcitonin-like receptor but also the two "PDF receptors", ex vivo. They also explore the second messenger pathways that are recruited following receptor activation. They determine the source of CT1 and CT2 using qPCR and in situ hybridization, and finally test the effects of these peptides on tissue contractions, feeding and growth. This study provides solid evidence that CT1 and CT2 act as ligands for calcitonin receptors; however, evidence supporting cross-talk between CT peptides and "PDF receptors" is weak.

      Strengths:

      This is the first study to report pharmacological characterization of CT receptors in an echinoderm. Multiple lines of evidence in cell culture (receptor internalization and second messenger pathways) support this conclusion.

      Weaknesses:

      The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionarily ancient, since a similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported. The authors perform phylogenetic analysis to show that the two "PDF" receptors form an independent clade. The bootstrap support is quite low in many instances, especially for the deuterostomian and protostomian PDFR clades, where it is below 30. With such low support, it is unclear whether the clade comprising deuterostomian "PDFRs" in fact consists of PDFRs and not another receptor type whose endogenous ligand (besides CT) remains to be discovered.

      Thank you for the reviewer’s comments. In response, we have produced a new phylogenetic analysis using the maximum likelihood method. This was done by Nayeli Escudero Castelán and Kite Jones in the Elphick group at QMUL, and they have therefore been added as co-authors of this paper. The new phylogenetic tree (Figure 2, line 206) includes broad taxonomic sampling of CT-type receptors and PDF-type receptors. CRH-type receptors, which are also members of the secretin-type GPCR sub-family, have been included as an outgroup to root the tree. In the previous version, the much more distantly related vasopressin/oxytocin-type receptors, which are rhodopsin-type GPCRs, were included as an outgroup. Furthermore, VIP-type receptors were included in the previous tree but have been omitted from the new tree because VIP receptor orthologs only occur in vertebrates and are therefore not representative of a bilaterian GPCR family. The new tree shows high bootstrap support for key clades, notably a bootstrap value of 100 for a clade comprising both deuterostomian and protostomian PDF receptors. This provides important evidence that the A. japonicus PDF-type receptors characterized in this study (AjPDFR1, AjPDFR2) are co-orthologs of the PDF-type receptor that has been characterized previously in Drosophila. Similarly, there is strong bootstrap support (100) for a clade comprising CT/DH31-type receptors and, importantly, the CT-type receptor characterized in this study (AjCTR) is positioned in a branch of this clade that comprises deuterostomian CT-type receptors (with bootstrap support of 100). Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      Reviewer #2 (Recommendations for the authors):

      Figure 1C: The bootstrap support is quite low in many instances, especially for the deuterostomian and protostomian PDFR clades, where it is below 30. With such low support, I would be hesitant to label the blue clade as deuterostomian PDFRs for two reasons: 1) no members of this clade have been shown to be activated by a PDF-like substance, and 2) the current study shows that these receptors are activated by CT-type peptides. Therefore, the phylogenetic analyses do not support the conclusions of this paper. What is the basis for calling these receptors PDFRs and not CTRs in light of weak phylogenetic support?

      Thank you for the reviewer’s comments. In response, we have produced a new phylogenetic analysis using the maximum likelihood method. This was done by Nayeli Escudero Castelán and Kite Jones in the Elphick group at QMUL, and they have therefore been added as co-authors of this paper. The new phylogenetic tree (Figure 2, line 206) includes broad taxonomic sampling of CT-type receptors and PDF-type receptors. CRH-type receptors, which are also members of the secretin-type GPCR sub-family, have been included as an outgroup to root the tree. In the previous version, the much more distantly related vasopressin/oxytocin-type receptors, which are rhodopsin-type GPCRs, were included as an outgroup. Furthermore, VIP-type receptors were included in the previous tree but have been omitted from the new tree because VIP receptor orthologs only occur in vertebrates and are therefore not representative of a bilaterian GPCR family. The new tree shows high bootstrap support for key clades, notably a bootstrap value of 100 for a clade comprising both deuterostomian and protostomian PDF receptors. This provides important evidence that the A. japonicus PDF-type receptors characterized in this study (AjPDFR1, AjPDFR2) are co-orthologs of the PDF-type receptor that has been characterized previously in Drosophila. Similarly, there is strong bootstrap support (100) for a clade comprising CT/DH31-type receptors and, importantly, the CT-type receptor characterized in this study (AjCTR) is positioned in a branch of this clade that comprises deuterostomian CT-type receptors (with bootstrap support of 100). Details of the methods employed to produce the new receptor tree are included in lines 727-739. The new phylogenetic tree is shown below and has been incorporated into the revised manuscript (Figure 2, line 206). The description of the new phylogenetic tree has also been modified accordingly in the revised manuscript (lines 169-183).

      We agree with the reviewer that no members of the PDF-type receptor clade in deuterostomes have yet been shown to be activated by a PDF-like substance. That is because the precursors of the PDF-type neuropeptides in echinoderms remain unidentified so far, which precludes clear pharmacological characterization of these receptors within the deuterostomian PDFR clade. However, the new phylogenetic tree now provides strong support (bootstrap value = 100) for the clade comprising deuterostomian and protostomian PDFRs, confirming the classification of AjPDFR1 and AjPDFR2 as PDF-type receptors. 

      The new results following AjCT and AjPDFR2 knockdown are a welcome addition. While this additional evidence supports the claim that AjCT could mediate its effects via AjPDFR2, it does not show that AjCT acts as an endogenous ligand for PDFR in vivo. In combination with the weak phylogenetic analyses, I would recommend that the authors tone down their claims that they have functionally characterized a PDFR (in the title and text).

      Thank you for your insightful comments and we do understand the reviewer’s concern. 

      Regarding “the weak phylogenetic analyses”, as highlighted above, we have produced a new phylogenetic tree (Figure 2, line 206) that provides strong bootstrap support for the clade comprising deuterostome and protostome PDF-type receptors. For this reason, it is our opinion that inclusion of “pigment-dispersing factor-type receptors” in the title of the paper is appropriate. Details of the phylogenetic analysis method have been added in lines 727-739, the updated phylogenetic tree has been incorporated into the revised manuscript (Figure 2, line 206), and the description of the new phylogenetic tree has been modified accordingly (lines 169-183). In addition, long-term knockdown of the AjCT precursor and of AjPDFR2 both resulted in identical and significant growth defect phenotypes, and phenotypic overlap is widely accepted in genetic research as strong evidence for pathway association (Shafer and Taghert, 2009; Van Sinay et al., 2017). This high degree of phenotypic consistency, coupled with our in vitro finding that AjCT2 specifically activates AjPDFR2, strongly supports the conclusion that AjCT2 and AjPDFR2 function within the same signaling pathway in vivo, with AjPDFR2 serving as the key receptor functionally activated by AjCT2.

      References:

      Shafer O T, Taghert P H. RNA-interference knockdown of Drosophila pigment dispersing factor in neuronal subsets: the anatomical basis of a neuropeptide's circadian functions. PLoS One, 2009, 4(12), e8298. doi: 10.1371/journal.pone.0008298.

      Van Sinay E, Mirabeau O, Depuydt G, Van Hiel M B, Peymen K, Watteyne J, Zels S, Schoofs L, Beets I. Evolutionarily conserved TRH neuropeptide pathway regulates growth in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America, 2017, 114(20), E4065–E4074. doi: 10.1073/pnas.1617392114.

      Since there is no formal logic defining the use of "type" vs "like" vs "related", I would encourage the authors to use one term (of their choice) to avoid unnecessary confusion. Alternatively, these relationships could be defined at some point in the manuscript so that they are clear to the reader.

      We thank the reviewer for these comments. The term “CT-related peptides” was already defined in the Introduction (lines 54-58). As per your suggestion, we have now also defined both “CT-type peptides” and “CT-like peptides” in the Introduction (lines 76-79). “CT-type peptides” are characterized by an N-terminal disulphide bridge, whereas “CT-like peptides” (diuretic hormone 31 (DH31)-type peptides) lack this feature. Additionally, in accordance with these definitions, we have corrected three descriptions in the revised manuscript (lines 80, 83 and 88, for “CT-type peptides”) to ensure consistent and accurate usage of these terms.

      "To provide in vivo evidence supporting CT-mediated activation of "PDF" receptors, we conducted the following experiments: Firstly, we confirmed that AjPDFR1 and AjPDFR2 were the functional receptors of AjCT1 and AjCT2 (Figures 2, 3 and 4). Secondly, injection of AjCT2 and siAjCTP1/2-1 in vivo induced corresponding changes in AjPDFR1 and AjPDFR2 expression levels in the intestine (Figure 8C, 9A, 9B and 9C)."

      None of these experiments provides direct evidence that CT activates PDFR in vivo. The functional studies are indeed a welcome addition, but they cannot discriminate between correlation and causation.

      Thank you for these insightful comments. We agree that the functional studies do not constitute direct proof that CT activates PDFR in vivo. However, we observed identical and significant growth-defect phenotypes following long-term knockdown of the AjCT precursor and of AjPDFR2. This high degree of phenotypic congruence, combined with the established in vitro activation of AjPDFR2 by AjCT2, provides strong support for the conclusion that AjCT2 acts as the key endogenous ligand activating the AjPDFR2 signaling pathway in vivo. Importantly, such phenotypic overlap has been widely accepted in genetic research as strong evidence for functional pathway association (Shafer and Taghert, 2009; Van Sinay et al., 2017).

    1. eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete. It could be strengthened by the use of sensitive RNA in situ hybridization approaches.

    2. Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Figure 1). These tumors contain non-mutant cells, termed SGCs (single-germ cells). Seventy-five percent of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Figure 2). The authors demonstrate that the block in differentiation in SGCs is a result of suppression of bam expression (Figure 2). They present data suggesting that in 73% of SGCs, BMP signaling is low (assessed by dad-lacZ) (Figure 3) and that proliferation is lower in SGCs than in GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Figure 4). They show data that bam-mutant cells secrete Dpp, but these data are not compelling (see below) (Figure 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Figure 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment.

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells.

      (3) Appropriate use of quantification and statistics.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, in some germaria, or in only a few germaria?

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      (4) All experiments except that shown in Figure 1I (a single germarium, with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than Figure 1) with hs-flp?

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens in young females (e.g., 2-day-old)? I assume that the nos>flp is working in larval and pupal stages, and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact the clonal analyses diagrammed in Figure 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated, so it is not possible to discern one vs two copies of GFP.

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with the dpp-lacZ enhancer trap in Figure 5A, B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries, and yet LacZ is very faint in Figure 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues, including the ovary.

      (8) In Figure 6, the authors report results obtained with the bam[BG] allele. Do they obtain similar data with another bam allele (e.g., bam[Δ86])?

    3. Reviewer #2 (Public review):

      While the study by Zhang et al. provides valuable insights into how germline tumors can non-autonomously suppress the differentiation of neighboring wild-type germline stem cells (GSCs), several conceptual and technical issues limit the strength of the conclusions.

      Major points:

      (1) Naming of SGCs is confusing. In line 68, the authors state that "many wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology." However, bam or bgcn mutant GSCs are also referred to as "SGCs," which creates confusion when reading the text and interpreting the figures. The authors should clarify the terminology used to distinguish between wild-type SGCs and tumor (bam/bgcn mutant) SGCs, and apply consistent naming throughout the manuscript and figure legends.

      a) The same confusion appears in Figure 2. It is unclear whether the analyzed SGCs are wild-type or bam mutant cells. If the SGCs analyzed are Bam mutants, then the lack of Bam expression and failure to differentiate would be expected and not informative. However, if the SGCs are wild-type GSCs located outside the niche, then the observation would suggest that Bam expression is silenced in these wild-type cells, which is a significant finding. The authors should clarify the genotype of the SGCs analyzed in Figure 2C, as this information is not currently provided.

      b) In Figures 4B and 4E, the analysis of SGC composition is confusing. In the control germaria (bam mutant mosaic), the authors label GFP⁺ SGCs as "wild-type," which makes interpretation unclear. Note that this is completely different from their earlier definition in line 68.

      c) Additionally, bam⁺/⁻ GSCs (the first bar in Figure 4E) should appear GFP⁺ and Red⁺ (i.e., yellow). It would be helpful if the authors could indicate these bam⁺/⁻ germ cells directly in the image and clarify the corresponding color representation in the main text. In Figure 2A, although a color code is shown, the legend does not explain it clearly, nor does it specify the identity of bam⁺/⁻ cells alone. Figure 4F has the same issue, and in this graph, the color does not match Figure 4A.

      (2) The frequencies of bam or bgcn mutant mosaic germaria carrying [wild-type] SGCs or wild-type germ cell cysts with branched fusomes, as well as the average number of wild-type SGCs per germarium and the number of days after heat shock for the representative images, are not provided when Figure 1 is first introduced. Since this is the first time the authors describe these phenotypes, including these details is essential. Without this information, it is difficult for readers to follow and evaluate the presented observations.

      (3) Without the information mentioned in point 2, the section on [wild-type] SGCs induced by impairment of differentiation or by dedifferentiation is difficult to follow. In lines 90-97, the authors use the presence of midbodies between cystocytes as a criterion to determine whether the wild-type GSCs surrounded by tumor GSCs arise through dedifferentiation. However, the cited study (Mathieu et al., 2022) reports that midbodies can be detected between two germ cells within a cyst carrying a branched fusome upon USP8 loss.

      a) Are wild-type germ cell cysts with branched fusomes present in the bam mutant mosaic germaria? What is the proportion of germaria containing wild-type SGCs versus those containing wild-type germ cell cysts with branched fusomes?

      b) If all bam mutant mosaic germaria carry only wild-type GSCs outside the niche and no germaria contain wild-type germ cell cysts with branched fusomes, then examining midbodies as an indicator of dedifferentiation may not be appropriate.

      c) If, however, some germaria do contain wild-type germ cell cysts with branched fusomes, the authors should provide representative images and quantify their proportion.

      d) In line 95, although the authors state that 50 germ cell cysts were analyzed for the presence of midbodies, it would be more informative to specify how many germaria these cysts were derived from and how many biological replicates were examined.

      (4) Note that both bam mutant GSCs and wild-type SGCs can undergo division to generate midbodies (double cells), as shown in Figure 4H. Therefore, the current description of the midbody analysis is confusing. The authors should clarify which cell types were examined and explain how midbodies were interpreted in distinguishing between cell division and differentiation.

      (5) The data in Figure 5 showing Dpp expression in bam mutant tumorous GSCs are not convincing. The Dpp-lacZ signal appears broadly distributed throughout the germarium, including in escort cells. To support the claim more clearly, the authors should present corresponding images for Figures 5D and 5E, in which dpp expression was knocked down in the germ cells of bam or bgcn mutant mosaic germaria. Showing these images would help clarify the localization and specificity of Dpp-lacZ expression relative to the tumorous GSCs.

      (6) While Figure 6 provides genetic evidence that bam mutant tumorous GSCs produce Dpp to inhibit the differentiation of wild-type SGCs, it should be noted that these analyses were performed in a dpp⁺/⁻ background. To strengthen the conclusion, the authors should include appropriate controls showing [dpp⁺/⁻; bam⁺/⁻] SGCs and [dpp⁺/⁻; bam⁺/⁻] germ cell cysts without heat shock (as referenced in Figures 6F and 6I).

      (7) Previous studies have reported that bam mutant germ cells cause blunted escort cell protrusions (e.g., Kirilly et al., Development, 2011), which are known to contribute to germ cell differentiation (e.g., Chen et al., Frontiers in Cell and Developmental Biology, 2022). The authors should include these findings in the Discussion to provide a broader context and to acknowledge how alterations in escort cell morphology may further influence differentiation defects in their model.

      (8) Since fusome morphology is an important readout for distinguishing SGCs from differentiating cells, all clonal analyses should include fusome staining.

      (9) Figure arrangement. It is somewhat difficult to identify the figure panels cited in the text due to the current panel arrangement.

      (10) The number of biological replicates and germaria analyzed should be clearly stated somewhere in the manuscript, ideally in the Methods section or figure legends. Providing this information is essential for assessing data reliability and reproducibility.

    4. Reviewer #3 (Public review):

      Summary:

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring stem cells.

      Strengths:

      This study addresses an important biological question concerning the interaction between germline tumor cells and WT germline stem cells in the Drosophila ovary. If the findings are substantiated, they could provide valuable insights applicable to other stem cell systems.

      Weaknesses:

      Previous work from Xie's lab demonstrated that bam and bgcn mutant GSCs can outcompete WT GSCs for niche occupancy. Furthermore, a large body of literature has established that the interactions between escort cells (ECs) and GSC daughters are essential for proper and timely germline differentiation (the differentiation niche). Disruption of these interactions leads to arrest of germline cell differentiation in a status with weak BMP signaling activation and low bam expression, a phenotype virtually identical to what is reported here.

      Thus, it remains unclear whether the observed phenotype reflects "direct inhibition by tumor cells" or "arrested differentiation due to the loss of the differentiation niche". Because most data were collected at a very late stage (more than 10 days after clonal induction), when tumor cells already dominate the germarium, this question cannot be resolved. To distinguish between these two possibilities, the authors could conduct a time-course analysis to examine the onset of the WT GSC-like single-germ-cell (SGC) phenotype and determine whether early-stage tumor clones containing only a few tumor cells can suppress the differentiation of neighboring WT GSCs. If tumor cells indeed produce Dpp and Gbb (as proposed here) to inhibit the differentiation of neighboring germline cells, a small cluster, or possibly even a single tumor cell, generated at an early stage might prevent the differentiation of its neighboring germ cells.

      The key evidence supporting the claim that tumor cells produce Dpp and Gbb comes from Figures 5 and 6, which suggest that tumor-derived dpp and gbb are required for this inhibition. However, interpretation of these data requires caution.

      In Figure 5, the authors use dpp-lacZ to support the claim that dpp is upregulated in tumor cells (Figure 5A and 5B). However, the background expression in somatic cells (ECs and pre-follicular cells) differs noticeably between these panels. In Figure 5A, dpp-lacZ expression in somatic cells is clearly higher than in 5B, and the expression level in tumor cells appears comparable to that in somatic cells (dpp-lacZ single channel). Similarly, in Figure 5B, dpp-lacZ expression in germline cells is also comparable to that in somatic cells. Providing clear evidence of upregulated dpp and gbb expression in tumor cells (for example, through single-molecule RNA in situ hybridization) would be essential.

      Most tumor data presented in this study were collected from the bam[86] null allele, whereas the data in Figure 6 were derived from a weaker bam[BG] allele. This bam[BG] allele is not molecularly defined and shows some genetic interaction with dpp mutants. As shown in Figure 6E, removal of dpp from homozygous bam[BG] mutants leads to germline differentiation (evidenced by a branched fusome connecting several cystocytes, located to the right of the white arrowhead). In Figure 6D, a fusome is likely present in some GFP-negative bam[BG]/bam[BG] cells. To strengthen their claim that the tumor produces Dpp and Gbb to inhibit WT germline cell differentiation, the authors should repeat these experiments using the bam[86] null allele.

      It is well established that the stem cell niche provides multiple functional supports for maintaining resident stem cells, including physical anchorage and signaling regulation. In Drosophila, several signaling molecules produced by the niche have been identified, each with a distinct function: some promote stemness, while others regulate differentiation. Expression of Dpp and Gbb alone does not substantiate the claim that these tumor cells have acquired niche-like properties. To support their assertion that these tumors mimic the niche, the authors should provide additional evidence showing that these tumor cells also express other niche-associated markers. Alternatively, they could revise the manuscript title to more accurately reflect their findings.

      In the Methods section, the authors need to provide details on how dpp-lacZ expression levels were quantified and normalized.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Figure 1). These tumors contain non-mutant cells, termed SGCs (single-germ cells). Seventy-five percent of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Figure 2). The authors demonstrate that the block in differentiation in SGCs is a result of suppression of bam expression (Figure 2). They present data suggesting that in 73% of SGCs, BMP signaling is low (assessed by dad-lacZ) (Figure 3) and that proliferation is lower in SGCs than in GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Figure 4). They show data that bam-mutant cells secrete Dpp, but these data are not compelling (see below) (Figure 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Figure 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment.

      (2) Powerful genetics allow them to test various factors in the tumorous vs nontumorous cells.

      (3) Appropriate use of quantification and statistics.

      We greatly appreciate these comments.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, in some germaria, or in only a few germaria?

      This is a great question. Because the SGC phenotype depends on the presence of germline tumor clones, our quantification was restricted to germaria that contained such clones. These quantitative data ("SGCs and/or germline cysts per germarium with germline clones") will be presented in the revised Figure 1.

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      Our initial attempts to induce ovarian hs-flp germline clones by heat-shocking adult flies were unsuccessful, with very few clones observed. Therefore, we shifted our approach to an earlier developmental stage. Successful induction was achieved by subjecting late-L3/early-pupal animals to a twice-daily heat shock at 37°C for 6 consecutive days (2 hours per session with a 6-hour interval; see Lines 325-329) (Zhao et al., 2018).

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      These 20-25% of SGCs are bamP-GFP⁺ dad-lacZ⁻, not bam⁺ dad-lacZ⁺ (see Figures 2C and 3D). They are likely cystoblast-like cells that may have initiated a differentiation program toward forming germline cysts (see Lines 109-117). The 70-75% of SGCs that have low BMP signaling exhibit GSC-like properties, including: 1) dot-like spectrosomes; 2) dad-lacZ positivity; and 3) absence of bamP-GFP expression. While additional markers would be beneficial, we think that this combination of properties is sufficient to classify these cells as GSC-like. 

      (4) All experiments except that shown in Figure 1I (a single germarium, with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than Figure 1) with hs-flp?

      Yes, we initially identified the SGC phenotype through hs-flp-mediated mosaic analysis of bam or bgcn mutants in ovaries. However, as noted in our response to Weakness (2), this approach was very labor-intensive. Therefore, we switched to the more convenient nos::flp system for subsequent experiments. In our observations, there was no difference in the SGC phenotype between these two approaches, confirming that the nos::flp system is a valid and more practical alternative for studying this phenotype. 

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens in young females (e.g., 2-day-old)? I assume that the nos>flp is working in larval and pupal stages, and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      These are very good questions. Time-course data will be provided in the revised Figure 1. The SGC phenotype depends on the presence of bam or bgcn mutant germline clones. Germaria from 14-day-old flies contained larger and more numerous clones than those from younger flies. This age-dependent increase in clone size and frequency significantly improved the efficiency of our quantification (see Lines 129-131). 

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact the clonal analyses diagrammed in Figure 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated, so it is not possible to discern one vs two copies of GFP.

      We greatly appreciate this comment. It was also difficult for us to distinguish one and two copies of GFP in the Drosophila ovary. In Figure 4A-F, to resolve this problem, we used a triple-color system, in which red germ cells (RFP⁺/⁺ GFP⁻/⁻) are bam mutant, yellow germ cells (RFP⁺/⁻ GFP⁺/⁻) are wild-type, and green germ cells (RFP⁻/⁻ GFP⁺/⁺) are punt or med mutant. In Figure 4G-J, we quantified the SGC phenotype only in black germ cells (GFP⁻/⁻), which are wild-type (control) or mad mutant. In Figure 6, we quantified the SGC phenotype only in green germ cells (both GFP⁺/⁺ and GFP⁺/⁻), all of which are wild-type.

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with the dpp-lacZ enhancer trap in Figure 5A, B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries, and yet LacZ is very faint in Figure 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues, including the ovary.

      We appreciate this critical comment. The immunofluorescent staining and confocal settings in Figure 5A were the same as those in 5B. In our observations, the level of dpp-lacZ in cap cells varied across germaria, even within the same ovary, as quantified in Figure 5C. We will provide RNA in situ hybridization data to further strengthen the conclusion that bam or bgcn mutant germline tumors secrete BMP ligands.  

      (8) In Figure 6, the authors report results obtained with the bam[BG] allele. Do they obtain similar data with another bam allele (e.g., bam[Δ86])?

      No, we did not repeat these experiments with bam[Δ86]. Given that bam[BG] was functionally indistinguishable from bam[Δ86] in inducing the SGC phenotype (compare Figure 6F, I with Figure 6-figure supplement 3C), we believe that repeating these experiments with bam[Δ86] would be redundant and would not alter the key conclusion of our study. Thank you for your understanding.

      Reviewer #2 (Public review):

      While the study by Zhang et al. provides valuable insights into how germline tumors can non-autonomously suppress the differentiation of neighboring wild-type germline stem cells (GSCs), several conceptual and technical issues limit the strength of the conclusions.

      Major points:

      (1) Naming of SGCs is confusing. In line 68, the authors state that "many wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology." However, bam or bgcn mutant GSCs are also referred to as "SGCs," which creates confusion when reading the text and interpreting the figures. The authors should clarify the terminology used to distinguish between wild-type SGCs and tumor (bam/bgcn mutant) SGCs, and apply consistent naming throughout the manuscript and figure legends.

      We apologize for any confusion. In our manuscript, the term "SGC" is reserved specifically for wild-type germ cells that maintain a GSC-like morphology outside the niche. bam or bgcn mutant germ cells are referred to as GSC-like tumor cells (Lines 87-88), not SGCs.

      (a) The same confusion appears in Figure 2. It is unclear whether the analyzed SGCs are wild-type or bam mutant cells. If the SGCs analyzed are Bam mutants, then the lack of Bam expression and failure to differentiate would be expected and not informative. However, if the SGCs are wild-type GSCs located outside the niche, then the observation would suggest that Bam expression is silenced in these wildtype cells, which is a significant finding. The authors should clarify the genotype of the SGCs analyzed in Figure 2C, as this information is not currently provided.

      The SGCs analyzed in Figure 2A-C are wild-type, GSC-like cells located outside the niche. They were generated using the same genetic strategy depicted in Figures 1C and 1E (with the schematic in Figure 1B). The complete genotypes for all experiments are available in Source data 1. 

      (b) In Figures 4B and 4E, the analysis of SGC composition is confusing. In the control germaria (bam mutant mosaic), the authors label GFP⁺ SGCs as "wild-type," which makes interpretation unclear. Note, this is completely different from their earlier definition shown in line 68.

      The strategy to generate SGCs in Figure 4B-F (with the schematic in Figure 4A) is completely different from that in Figure 1C-F, H, and I (with the schematic in Figure 1B). In Figure 4B-F, we needed to distinguish punt<sup>-/-</sup> (or med<sup>-/-</sup>) germ cells from punt<sup>+/-</sup> (or med<sup>+/-</sup>) germ cells. As noted in our response to Reviewer #1’s Weakness (6), it was difficult for us to distinguish between one and two copies of GFP in the Drosophila ovary. Therefore, we chose to use the triple-color system to distinguish these germ cells in Figure 4B-F (see genotypes in Source data 1).

      (c) Additionally, bam⁺/⁻ GSCs (the first bar in Figure 4E) should appear GFP⁺ and Red⁺ (i.e., yellow). It would be helpful if the authors could indicate these bam⁺/⁻ germ cells directly in the image and clarify the corresponding color representation in the main text. In Figure 2A, although a color code is shown, the legend does not explain it clearly, nor does it specify the identity of bam⁺/⁻ cells alone. Figure 4F has the same issue, and in this graph, the color does not match Figure 4A.

      The color-to-genotype relationships for the schematics in Figures 2A and 4E are provided in Figures 1B and 4A, respectively. Due to the high density of germ cells, it is impractical to label each genotype directly in the images. In contrast to Figure 4E, the colors in Figure 4F do not represent genotypes; instead, blue denotes the percentage of SGCs, and red denotes the percentage of germline cysts, as indicated below the bar chart. 

      (2) The frequencies of bam or bgcn mutant mosaic germaria carrying [wild-type] SGCs or wild-type germ cell cysts with branched fusomes, as well as the average number of wild-type SGCs per germarium and the number of days after heat shock for the representative images, are not provided when Figure 1 is first introduced. Since this is the first time the authors describe these phenotypes, including these details is essential. Without this information, it is difficult for readers to follow and evaluate the presented observations.

      Thanks for this constructive suggestion. We will include such quantification data in the revised manuscript.

      (3) Without the information mentioned in point 2, the section regarding [wild-type] SGCs induced by impairment of differentiation or dedifferentiation is difficult to follow. In lines 90-97, the authors use the presence of midbodies between cystocytes as a criterion to determine whether the wild-type GSCs surrounded by tumor GSCs arise through dedifferentiation. However, the cited study (Mathieu et al., 2022) reports that midbodies can be detected between two germ cells within a cyst carrying a branched fusome upon USP8 loss.

      Unlike wild-type cystocytes, which undergo incomplete cytokinesis and lack midbodies, those with USP8 loss undergo complete cell division, with the presence of midbodies (white arrow, Figure 1F’ from Mathieu et al., 2022) as a marker of the late cytokinesis stage (Mathieu et al., 2022). 

      (a) Are wild-type germ cell cysts with branched fusomes present in the bam mutant mosaic germaria? What is the proportion of germaria containing wild-type SGCs versus those containing wild-type germ cell cysts with branched fusomes?

      (b) If all bam mutant mosaic germaria carry only wild-type GSCs outside the niche and no germaria contain wild-type germ cell cysts with branched fusomes, then examining midbodies as an indicator of dedifferentiation may not be appropriate.

      We greatly appreciate this critical comment. bam mutant mosaic germaria indeed contained wild-type germline cysts, as evidenced by an SGC frequency of ~70%, rather than 100% (see Figures 2H, 4F, 4J, 6F, 6I, and Figure 6-figure supplement 3C). Since the SGC phenotype depends on the presence of bam or bgcn mutant germline tumors, we quantified it as “the percentage of SGCs relative to the total number of SGCs and germline cysts that are surrounded by germline tumors” (see Lines 124-129). Quantifying the SGC phenotype as "the percentage of germaria with SGCs" would be imprecise. This is because the presence and number of SGCs were highly variable among germaria with bam mutant germline clones, and a small number of germaria entirely lacked these clones. We will provide the data of "SGCs and/or germline cysts per germarium with germline clones" in revised Figure 1.

      (c) If, however, some germaria do contain wild-type germ cell cysts with branched fusomes, the authors should provide representative images and quantify their proportion.

      Such representative germaria are shown in Figure 2G, 3B, 3C, 6D, 6E, and 6H. The percentage of germline cysts can be calculated by “100% - SGC%”.

      (d) In line 95, although the authors state that 50 germ cell cysts were analyzed for the presence of midbodies, it would be more informative to specify how many germaria these cysts were derived from and how many biological replicates were examined.

      As noted in our response to points a) and b) above, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for analyzing the phenotype. For this experiment, we examined >50 such germline cysts via confocal microscopy. As the analysis was performed on a defined cellular population, this sample size should be sufficient to support our conclusion. 

      (4) Note that both bam mutant GSCs and wild-type SGCs can undergo division to generate midbodies (double cells), as shown in Figure 4H. Therefore, the current description of the midbody analysis is confusing. The authors should clarify which cell types were examined and explain how midbodies were interpreted in distinguishing between cell division and differentiation.

      We assayed for the presence or absence of midbodies specifically within the germline cysts surrounded by bam mutant tumors, not within the tumors themselves (Lines 94-95). As detailed in Lines 88-97, the absence of midbodies was used as a key criterion to exclude the possibility of dedifferentiation.

      (5) The data in Figure 5 showing Dpp expression in bam mutant tumorous GSCs are not convincing. The Dpp-lacZ signal appears broadly distributed throughout the germarium, including in escort cells. To support the claim more clearly, the authors should present corresponding images for Figures 5D and 5E, in which dpp expression was knocked down in the germ cells of bam or bgcn mutant mosaic germaria. Showing these images would help clarify the localization and specificity of Dpp-lacZ expression relative to the tumorous GSCs.

      We greatly appreciate this comment. RNA in situ hybridization data will be provided to further strengthen the conclusion that bam or bgcn mutant germline tumors secrete BMP ligands.

      (6) While Figure 6 provides genetic evidence that bam mutant tumorous GSCs produce Dpp to inhibit the differentiation of wild-type SGCs, it should be noted that these analyses were performed in a dpp⁺/⁻ background. To strengthen the conclusion, the authors should include appropriate controls showing [dpp⁺/⁻; bam⁺/⁻] SGCs and [dpp⁺/⁻; bam⁺/⁻] germ cell cysts without heat shock (as referenced in Figures 6F and 6I).

      Schematic cartoons in Figure 6A and 6B demonstrate that these analyses were performed in a dpp<sup>+/-</sup> background. Figure 6-figure supplement 1 indicates that dpp<sup>+/-</sup> or gbb<sup>+/-</sup> does not affect GSC maintenance, germ cell differentiation, or female fly fertility. Figure 6C is the control for 6D and 6E, and 6G is the control for 6H, with quantification in 6F and 6I. We used nos::flp, not the heat shock method, to induce germline clones in these experiments (see genotypes in Source data 1).

      (7) Previous studies have reported that bam mutant germ cells cause blunted escort cell protrusions (e.g., Kirilly et al., Development, 2011), which are known to contribute to germ cell differentiation (e.g., Chen et al., Frontiers in Cell and Developmental Biology, 2022). The authors should include these findings in the Discussion to provide a broader context and to acknowledge how alterations in escort cell morphology may further influence differentiation defects in their model.

      Thanks for teaching us! Such discussion will be included in the revised manuscript.

      (8) Since fusome morphology is an important readout of SGCs versus differentiation, all the clonal analyses should include fusome staining.

      SGCs are readily distinguishable from multicellular germline cysts based on morphology. In some clonal analysis experiments, fusome staining was not feasible due to technical limitations such as channel saturation or antibody incompatibility. Thanks for the understanding!

      (9) Figure arrangement. It is somewhat difficult to identify the figure panels cited in the text due to the current panel arrangement.

      The figure panels were arranged to optimize space while ensuring that related panels are grouped in close proximity for logical comparison. We would be happy to consider any specific suggestions for an alternative layout that could improve clarity. Thanks!

      (10) The number of biological replicates and germaria analyzed should be clearly stated somewhere in the manuscript, ideally in the Methods section or figure legends. Providing this information is essential for assessing data reliability and reproducibility.

      Thanks for this constructive suggestion. Such information will be included in figure legends in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring stem cells.

      Strengths:

      This study addresses an important biological question concerning the interaction between germline tumor cells and WT germline stem cells in the Drosophila ovary. If the findings are substantiated, they could provide valuable insights applicable to other stem cell systems.

      We greatly appreciate these comments.

      Weaknesses:

      Previous work from Xie's lab demonstrated that bam and bgcn mutant GSCs can outcompete WT GSCs for niche occupancy. Furthermore, a large body of literature has established that the interactions between escort cells (ECs) and GSC daughters are essential for proper and timely germline differentiation (the differentiation niche). Disruption of these interactions arrests germline cell differentiation in a state with weak BMP signaling activation and low bam expression, a phenotype virtually identical to what is reported here. Thus, it remains unclear whether the observed phenotype reflects "direct inhibition by tumor cells" or "arrested differentiation due to the loss of the differentiation niche". Because most data were collected at a very late stage (more than 10 days after clonal induction), when tumor cells already dominate the germarium, this question cannot be resolved. To distinguish between these two possibilities, the authors could conduct a time-course analysis to examine the onset of the WT GSC-like single-germ-cell (SGC) phenotype and determine whether early-stage tumor clones containing only a few tumor cells can suppress the differentiation of neighboring WT GSCs. If tumor cells indeed produce Dpp and Gbb (as proposed here) to inhibit the differentiation of neighboring germline cells, a small cluster, or possibly even a single tumor cell generated at an early stage, might prevent the differentiation of its neighboring germ cells.

      Thanks for this critical comment. Such time-course analysis data will be provided in revised Figure 1.

      The key evidence supporting the claim that tumor cells produce Dpp and Gbb comes from Figures 5 and 6, which suggest that tumor-derived dpp and gbb are required for this inhibition. However, interpretation of these data requires caution. In Figure 5, the authors use dpp-lacZ to support the claim that dpp is upregulated in tumor cells (Figure 5A and 5B). However, the background expression in somatic cells (ECs and pre-follicular cells) differs noticeably between these panels. In Figure 5A, dpp-lacZ expression in somatic cells is clearly higher than in Figure 5B, and the expression level in tumor cells appears comparable to that in somatic cells (dpp-lacZ single channel). Similarly, in Figure 5B, dpp-lacZ expression in germline cells is also comparable to that in somatic cells. Providing clear evidence of upregulated dpp and gbb expression in tumor cells (for example, through single-molecule RNA in situ hybridization) would be essential.

      We greatly appreciate this critical comment. In our data, the expression of dpp-lacZ in cap cells was variable across germaria, even within the same ovary, as quantified in Figure 5C. The images in Figures 5A and 5B were selected as representative examples of positive signaling. To directly address the reviewer's point and strengthen our conclusion, we will perform RNA in situ hybridization in the revised manuscript to visualize the expression of BMP ligands within the bam or bgcn mutant germline tumor cells.

      Most tumor data present in this study were collected from the bam[86] null allele, whereas the data in Figure 6 were derived from a weaker bam[BG] allele. This bam[BG] allele is not molecularly defined and shows some genetic interaction with dpp mutants. As shown in Figure 6E, removal of dpp from homozygous bam[BG] mutant leads to germline differentiation (evidenced by a branched fusome connecting several cystocytes, located at the right side of the white arrowhead). In Figure 6D, fusome is likely present in some GFP-negative bam[BG]/bam[BG] cells. To strengthen their claim that the tumor produces Dpp and Gbb to inhibit WT germline cell differentiation, the authors should repeat these experiments using the bam[86] null allele.

      Although a structure resembling a "branched fusome" is visible in Figure 6E (right of the white arrowhead), it is an artifact resulting from the cytoplasm of GFP-positive follicle cells, which also stain for α-Spectrin, projecting between germ cells of different clones (see the merged image). In both our previous (Zhang et al., 2023) and current studies, bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in its ability to block GSC differentiation and induce the SGC phenotype (compare Figure 6F, I with Figure 6-figure supplement 3C). Given this, we believe that repeating the extensive experiments in Figure 6 with the bam<sup>Δ86</sup> allele would be scientifically redundant and would not change the key conclusion of our study. We thank the reviewer for their consideration.

      It is well established that the stem cell niche provides multiple functional supports for maintaining resident stem cells, including physical anchorage and signaling regulation. In Drosophila, several signaling molecules produced by the niche have been identified, each with a distinct function: some promote stemness, while others regulate differentiation. Expression of Dpp and Gbb alone does not substantiate the claim that these tumor cells have acquired niche-like properties. To support their assertion that these tumors mimic the niche, the authors should provide additional evidence showing that these tumor cells also express other niche-associated markers. Alternatively, they could revise the manuscript title to more accurately reflect their findings.

      Dpp and Gbb are the key niche signals from cap cells for maintaining GSC stemness. Our work demonstrates that germline tumors can specifically mimic this signaling function, not the full suite of cap cell properties, to create a non-cell-autonomous differentiation block. The current title “Tumors mimic the niche to inhibit neighboring stem cell differentiation” reflects this precise concept: a partial, functional mimicry of the niche's most relevant activity in this context. We feel it is an appropriate and compelling summary of our main conclusion.

      In the Methods section, the authors need to provide details on how dpp-lacZ expression levels were quantified and normalized.

      Thanks for this suggestion. Such information will be included in the revised manuscript.

    1. eLife Assessment

      This manuscript presents significant and important work that advances single-molecule imaging technology of transcription with simultaneous analysis of several parameters. However, currently, the evidence is incomplete and requires further quantitation/description of the technologies used, further controls, and additional analysis of the data by other methods.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the effects of transcriptional activation on chromatin dynamics and mobility. Using a breast cancer model, the authors examine the effects of estrogen receptor-α (ERα) stimulation and the resulting transcriptional activation on chromatin behavior at ERα-dependent loci during three distinct phases: unstimulated, acute stimulation, and chronic stimulation. Through live DNA and RNA imaging, the authors claim that ERα-dependent target genes display distinct bursting dynamics during periods of acute versus chronic stimulation, accompanied by an overall increase in chromatin mobility. Notably, they claim that ERα-dependent loci display increased mobility during the non-bursting phase compared to the bursting phase. The study also attempts to explore the role of condensates in mediating these transcriptional and chromatin mobility changes, using a single-molecule tracking assay to identify a unique population of low-diffusion-coefficient molecules that appears upon E2 stimulation and is sensitive to 1,6-hexanediol.

      Strengths:

      While the study develops interesting tools that have the potential to provide useful insights into the relationship between transcriptional state, genomic locus mobility, and condensate formation, several major claims lack key supportive evidence, and the methods are inadequately established and described.

      Weaknesses:

      (1) 1,6-Hexanediol treatment is not suitable for drawing conclusions in live-cell experiments, as this assay is now widely recognized to be plagued with artifacts and inadequate as a test for condensate formation. 1,6-Hexanediol perturbs all hydrophobic interactions and has wide-ranging effects: it perturbs kinase and phosphatase activities (Düster et al., J. Biol. Chem., 2021), immobilizes and condenses chromatin in living cells (Itoh et al., Life Sci. Alliance, 2021), disrupts nuclear pore complexes (Ribbeck et al., EMBO, 2002) and nuclear transport (Barrientos et al., Nucleus, 2023), and does not disrupt charge-mediated phase separation (Zheng et al., EMBO, 2025). These effects are also discussed in a recent article: Current practices in the study of biomolecular condensates: a community comment, Alberti, Nat. Comm., 2025.

      (2) The chromatin mobility is analyzed using displacement, and the differences are typically less than 50 nm. There is no discussion on the precision of this measurement and what these small differences may mean. No control loci are assessed to see if this effect is specific to the genes of interest or global.

      (3) The SMT analysis is performed using Mean Square Displacement fitting of short single trajectories, which is error-prone, and no analysis is performed on the localization precision or error in estimation of the key parameters. Potential artifacts from this analysis are reflected in the distribution of alpha and diffusion coefficients that are presented in this paper, which include physically impossible values on which major claims rest.

      (4) No experiment is performed to directly connect foci/cluster/condensation formation of ER at the genes of interest. Given these points alone, it is impossible to assess whether any of the claims made in the current manuscript are correct.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a combination of state-of-the-art live-cell imaging techniques, tracking transcriptional bursting, DNA mobility, and single molecules, to discern biophysical behaviours of chromatin and condensate formation in response to ERα activation. Surprisingly, the authors find that loci in estradiol-stimulated cells display enhanced mobility during the non-bursting phase. The authors attribute the reduced mobility of the loci during transcriptional bursts to condensate formation of ERα on enhancers regulating the bursting gene. Inhibition of transcription with flavopiridol shifts the loci and ERα to a non-confined state. These findings open the door to performing more complex multi-color live-cell imaging assays to fully interrogate the role of transcription factor condensates, DNA mobility, and subnuclear localization in the regulation of transcriptional bursting kinetics, and should be of great benefit to researchers studying mechanisms of gene regulation.

      Strengths:

      The authors present a series of advanced multi-color live-cell imaging assays used to correlate changes in DNA mobility with transcriptional bursting of a gene. By using such a defined temporal trigger associated with the addition of estradiol to cells, the authors were also able to elegantly characterize changes in the diffusive properties of different classes of ERα during the acute (early, <2 hours) and chronic (late, >2 hours) phases of estrogen-responsive gene activation. Interestingly, one particular class of ERα that changed between acute and chronic phases was also responsive to 1,6-hexanediol treatment, suggesting that the authors are assaying ERα behaviours related to condensate formation. The authors also examined how the proximity of the NRIP1 gene to interchromatin granules impacted transcriptional bursting kinetics. There was no correlation of DNA mobility or transcriptional bursting with localization to interchromatin granules, suggesting that other higher-order, architectural associations are regulating these processes. The imaging data were also supported by genomic GRO-seq and ChIP-seq assays showing changes in genomic occupancy of a number of transcription factors, including ERα, during the pre-acute, acute, and chronic phases.

      Weaknesses:

      Although there are a number of compelling strengths to support the authors' interpretation of the data, the paper is written in a way that lacks clarity and detail on a number of technical components. This lack of detail, in particular related to how endogenous tagging of DNA, ERα, and interchromatin granules (e.g. SC35) potentially impacts transcriptional bursting, makes it difficult for the reader to sufficiently judge any potential limitations of these complex engineered cell lines. Another potential weakness is the lack of any experiments directly measuring ERα diffusive properties in close proximity to the bursting gene. It is noted that this type of experiment examining transcription factor binding on a bursting gene is very technically challenging, given the different timescales of measurement of bursting (seconds-minutes) versus ERα diffusion (sub-seconds). However, these types of experiments would go a long way to supporting the authors' conclusions regarding how changes in DNA mobility and transcriptional bursting may be directly related to ERα condensate formation on enhancers.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors explore dynamic chromosomal mobility and transcriptional bursting events in mammalian cells, particularly focusing on ERα-dependent gene activation. The authors investigate how the physical movement of DNA loci changes during different phases of gene transcription (bursting vs. non-bursting, acute vs. chronic stimulation). Using advanced live-cell imaging techniques, including SMT of ERα and dual DNA/RNA visualization, the study reveals a multi-state model of DNA mobility linked to the formation of transcription factor condensates. The authors conclude that differential DNA kinetics serve as a reliable indicator for detecting condensate formation during gene activation, offering new insights into the mechanisms regulating gene expression within the nucleus.

      Strengths:

      The authors have done substantial work, and a major strength of the manuscript is being able to image both DNA and RNA from the same gene, as well as the TF that acts on that gene. This multi-pronged approach leads to complementary insights into transcription bursting mechanisms.

      Weaknesses:

      A major weakness of the manuscript is the lack of appropriate controls that support the specificity of the effects observed. The exclusive focus on condensates as the underlying mechanism to explain their data is also a bit limiting.

    1. eLife Assessment

      This important study resolves the structure of one missing piece of the eukaryotic DNA replication fork, the leading strand clamp loader. Overall, the data are convincing, with electron microscopy data providing a strong basis for analyzing differences and similarities with other RFC complexes. A minor point is that the evidence supporting the proposed role of the β-hairpin is incomplete.