    1. eLife Assessment

      Pannexin (Panx) channels are a family of large-pore channels that mediate the release of substrates like ATP from cells, yet the physiological stimuli that activate these channels remain poorly understood. The study by Henze et al. describes an elegant approach wherein activity-guided fractionation of mouse liver led to the discovery that lysophospholipids (LPCs) activate Panx1 and Panx2 channels expressed in cells or reconstituted into liposomes. The authors provide compelling evidence that LPC-mediated activation of Panx1 is involved in joint pain and that Panx1 channels are required for the established effects of LPC on inflammasome activation in monocytes, suggesting that Panx channels play a role in inflammatory pathways. Overall, this important study reports a previously unanticipated mechanism wherein LPCs directly activate Panx channels. The work will be of interest to scientists investigating phospholipids, Panx channels, purinergic signalling and inflammation.

      [Editors' note: this paper was reviewed and curated by Biophysics Colab]

    2. Joint Public Review:

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although structures of several Panx subtypes from different species have now been solved, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, and molecular modelling. Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation. 
In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be an important study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. This study provides an excellent foundation for future studies and importantly provides clinical relevance.

      [Editors' note: this paper has been through two rounds of review and revisions, available here: https://sciety.org/articles/activity/10.1101/2023.10.23.563601]

    3. Author response:

      (This author response relates to the first round of peer review by Biophysics Colab. Reviews and responses to both rounds of review are available here: https://sciety.org/articles/activity/10.1101/2023.10.23.563601.)

      General Assessment:

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although structures of several Panx subtypes from different species have now been solved, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation. 
In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and importantly provides clinical relevance.

      We thank the reviewers for their time and effort in reviewing our manuscript. Based on their valuable comments and suggestions, we have made substantial revisions. The updated manuscript now includes two new experiments, which support the conclusion that lysophospholipid-triggered channel activation promotes the release of signaling molecules critical for the immune response and demonstrate that this novel class of agonist activates the inflammasome in human macrophages through endogenously expressed Panx1. To better highlight the significance of our findings, we have excluded the cryo-EM panel from this manuscript. We believe these changes address the main concerns raised by the reviewers and enhance the overall clarity and impact of our findings. Below, we provide a point-by-point response to each of the reviewers’ comments.

      Recommendations:

      (1) The authors present a tremendous amount of data using different approaches, cells and assays along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed so that the 2 key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.

      We appreciate and agree with the reviewers’ observation. Our initial manuscript may have been challenging to follow due to our use of both wild-type and GS-tagged versions of Panx1 from human and frog origins, combined with different fluorescence techniques across cell types. In this revision, we used only human wild-type Panx1 expressed in HEK293S GnTI- cells, except for activity-guided fractionation experiments, where we used GS-tagged Panx1 expressed in HEK293 cells (Fig. 1). For functional reconstitution studies, we employed YO-PRO-1 uptake assays, as optimizing the Venus-based assay was challenging. We have clarified these exceptions in the main text. We think these adjustments simplify the narrative and ensure an appropriate balance between main and supplemental figures.

      (2) It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors' earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so-called instantaneous IV).

      We fully agree with the reviewer on the importance of ion selectivity experiments. However, comparing the properties of LPC-activated channels with those activated by membrane depolarization presented technical challenges, as LPC appears to stimulate Panx1 in synergy with voltage. Prolonged LPC exposure destabilizes patches, complicating G-V curve acquisition and kinetic analyses. While such experiments could provide mechanistic insights, we think they are beyond the scope of the current study.
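      The instantaneous I-V analysis suggested by the reviewers can be sketched as follows: after a fixed opening depolarization, tail currents are measured at a series of test voltages, and Vrev is the voltage at which the tail current changes sign. This is a minimal illustration, not the authors' or reviewers' analysis code; it assumes tail-current amplitudes have already been extracted at the same time point after each step and linearly interpolates between the two points that bracket zero. The function name and the example values are hypothetical and are not data from the study.

```python
def reversal_potential(voltages_mV, tail_currents_pA):
    """Estimate Vrev from an instantaneous I-V by linear interpolation.

    Each tail current is assumed to be measured at the same time point after
    stepping from a fixed opening depolarization to the test voltage.
    Returns None if the currents never change sign.
    """
    points = sorted(zip(voltages_mV, tail_currents_pA))
    # A test voltage that gives exactly zero current is itself Vrev.
    for v, i in points:
        if i == 0:
            return float(v)
    # Otherwise interpolate between the two voltages whose currents bracket zero.
    for (v1, i1), (v2, i2) in zip(points, points[1:]):
        if i1 * i2 < 0:
            return v1 + (0.0 - i1) * (v2 - v1) / (i2 - i1)
    return None


# Hypothetical tail currents (pA) at test voltages (mV)
v_test = [-80, -60, -40, -20, 0, 20]
i_tail = [-350.0, -250.0, -150.0, -50.0, 50.0, 150.0]
print(reversal_potential(v_test, i_tail))  # -10.0
```

      Comparing LPC-activated with depolarization-activated channels, or control with anion-substituted solutions, would then amount to comparing two such Vrev estimates.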

      (3) Data is presented for expression of Panx channels in different cell types (HEK vs HEK293S GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D it would be immediately clear that a GnTI- vector alone control for WT Panx1 is missing as the vector control shown is for HEK cells and formally that is only a control for Panx2 and 3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293S GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.

      As mentioned above, we believe our revised version is more straightforward to digest. We have improved labeling and provided explanations where necessary to clarify the manuscript. While Panx1 expression levels are indeed higher in GnTI- than in HEK293 cells, we are uncertain whether the absence of detectable currents in HEK293 cells is solely due to expression levels. Some post-translational modifications that inhibit Panx1, such as lysine acetylation, may also impact activity. Future studies are needed to explore these mechanisms further.

      (4) The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%) but when the channel is closed at time = 0 the value of fluorescence is 0 rather than 100 %, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate calculated that is plotted in Fig 2C? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent maximum, what is the maximum and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments so anything that could be done to bolster confidence in what it reports would be valuable to readers. Bringing in as many of the control experiments that have been done as possible, including any that are already published, would be helpful.

      We modified the Y-axis in Figure 2 to “Quench (%)” for clarity. The data reflect fluorescence reduction over time, starting from LPC addition, normalized to the maximal decrease observed after Triton X-100 addition (3 minutes), enabling consistent quenching value comparisons. Although the quenching value appears small, normalization against complete cell solubilization provides reproducible comparisons. We do not fully understand why CBX effects vary in Venus quenching experiments, but we speculate that its steroid-like pentacyclic structure may influence the lysophospholipid agonistic effects. As noted in prior studies (DOI: 10.1085/jgp.201511505; DOI: 10.7554/eLife.54670), CBX likely acts as an allosteric modulator rather than a simple pore blocker, potentially contributing to these variations.
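      The normalization described in this response can be written as Quench (%) = 100 x (F0 - F(t)) / (F0 - F_Triton), where F0 is the mVenus fluorescence at the moment of LPC addition and F_Triton is the fluorescence after Triton X-100 solubilization, taken as the maximal possible quench. The Python sketch below is a minimal illustration under those assumptions; the function name and the numbers are hypothetical, and it does not cover background subtraction or the initial-rate calculation, which the response does not specify.

```python
def quench_percent(trace, f_triton):
    """Convert an mVenus fluorescence trace into Quench (%).

    trace    -- fluorescence samples starting at LPC addition (trace[0] is F0)
    f_triton -- fluorescence after Triton X-100 solubilization (maximal quench)
    """
    f0 = trace[0]
    max_drop = f0 - f_triton
    if max_drop <= 0:
        raise ValueError("Triton X-100 reading must be below the starting fluorescence")
    # 0 % = no quench at LPC addition; 100 % = the decrease after full solubilization
    return [100.0 * (f0 - f) / max_drop for f in trace]


# Hypothetical trace (arbitrary units), not data from the study
print(quench_percent([1000.0, 990.0, 975.0, 960.0], f_triton=200.0))
# [0.0, 1.25, 3.125, 5.0]
```

      Because the denominator is the full fluorescence drop on solubilization, a few percent of quench within minutes of LPC addition is expected to look small on this scale.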

      (5) Could the authors provide more information to help rationalize how Yo-Pro-1, which has a charge of +2, can permeate what are thought to be anion-favouring Panx channels? We appreciate that the biophysical properties of Panx channels remain mysterious, but it would help to hear a bit more about the authors' thinking. It might also help to cite other papers that have measured Yo-Pro-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?

      Our recent study suggests that the electrostatic landscape along the permeation pathway may influence its ion selectivity (DOI: 10.1101/2024.06.13.598903). However, we have not yet fully elucidated how Panx1 permeates both anions and cations. Based on our findings, ion selectivity may vary with activation stimulus intensity and duration. Cation permeation through Panx1 is often demonstrated with YO-PRO-1, which measures uptake over minutes, unlike electrophysiological measurements conducted over milliseconds to seconds. We referenced two representative studies employing YO-PRO-1 to assess Panx1 activity. Whole-cell current measurements from a similar construct with an intracellular loop insertion indicate that our STREP-tagged construct likely retains functional capacity.

      (6) In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.

      We used the ratio of LPC-induced current to the current measured at +110 mV to determine whether any of the mutants disproportionately affect LPC-induced current relative to depolarization-activated current. Since the mutants that did not respond to LPC also exhibited smaller voltage-stimulated currents than those that did respond, we reasoned that using this ratio would better capture the information the reviewer is suggesting to gauge. Showing the zero current level may be helpful if the goal was to compare basal currents, which in our experience vary significantly from patch to patch. However, since we are comparing LPC- and voltage-induced currents within the same patch, we believe that including basal current measurements would not add useful information to our study.

      Given that new experiments were included to further highlight the significance of the discovery of Panx1 agonists, we opted to separate structure-based mechanistic studies from this manuscript and removed this experiment along with the docking and cryo-EM studies.
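      The ratio used for Fig 5C can be illustrated with a short sketch. Assuming (this is an assumption, not stated in the response) that both the LPC-induced current at -60 mV and the voltage-activated current at +110 mV scale with the number of channels in the membrane, dividing one by the other within the same cell cancels that scaling, so a mutant that lowers expression alone keeps the ratio, whereas one that selectively impairs the LPC response lowers it. Magnitudes are used because the LPC current at -60 mV is inward. The function name and all values are hypothetical.

```python
def lpc_to_voltage_ratio(i_lpc_minus60_pA: float, i_step_plus110_pA: float) -> float:
    """Ratio of the LPC-induced current at -60 mV to the voltage-activated
    current at +110 mV (no LPC), both recorded in the same cell.

    Magnitudes are used because the LPC current at -60 mV is inward
    (negative by convention) while the +110 mV current is outward.
    """
    if i_step_plus110_pA == 0:
        raise ValueError("voltage-activated current must be non-zero")
    return abs(i_lpc_minus60_pA) / abs(i_step_plus110_pA)


# Hypothetical cells: B expresses ~5x fewer channels than A, but its LPC
# response shrinks in proportion, so the ratio is unchanged.
cell_a = lpc_to_voltage_ratio(-150.0, 1500.0)
cell_b = lpc_to_voltage_ratio(-30.0, 300.0)
print(cell_a, cell_b)  # 0.1 0.1

# Hypothetical mutant: voltage response as in cell B, LPC response mostly lost.
print(lpc_to_voltage_ratio(-5.0, 300.0) < cell_b)  # True
```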

      (7) The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.

      We agree with the reviewer and attempted 3D classification using multiple smaller masks including the suggested region. However, the maps remained poorly defined, and we were unable to confidently assign the NTD.

      (8) The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."

      Both maps show that the narrowest constriction, formed by W74, has a diameter of approximately 9 Å. Previous steered molecular dynamics simulations suggest that ATP can permeate through such a constriction, implying an ion selection mechanism distinct from a simple steric barrier.

      (9) The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?

      We appreciate the reviewer’s insight. We incubated LPC and nanodisc-reconstituted protein for 30 minutes, speculating that LPC approaches the pore similarly to other lipids in prior structures. In separate studies, we are optimizing conditions to capture more defined conformations.

      (10) In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.

      We agree that the density corresponding to the extended NTD into the cytoplasm is relatively weak. In our recent study, we compared two Panx1 structures with or without the mentioned C-terminal helix and found evidence suggesting the likelihood of NTD extension (DOI: 10.1101/2024.06.13.598903). Nevertheless, to prevent potential confusion, we have removed the cryo-EM panel from this manuscript.

      (11) Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.

      Thank you for the suggestion. We have removed this section to avoid potential confusion.

      Optional suggestions:

      (1) The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both an ECD and an ICD of substantial size; could these interact with water-soluble compounds? Also, would the use of whole-cell recordings to screen fractions not miss polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.

      We have added a rationale in line 90, stating: “The soluble fractions were excluded from this study, as the most polar fraction induced strong channel activities in the absence of exogenously expressed pannexins.” Additionally, we have included a figure to support this rationale (Fig. S1A).

      (2) The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and 2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC without asking what happens when you depolarize the membrane after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V relations and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It's hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.

      Unfortunately, in our hands, the prolonged application of lysolipids at concentrations necessary to achieve significant currents tends to destabilize the patch. This makes it challenging to obtain G-V curves or perform the previously mentioned kinetic analyses. We believe this destabilization may be due to lysolipids’ surfactant-like qualities, which can disrupt the giga seal. Additionally, prolonged exposure seems to cause channel desensitization, which could be another confounding factor.

      (3) From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panel A-C, the authors introduced double mutants on TM1 and TM2 to interfere with LPC binding, however, the double mutants may also disrupt the interaction network formed within NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC induced current, also exhibited lower currents through voltage activation in Fig 5S, raising the possibility the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.

      Thank you for the comment, with which we agree. Our initial intention was to use the mutagenesis studies to experimentally support the docking study. Due to uncertainties associated with the presented cryo-EM maps, we have decided to remove this study from the current manuscript. We will consider the proposed experiments in a future study.

      (4) Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from the “primed” state to the “active” state? In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (wouldn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.

      We believe the uncoordinated arrangement of NTDs is dynamic, allowing for potential variations in pore size during the activated conformation. Alternatively, NTD movement may be coupled with conformational changes in TM1 and the extracellular domain, which in turn could alter the electrostatic properties of the permeation pathway. We believe a functional study exploring this mechanism would be more appropriately presented as a separate study.

      (5) Can the authors provide a positive control for these negative results presented in Fig S1B and C?

      The positive results are presented in Fig. 1D and E.

      (6) Raw images in Fig S6 and Fig S7 should contain units of measurement.

      Thank you for pointing this out.

      (7) It may be beneficial to show the superposition between primed state and activated state in both protomer and overall structure. In addition, superposition between primed state and PDB 7F8J.

      We attempted to superimpose the cryo-EM maps; however, visually highlighting the differences in figure format proved challenging. Higher-resolution maps would allow for model building, which would more effectively convey these distinctions.

      (8) Including particles number in each class in Fig S8 panel C and D would help in evaluating the quality of classification.

      Noted.

      (9) A table for cryo-EM statistics should be included.

      Thanks, noted.

      (10) n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.

      While we agree that transparency is essential, adding n-values to each graph would make some figures less clear and potentially harder to interpret in this case. We believe that the dot plots, n-value range, and statistical analysis provide adequate support for our claims.

      (11) The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.

      Thank you for this comment. Although we intended to support the caspase activation mechanism in our introduction, we understand that the reviewer’s interpretation indicates a need for clarification. We hope the revised introduction removes any perception of dismissiveness.

      (12) Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.

      For Fig. 4F, we used a distinct normalization method to account for substantial day-to-day variation in experiments involving body fluids. Notably, we did not apply this normalization to other experimental panels due to their considerably lower day-to-day variation.

      (13) What was the rationale for using the structure from ref 35 in the docking task?

      The docking task utilized the human orthologue with a flipped-up NTD. We believe that this flipped-up conformation is likely the active form that responds to lysolipids. As our functional experiments primarily use the human orthologue for biological relevance, this structure choice is consistent. Our docking data shows that LPC does not dock at this site when using a construct with the downward-flipped NTD.

      (14) Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.

      Done.

      (15) From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it's worth noting that to readers somewhere.

      We have added the molecular weight of each subtype in the figure legend.

      (16) Please provide holding voltages and zero current levels in all figures presenting currents.

      We provided holding voltages. However, the zero current levels vary among the examples presented, making direct comparisons difficult. Since we are comparing currents with and without LPC, we believe that indicating zero current levels is unnecessary for this study.

      (17) While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      We are uncertain whether Panx3 is unaffected by lysophospholipids, as we have not observed activation of this subtype under any tested conditions.

    1. eLife Assessment

      This descriptive study used multiparameter spectral flow cytometry and clustering analysis of a subset of CD4 T cells, termed circulating T follicular helper (cTfh) cells, responding to the Plasmodium falciparum antigens PfSEA-1A and PfGARP. The results from this comprehensive study provide valuable information regarding differences in cTfh response profiles between children and adults living in malaria-endemic Kenya and could thus help guide the choice of antigen candidates for malaria vaccines. However, the analysis and interpretation of antigen-specific CD4 cTfh responses remain incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to understand the malaria antigen-specific cTfh profile of children and adults living in a malaria-holoendemic area. PBMC samples from children and adults were left unstimulated or stimulated with PfSEA-1A or PfGARP in vitro for 6 h and analysed with a cTfh-focused panel. Unsupervised clustering and analysis of cTfh cells were performed. The main conclusions are: A) the children's cohort has more diverse (cTfh1/2/17) recall responses than the adult cohort (mainly cTfh17), and B) PfGARP stimulates stronger cTfh17 responses in adults and is thus a promising vaccine candidate.

      Strengths:

      This study is, in general, well designed, with excellent data analysis. The use of unsupervised clustering is a nice attempt to understand the heterogeneity of cTfh cells.

      Weaknesses:

      The authors have provided additional data in Supplementary Figures 14-16. However, I remain concerned about whether cTfh cells are truly responding to antigen stimulation. In Supplementary Figure 15A-F, the IFNg responses appear as expected: SEB elicits the strongest response, as it stimulates bulk T cells, and the staining is promising, showing a clear distinction between IFNg+ and IFNg- populations. However, in Supplementary Figure 15I-N, the IL-21 secretion assay is concerning. The FACS plots make it difficult to distinguish IL-21+ from IL-21- cells, raising concerns about the validity of this analysis. Additionally, in panel J, the responses to PfSEA-1A or PfGARP appear even greater than those to SEB stimulation. In PBMCs, only a small percentage of T cells should be specific to a particular antigen. How can the positive control (SEB) produce a weaker response than stimulation with a specific antigen? This suggests that the IL-21 secretion assay may not have worked, making the authors' interpretation unreliable.

      I also have similar concerns about the IL-4 secretion assay in Supplementary Figure 16. First, the FACS plot shows cells that appear double-positive for IL-21 and IL-4, which suggests the staining may be due to autofluorescence rather than true cytokine signals. Also, in panels B-C, the response to SEB stimulation is generally weaker than the response to a single antigen, further questioning the reliability of the IL-4 assay. In summary, I am not convinced that the in vitro antigen stimulation assay worked as intended. Consequently, the manuscript's claims regarding PfSEA-1A- and PfGARP-specific cTfh responses are not sufficiently supported by the presented data.

    3. Reviewer #3 (Public review):

      Summary:

      The goal of this study was to carry out in-depth, granular and unbiased phenotyping of peripheral blood circulating Tfh specific to two malaria vaccine candidates, PfSEA-1A and PfGARP, and to correlate these with age (children vs adults) and protection from malaria (antibody titers against Plasmodium antigens). The authors further attempted to identify any specific differences in the Tfh responses to these two distinct malaria antigens.

      Strengths:

      The authors had access to peripheral blood samples from children and adults living in a malaria-endemic region of Kenya. The authors studied these samples using in vitro restimulation in the presence of specific malaria antigens. The authors generated a very rich data set from these valuable samples using cutting-edge spectral flow cytometry and a 21-plex panel that included a variety of surface markers, cytokines and transcription factors.

      Update following first revision (R1) of the manuscript:

      The authors have made a great effort to comprehensively address comments raised by the reviewers. In particular, clearly showing expression of ICOS and Bcl6 on CXCR5+ cells greatly strengthens the case for defining these cells as Tfh-like circulatory lymphocytes (cTfh).

      Weaknesses:

      Update following first revision (R1) of the manuscript:

      Unfortunately, my main concern remains. As it stands, the study is not really on antigen-specific T cells, but rather on the overall CD4 T cell compartment plus or minus antigenic stimulation. Although the authors used an in vitro restimulation strategy with malaria antigens, they do not focus on cells de novo expressing activation markers as a result of restimulation, nor do they use tetramers to detect antigen-specific T cells. Moreover, their data show that the number of CXCR5+ CD4 T cells de novo expressing activation markers and/or cytokines as a result of their in vitro restimulation is negligible, even when using a prototypic superantigen (SEB).

      Thus, no antigen-specific CXCR5+ CD4 T cells could be analysed with the data that the authors provide in this manuscript.

    4. Reviewer #4 (Public review):

      Summary:

      This manuscript is a descriptive study of circulating T follicular helper (cTfh) responses to PfSEA-1A or PfGARP (targets of new antimalaria vaccine candidates) in PBMCs from a convenience sample of children (7 years of age) and adults living in malaria-holoendemic Kenya, using multiparameter flow cytometry and clustering analysis. This cell type promotes B cell production of long-lived antimalarial antibodies that provide protection against malaria. The authors find that children had a broader cTfh cytokine and transcription factor (TF) response profile than adults, who responded to both antigens but with a narrower response profile.

      Strengths:

      Carefully done study, very detailed, with a nice summary model at the end of the paper. The revision provides the requested clarification on a number of issues, including CD40L expression, which was not differentially expressed between groups. The authors also added data to the supplemental files, including cytoplots of the IL-4 and IL-21 staining.

      Weaknesses:

      Determining the significance of these cTfh cells for long-term protection against malaria would require functional and transfer experiments in animal models, which is outside the scope of this work.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aims to understand the malaria antigen-specific cTfh profile of children and adults living in a malaria holoendemic area. PBMC samples from children and adults were unstimulated or stimulated with PfSEA-1A or PfGARP in vitro for 6h and analysed by a cTfh-focused panel. Unsupervised clustering and analysis on cTfh were performed.

      The main conclusions are:

      (1) the cohort of children has more diverse (cTfh1/2/17) recall responses compared to the cohort of adults (mainly cTfh17) and

      (2) Pf-GARP stimulates better cTfh17 responses in adults, thus a promising vaccine candidate.

      Strengths:

      This study is, in general, well designed, with excellent data analysis. The use of unsupervised clustering is a nice attempt to understand the heterogeneity of cTfh cells. Figure 9 is a beautiful summary of the findings.

      Weaknesses:

      (1) Most of my concerns relate to the use of PfSEA-1A and PfGARP to analyse the cTfh response to in vitro stimulation. In vitro stimulation of cTfh cells has frequently been used (e.g. Dan et al, PMID: 27342848), usually with antigen stimulation for 9h followed by analysis of CD69/CD40L expression, or for 18h followed by analysis of CD25/OX40. However, the authors use a different strategy that has not been validated for analysing in vitro stimulated cTfh cells. Also, they excluded CD25+ cells, which might be activated cTfh cells. I am concerned about whether the conclusions based on these results are reliable.

      Dan et al. showed that cTfh cells produce very few cytokines. However, in this paper, the authors report significant secretion of IL-4 and IFNg by some cTfh clusters after 6h stimulation. If the stimulation is antigen-specific through the TCR, why do cTfh1 cells upregulate IL-4 but not IFNg in Figure 6? I believe including representative FACS plots of IL-4, IFNg and IL21 staining, and using %positive rather than MFI, would make the conclusion more convincing. Similarly, the authors should validate whether TCR stimulation in their system for 6h can induce robust BCL6/cMAF expression in cTfh cells. Moreover, there is no CD40L expression. Does this mean that TCR stimulation-mediated Bcl6/cMAF upregulation and cytokine secretion precede CD40L expression?

      In summary, I am particularly concerned about the method used to analyse PfSEA-1A and PfGARP-specific cTfh responses because it lacks proper validation. I am unsure if the conclusions related to PfSEA-1A/PfGARP-specific responses are reliable.

      An unfortunate reality of these types of complex immunologic studies is that it takes time to optimize a multiparameter flow cytometry panel, run this number of samples, and then conduct the analysis (not to mention the time it takes for a manuscript to be accepted for peer review). An unexpected delay, frankly, was the COVID-19 pandemic, when non-essential research lab activities were put on hold. We designed our panel in 2019 and referred to the “T Follicular Helper Cells” Methods and Protocols book from Springer 2015. Obviously the field of human immunology took a huge leap forward during the pandemic as we sought to characterize components of protective immunity, and as a result there are several new markers we will choose for future studies of Tfh subsets. We agree with the reviewer that cytokine expression kinetics differ depending on the in vitro stimulation conditions. Due to the small blood volumes obtained from healthy children, we were limited in the number of timepoints we could test. However, because we were most interested in IL21 expression, our optimization experiments identified 6 hrs as the best timepoint in combination with the other markers of interest. We did find IFNg expression from non-Tfh cells; therefore, we believe our stimulation conditions worked.

      Dan et al used stimulated tonsil cells to assess CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh and CXCR5<sup>neg</sup> CD45RA<sup>neg</sup> non-Tfh cells, whereas in our study we evaluated CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh from PBMCs. Dan et al.’s PBMC experiments used EBV/CMV or other pathogen product stimuli and gated only on CD25<sup>pos</sup>OX40<sup>pos</sup> cells, which are not the cells we assessed in our study. This might explain in part the differences in cytokine kinetics, as we evaluated CD25<sup>neg</sup> PBMCs only. However, we agree that more recent studies focused on CXCR5<sup>pos</sup>PD1<sup>pos</sup> cells included more activation-induced marker (AIM) markers, which are missing in our study and limit the depth of our analysis.

      Percentage of positive cells and MFI are complementary data. The percentage of positive cells only indicates the proportion of cells that express the marker of interest, without giving a quantitative value for this expression. MFI indicates how much of the marker of interest is expressed by cells, which is important as it can indicate the degree of activation or exhaustion per cell. Meta-cluster analysis is not ideal for assessing the percentage of positivity, whereas it does provide essential information regarding the intensity of expression. We added supplemental figures 14 (Bcl6 and cMAF), 15 (IFNg and IL21) and 16 (IL4 and IL21), where the percentages of positive cells were manually gated directly from the total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Results from the manual gating are consistent with the results we show using clustering. However, the manual gating helps to better visualize that antigen-specific IL21 expression was statistically significant in children, whereas the high background observed for adults did not reveal higher expression after stimulation, perhaps suggesting an upper threshold of cytokine expression (supplemental figure 15). The following sentence has been added to the methods at the end of the “OMIQ analysis” section: “However, the percentage of positive IFN𝛾, IL-4, IL-21, Bcl6, or cMAF using manual gating can be found in Supplemental Figures 14, 15, and 16 along with the overlay of the gated positive cells on the CD4<sup>pos</sup>CXCR5<sup>pos</sup>CD25<sup>neg</sup> UMAP and the cytoplots of the gated positive cells for each meta-cluster (Supplemental Figures 14, 15, and 16).”

      Indeed, cMAF can be induced by TCR signaling, ICOS and IL6 (Imbratta et al., 2020). However, in our study populations, ICOS was expressed (see Author response image 1, panel A) in the absence of any stimulation, suggesting that CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> cells were already capable of expressing cMAF. Indeed, after gating Bcl6- and cMAF-positive cells based on their FMOs (Author response image 1, panels B and C, respectively), we overlaid the positive cells on the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> cell UMAP and can see that most of our cells already expressed cMAF alone (Author response image 1, panel D) or co-expressed cMAF and Bcl6 (Author response image 1, panel E), confirming that they are TfH cells, whereas very few cells expressed Bcl6 alone (Author response image 1, panel F). Because we knew that cT<sub>FH</sub> cells already express Bcl6 and cMAF, we focused our analysis on the intensity of their expression to assess whether our vaccine candidates induced higher expression of these transcription factors.

      Author response image 1.

      (2) The section between lines 246-269 is confusing. At line 249, comparing abundance after antigen stimulation is inappropriate because 6h stimulation (under Golgi stop) should not induce cell division. I think the major conclusions are contained in Figure 5e: (A) antigen stimulation does not alter the cell number in each cluster, and (B) children have more MC03 and MC06 and fewer MC02, etc. The authors should consider removing the statements between lines 255-259 because the trends are the same regardless of stimulation.

      We agree that there is no cell division after 6h and that the meta-clusters did not proliferate during such a short in vitro stimulation. The use of the word ‘abundance’ in the context of cluster analysis refers to comparing the contribution of events by each group to the concatenated data. After the meta-clusters are defined and then deconvoluted by study group, certain meta-clusters can be more abundant in one group than in another, meaning that group contributed more events to a particular meta-cluster.

      Dimensionality reduction is more nuanced than manual gating and reveals a continuum of marker expression between cell subsets, as there is no hard “straight line” threshold like the one used in 2D gating. Consequently, changes in marker expression levels after stimulation can shift cells from one cluster to another, thereby changing cluster abundance.

      To clarify how this type of analysis is interpreted, we have modified lines 255-259 as follows:

      “In contrast, the quiescent PfSEA-1A- and PfGARP-specific cT<sub>FH</sub>2-like cluster (MC02) was significantly more abundant in adults compared to children (Figure 5c and 5d, pf<0.05). Interestingly, following PfGARP stimulation, the activated cT<sub>FH</sub>1/17-like subset (MC09) became more abundant in children compared to adults (Figure 5d, pf<0.05 with a False Discovery Rate=0.08), but no additional subsets shifted phenotype after PfSEA-1A stimulation (Figure 5c).”

      Reviewer #2 (Public Review):

      Summary:

      Forconi et al explore the heterogeneity of circulating Tfh cell responses in children and adults from malaria-endemic Kenya, and further compare such differences following stimulation with two malaria antigens. In particular, the authors also raise an important consideration for the study of Tfh cells in general, which is the hidden diversity that may exist within the current 'standard' gating strategies for these cells. The use of multiparametric flow cytometry together with unbiased clustering analysis provides a potentially powerful methodology for exploring this hidden depth. However, the analysis as currently presented does not aid the understanding of this heterogeneity. The main goal of the study could hopefully be achieved by putting all the parameters used in one context, before dissecting such differences into their specific clinical contexts.

      Strengths:

      Understanding the full heterogeneity of Tfh cells in the context of infection is an important topic of interest to the community. The study included clinical groupings such as age group differences and differences in response to different malaria antigens to further highlight context-dependent heterogeneity, which offers new knowledge to the field. However, improvements in data analyses and presentation strategies should be made in order to fully utilize the potential of this study.

      Weaknesses:

      In general, most studies using multiparameter analysis coupled with an unbiased grouping/clustering approach aim to describe differences across all the parameters used for defining groupings, prior to exploring differences between these groupings in specific contexts. However, the authors have opted to separate these into sections using "subset chemokine markers", "surface activation markers" and then "cytokine responses", yet nuances within all three of these major groups were taken into account when defining the various Tfh identities. Thus, it would make sense to show how all of these parameters are associated with one another within one specific context, to first establish logically for readers how we can better define Tfh heterogeneity. When presented this way, some of the less clear identities, such as MC03/MC04/MC05/MC08, may be better resolved. Once established, all of these clusters can then be explored in further detail to understand cluster-specific differences in children vs adults, and in the various stimulation conditions. Since the authors also showed that many of the activation markers were not significantly altered post-stimulation, there is no real obstacle to merging the entire dataset for the first part of this study, which is to define Tfh heterogeneity in an unbiased manner regardless of age group or stimulation condition. Other studies using similar approaches, such as Mathew et al 2020 (doi: 10.1126/science.abc8) or Orecchioni et al 2017 (doi: 10.1038/s41467-017-01015-3), can be referred to for more effective data presentation strategies.

      Accordingly, the expression of cytokines and transcription factors can only be reliably detected following stimulation. However, the underlying background responses need to be taken into account to identify "true" positive signals. The only raw data for this were shown in the form of a heatmap with no ordering that would let readers easily interpret the expression of these markers following stimulation relative to no stimulation. Without this, it is difficult to reliably interpret any real differences reported. Finally, the authors report differences in either cluster abundance or cluster-specific cytokine/transcription factor expression in Tfh cell subsets when comparing children vs adults, and between the two malaria antigens. The cytokine/transcription factor comparisons between groups would be more clearly highlighted by appropriately combining groupings rather than keeping them separate, as in Figures 6 and 7.

      Thank you for sharing these references. Similar to the SPADE clustering and viSNE dimensionality reduction algorithms used in Orecchioni et al, we used all the extracellular markers from our panel in our FlowSOM algorithm with consensus meta-clustering, which includes both the chemokine receptors and the activation markers, even though they are presented separately in Figures 3 and 4 of our manuscript. This was explained in the methods section (lines 573 - 587). We then chose the UMAP algorithm for visual dimensionality reduction of the meta-clusters generated by FlowSOM consensus meta-clustering, as explained in the “OMIQ analysis” subsection of our methods (lines 588- 604). Therefore, we believe we have conducted the analysis as this reviewer suggests, even though we chose to show the figures that were informative to our story. The heatmap of the results shows which combinations of markers do or do not respond to the different conditions and between groups, and all the raw data are presented in supplemental figures 10 to 13, which show, as bar plots, the differences summarized in the heatmaps. We believe this strengthens our interpretation of the results.

      Regarding the transcription factor and cytokine background, we added supplemental figures 14, 15 and 16, in which we used manual gating to select Bcl6, cMAF, IFNg, IL21 or IL4 positive cells directly from total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH cells based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Moreover, all the dot plots (with their statistics) used for the heatmaps in Figures 6 and 7 can be found in supplemental figures 10, 11, 12 and 13. These supplemental figures address the concerns above by showing the difference in signal between unstimulated and stimulated conditions.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study was to carry out in-depth, granular and unbiased phenotyping of peripheral blood circulating Tfh specific to two malaria vaccine candidates, PfSEA-1A and PfGARP, and to correlate these with age (children vs adults) and protection from malaria (antibody titers against Plasmodium antigens). The authors further attempted to identify any specific differences in the Tfh responses to these two distinct malaria antigens.

      Strengths:

      The authors had access to peripheral blood samples from children and adults living in a malaria-endemic region of Kenya. The authors studied these samples using in vitro restimulation in the presence of specific malaria antigens. The authors generated a very rich data set from these valuable samples using cutting-edge spectral flow cytometry and a 21-plex panel that included a variety of surface markers, cytokines, and transcription factors.

      Weaknesses:

      - Quantifying antigen-specific T cells by flow cytometry requires the use of either (1) tetramers or (2) in vitro restimulation with specific antigens followed by identification of TCR-activated cells based on de novo expression of activation markers (e.g. intracellular cytokine staining and/or surface marker staining). Although the authors use an in vitro restimulation strategy, they do not focus their study on cells de novo expressing activation markers as a result of restimulation; therefore, their study is not really on antigen-specific cTfh. Moreover, the authors report no changes in the expression of activation markers commonly used to identify antigen-specific T cells upon in vitro restimulation (including IFNg and CD40L); therefore, it is not clear whether their in vitro restimulation with malaria antigens actually worked.

      We understand the reviewer’s point of view and apologize for any confusion. IFNg was expressed but was not statistically different between groups. Indeed, looking at the CD8 T cells and using manual gating, we were able to show that IFNg increased upon stimulation, although not significantly, in CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells (supplemental figure 15, panel C), confirming our primary observation using clustering analysis. These results showed that our malaria antigens induced an IFNg response in some participants, but not all of them, revealing heterogeneity in this response among individuals within the same group.

      Regarding CD40L, in supplemental figure 7 we can see that some of our meta-clusters expressed more CD40L upon stimulation, but again without leading to statistical differences between groups. Combined with the increased expression of other cytokines and transcription factors, we showed that our stimulation did indeed work. However, because of the high variation within groups, there were no statistical differences across our groups. Because CD40L is not the only marker of specific T cell activation, and not all T cells respond through this marker alone, a more comprehensive multimarker AIM panel might have highlighted differences between groups. We recognize the limitations of our study and believe that future studies will benefit from more of the activation markers commonly used to identify antigen-specific T cells, such as CD69, OX40 and 4-1BB (AIM panel), among other markers.

      - CXCR5+CD4+ memory T cells have been shown to present multipotency and plasticity, being capable of differentiating into non-Tfh subsets upon re-challenge. Although the authors included in their flow panel a good number of markers commonly used in combination to identify Tfh (CXCR5, PD-1, ICOS, Bcl-6, IL-21), they used only a single marker (CXCR5) as their basis to define Tfh, thus providing a weak definition of Tfh cells for the follow-up downstream analysis.

      We apologize for the confusion. Even though we subsampled the CD4<sup>pos</sup>CXCR5<sup>pos</sup> CD25<sup>neg</sup> cells to run our FlowSOM, we showed the different levels of expression across meta-clusters (figure 4 panels A and B) of PD1 (Tfh being PD1 positive cells) and ICOS (indicating the activation stage of the Tfh, “T Follicular Helper Cells” Methods and Protocols book from Springer 2015). We also included an overlay of the manually gated Bcl6-cMAF double-positive cells on the CXCR5<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> CD4 T cell UMAP plot to show that most of them express Bcl6 (supplemental figure 14). Interestingly, the manually gated IL21 positive cells were less abundant, particularly in children (supplemental figure 15). Because we were not able to include all the markers that are now used to define Tfh cells, we referred to our cell subsets as “TFH-like”. This is an acknowledged limitation of our study. Due to the limited blood volume obtained from children and the cost of running multiplex flow cytometry assays, our results showing antigen-specific heterogeneity of Tfh subsets will have to be validated in future studies that include these additional defining markers.

      - Previous work has used FACS sorting and in vitro assays of cytokine production and B cell help to study the functional capacity of different cTfh subsets in blood from Plasmodium-infected individuals. In this study, the authors do not carry out any such assays to isolate and evaluate the functional capacity of the different Tfh subsets identified. Thus, all the suggestions about the roles that these different cTfh subsets may have in vivo in the context of malaria remain highly hypothetical.

      Unfortunately, the low blood volumes obtained from children prevented us from running in vitro functional assays, and the study design did not allow us to correlate them with protection. However, since the function of Tfh subsets identified in malaria-exposed individuals has been evaluated using Pf lysates in other studies, we referenced those studies when interpreting the differences we reported in Tfh subset recognition between malaria antigens. If either of these antigens moves forward into vaccine trials, then evaluating their function would be important.

      - The authors have not included malaria unexposed control groups in their study, and experimental groups are relatively small (n=13).

      This study design did not include the recruitment of malaria-naive negative controls, as its goal was to compare the quality and abundance of responses to the potential new vaccine targets PfSEA-1A and PfGARP between malaria-exposed children and adults. We did, however, test 3 malaria-naive adults and found no non-specific activation after stimulation with these two malaria antigens. Since this was done as part of our assay optimization, we did not feel the need to show these negative findings.

      Even with our small sample size, we demonstrated significant age-associated differences in malaria antigen-specific responses from cT<sub>FH</sub>-like subsets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor points are:

      (1) Line 88: cTfh cells do not derive only from GC-Tfh cells; they also have a GC-independent origin (He et al, PMID: 24138884).

      The following sentence was added at line 88: “Interestingly, cT<sub>FH</sub> cells can also come from peripheral cT<sub>FH</sub> precursor CCR7<sup>low</sup>PD1<sup>high</sup>CXCR5<sup>pos</sup> cells; thus, they also have a GC-independent origin (He, Cell, 2013 PMID: 24138884).”

      (2) I believe all participants were free of blood-stage infection upon enrolment. Can the authors clearly state this information between lines 151-159?

      We mentioned in the methods, lines 495-496: “Participants were eligible if they were healthy and not experiencing any symptoms of malaria at the time venous blood was collected”. However, using qPCR we found 5 children with blood-stage malaria. As shown in Author response image 2, comparing malaria-free to blood-stage children, no differences were observed without stimulation. However, MC03 was more abundant upon malaria antigen stimulation in the blood-stage group, whereas MC04 was more abundant in the malaria-free group upon PfGARP stimulation only, confirming that our stimulation worked.

      Author response image 2.

      Reviewer #3 (Recommendations For The Authors):

      (1) The strategy for gating on antigen-specific cTfh cells needs to be revised. The correct approach would be to gate on those cells that respond by de novo expression of activation markers upon antigen restimulation (also termed activation-induced markers, e.g. CD69, CD40L, CXCL13 and IL-21, Niessl 2020; CD69, CD40L, CD137 and OX40, Lemieux 2023; CD137 and OX40, Grifoni 2020). As it stands, the study is not really on antigen-specific T cells, but rather on the overall CD4 T cell compartment plus or minus antigenic stimulation.

      We recognize the limitation in our flow panel design that prevents us from performing this gating. We originally based our panel design on the “T follicular helper cells methods and protocols” book (Springer 2015), which used CD45RA, CD25, CXCR5, CCR6, CXCR3, CCR7, ICOS and PD1 to define cT<sub>FH</sub>. We had already optimized our 21-color panel, purchased reagents and started running our experiments by the time the publications by Niessl, Lemieux and Grifoni modified how TFH cells are defined. Indeed, we optimized and performed our assay from November 2019 to March 2020, finishing the sample runs during the first quarantine. Because of the urgent need for research on SARS-CoV-2, in which we were involved from that time onward, the analysis of our TFH work was considerably delayed. Moreover, 2020 was also the year in which many TFH papers came out with better ways to define cT<sub>FH</sub> and responses to antigen stimulation. In our future studies, our panel will include AIM markers.

      (2) It is not clear if the antigenic stimulation actually worked. Does the proportion of IFNg+ or IL-4+ or IL-21+ or CD40L+ or CD25+ CD4 or CD8 T cells increase following in vitro antigen restimulation?

      Yes, using manual gating, we are able to show an increase in IL4 (supplemental figure 16 panels B and C) and IL21 (supplemental figure 15 panels J and K) production in both children and adults. However, we did not observe significant production of IFNg (supplemental figure 15, panel C) or changes in CD40L expression (supplemental figure 7) after malaria antigen stimulation; however, our positive control, SEB, worked. So, yes, our stimulation assay worked, but these 2 malaria antigens did not significantly induce these cytokines. This could be because the responses are too low to detect in every participant, since these are single antigens and not the whole-parasite lysates that other studies have used. It could also be that these antigens do not stimulate CD40L or IFNg in all our participants. We raised this limitation in the discussion, line 473, as follows: “Although the heterogeneity in the response of CD40L and IFNγ suggests that our tested malaria antigens did not induce significant differences in the expression of these markers in all our participants, our panel did not include other activated induced markers, such as OX40, 4-1BB, and CD69”.

      (3) It is not clear what proportion of the total CD4 T cell compartment the cTfh cells represent in the different groups. Does this vary among groups? It would be valuable to display this as an old-fashioned combination of contour plots with outliers for illustrating flow cytometry and bar graphs for the cumulative data.

      The proportion of CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CXCR5<sup>pos</sup> cTfh cells among total CD4 T cells did not differ between groups (figure 2).

      (4) The gating strategy could be refined and become more robust if adding additional markers in combination with CXCR5 for identifying cTfh (e.g. CXCR5+Bcl6+).

      Thank you for this suggestion. An overlay of Bcl6 expression can be found in supplemental figure 14 where we confirm that our CXCR5+ cT<sub>FH</sub>-like subsets express cMAF and Bcl6.

      (5) The protocols for intracellular and intranuclear staining seem to be incomplete in Materials and Methods. In particular, cell permeabilization strategies seem to be missing.

      Our apologies for this oversight. We added the following sentences to the methods (line 545): “Cells were fixed and permeabilized for 45 mins using the transcription factor buffer set (BD Pharmingen) followed by a wash with the perm-wash buffer. Intracellular staining was performed at 4 °C for 45 more mins followed by two washes using the kit’s perm-wash buffer”.

      (6) In Materials and Methods, the authors mention they have used fluorescence minus one control to set their gating strategy. It would be valuable to show these, either on the main body or as part of supplementary figures.

      We added flow cytometry plots of the FMO and/or negative controls, as appropriate, to supplemental figures 14 (cMAF and Bcl6), 15 (IFNg and IL21) and 16 (IL4 and IL21).

      (7) Line 194 and Figure 3: the criteria that the authors used for down-sampling events before FlowSOM analysis are not clear. Was this random? Was this done with unstimulated or stimulated samples?

      We chose to down-sample on CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells prior to FlowSOM so that the cluster analysis would focus only on differences among those cells. The down-sampling used 1,000 CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells from each fcs file (unstimulated and stimulated samples). If an fcs file contained more than 1,000 CXCR5<sup>pos</sup> cells, the OMIQ platform algorithm randomly selected 1,000 CXCR5<sup>pos</sup> cells within that file. This last sentence was added to the methods (line 593).
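      The per-file down-sampling described above can be illustrated with a short Python sketch. It is a minimal illustration, not the OMIQ platform's implementation. It assumes that events have already been gated to CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells upstream, that files with 1,000 or fewer such cells are kept whole, and that selection is uniform without replacement; the function name, seed and file names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, TypeVar

Event = TypeVar("Event")


def downsample_per_file(
    events_by_file: Dict[str, Sequence[Event]],
    n_max: int = 1000,
    seed: int = 0,
) -> Dict[str, List[Event]]:
    """Keep at most n_max pre-gated events per file (hypothetical helper).

    Files with more than n_max events are sampled uniformly at random
    without replacement; smaller files are kept whole (an assumption).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept: Dict[str, List[Event]] = {}
    for name, events in events_by_file.items():
        if len(events) > n_max:
            kept[name] = rng.sample(list(events), n_max)
        else:
            kept[name] = list(events)
    return kept


# Hypothetical files; each list stands in for CXCR5-pos events gated upstream.
files = {
    "donor01_unstim.fcs": list(range(2500)),
    "donor01_PfGARP.fcs": list(range(800)),
}
subset = downsample_per_file(files)
print({name: len(ev) for name, ev in subset.items()})
# prints {'donor01_unstim.fcs': 1000, 'donor01_PfGARP.fcs': 800}
```

      Capping each file at the same number of events keeps any single sample, stimulated or unstimulated, from dominating the clustering.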

      (8) Lines 201, 202: as it stands, the authors' interpretation of the role of different cTfh subsets during infection remains highly speculative. Are these differences in cTfh phenotypes actually reflected in their in vitro capacity to provide B cell help (e.g. as in the Obeng-Adjei 2015 paper) or to produce IL-21, express co-stimulatory molecules, or any other characteristic that would allow them to better infer their functional roles during infection? Any additional in vitro analysis of the functional capacity of isolated cTfh subsets identified in this research would greatly increase its value.

      We agree with the reviewer that this sentence is speculative, and we have rephrased it as follows: “First, we found different CXCR5 expression levels between meta-clusters (Figure 3b); CXCR5 is essential for cT<sub>FH</sub> cells to migrate to the lymph nodes and interact with B-cells”. We would have liked to perform in vitro functional assays. However, as explained above, we did not collect sufficient cells from children to do so.

      (9) It is not clear why authors omitted IL-17 and did not use IFNg and IL-4 to refine their definition of Th1, Th2 and Th17 cTfh.

      We would have liked to include IL-17; however, we were constrained by having access only to a 4-laser cytometer at the time we ran our assay. Because we needed to prioritize markers when designing our flow panel, and cTfh1 cells had been shown to be preferentially activated during episodes of acute febrile malaria in children (Obeng-Adjei), we chose to focus on IFNg and IL4 to differentiate Tfh1 from Tfh2, in addition to other markers as surrogates of functional potential. We did not use IFNg and IL4 to refine our definition of Tfh1, Tfh2 and Tfh17, as recent publications have shown that IL4 is expressed not only in Tfh2 but also, at lower intensity, in the other Tfh subsets (Gowthaman, among others). Therefore, IFNg and IL4 alone were not sufficient to properly define the different Tfh subsets. In future studies, we plan to include transcription factor profiles (T-bet, BATF, GATA3) to further refine the definitions of Tfh subsets.

      (10) Lines 226, 228: based on the combination of markers that the MC03 subset expresses, it is tempting to think that this is the only "truly" committed Tfh subset from the entire analysis. Please discuss.

      If the reviewer is referring to differences in marker expression levels indicating that these cells have not reached a level of differentiation that would make them reliable (i.e. “true”) Tfh cells, we agree that this is an important question now that we have technology that can measure and analyze so many phenotypic markers at once. This highlights the need for the scientific method: replicating study findings to determine whether they are consistent given the same study design and experimental conditions.

      (11) Lines 243, 244: again, is this reflected in functional capacity?

      The study described in this manuscript did not include functional assays. However, this does not change the key finding that different malaria antigens behaved differently, demonstrating heterogeneity in Tfh recognition of malaria antigens. Regarding CD40L expression, we did not observe differences between groups; however, some individuals showed increased CD40L expression (supplemental figure 7). It is possible that some individuals responded through other activation-induced markers (CD69, ICOS, OX40 and 4-1BB, among others) and that our stimulation was not long enough to assess CD40L expression upon malaria antigen stimulation. We addressed this limitation by editing lines 243-244 as follows: “we were unable to find statistical differences in the CD40L expression between groups as only few individuals responded through it (supplemental figure 7).”

      (12) Lines 243, 244: are these cTfh subsets exclusively detected in malaria-exposed individuals? This is confounded by the lack of a malaria-unexposed control group in this study, which would have been highly valuable.

      We agree with the reviewer that including malaria-naive children as a negative control group would have been valuable. However, this study was conducted in Kenya, where all children are presumed to have had at least one malaria infection. We also did not have ethical approval or the means to enroll children in the USA who would not have been exposed to malaria as a negative control group. Since we were also evaluating differences by age group, including US adults would not have addressed this point. Therefore, this remains an open question that might be addressed by another study recruiting children in non-malaria-endemic areas.

      (13) Line 267, as the authors have not gated on T cells de-novo expressing activation markers in response to antigen restimulation, how do they know these are indeed antigen-specific cTfh?

      The OMIQ analysis accounts for marker expression levels in the resting cells (unstimulated well) for each individual, compared with each experimental (stimulated) well. The algorithm computationally determines whether that expression level changed without an arbitrary positivity threshold, keeping expression levels as a continuous rather than dichotomous variable; this is the power of unbiased cluster analyses. Therefore, we know that these cells are antigen-specific based on the statistical difference in expression intensity between the resting cells and the stimulated ones. Nevertheless, manual gating to identify “de novo” responding cells produced the same results as assessing the MFI of each meta-cluster (supplemental figures 14, 15 and 16).

      (14) Lines, 292-295, it is very surprising that Tfh cells would not produce IL-21 upon restimulation. Have the authors observed upregulation of IL-21 following SEB restimulation?

      Yes, we observed IL21-positive cells upon SEB stimulation (supplemental figure 15, panels J and K). However, we found unexpectedly high background levels of IL21, specifically within the adult group (supplemental figure 15, panels K and M), making it challenging to detect antigen-specific increases above background. Interestingly, manual gating showed an increase in IL21 upon PfSEA-1A or PfGARP stimulation in children (supplemental figure 15, panels J and L).

      (15) In Figures 3 and 4, it is not clear if there are any significant differences in expression of different markers between different cTfh subsets and/or different conditions. Moreover, the lack of differences in response to antigen stimulation seems to suggest that it did not work adequately.

      We intentionally chose a 6-hour stimulation to better assess changes in cytokines, which we did observe. However, because this is a short stimulation, we did not expect dramatic changes in the extracellular markers presented in figures 3 and 4. A longer stimulation, such as 24 h, would better highlight these changes.

      (16) Figure 5b would benefit from bar graphs.

      Please find below bar graphs for the highlighted meta-clusters in figure 5b. We did not add these bar graphs to figure 5, as they do not provide new information; they repeat what is already presented in the edgeR plot.

      Author response image 3.

      (17) Figures 6 and 7 would greatly benefit from showing individual examples of old-fashioned contour with outliers flow plots to illustrate the different cTfh subsets identified in the study.

      Contour plots with outliers illustrating the different cT<sub>FH</sub> subsets can be found in supplemental figure 4.

      (18) Figures 3, 4, 6 and 7: the authors exclusively focused on the study of MFI to measure the expression of cytokines and transcription factors among different groups/stimulations. Have the authors observed any differences in the percentage or absolute counts of cytokine+ and/or TF+ cells between different subsets of cTfh and/or different conditions?

      Yes. We added supplemental figures 14 (transcription factors) and 15/16 (cytokines), in which cytokines and transcription factors were assessed using manual gating. We found that IL4 in total CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells was significantly increased upon stimulation in both adults and children, while IFNg was not. However, IFNg was significantly higher in total CD8<sup>pos</sup> cells, showing that the stimulation worked even though total CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells did not express IFNg. Finally, we observed a trend toward more IL21<sup>pos</sup>CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells in adults, which was not significant because of high background, whereas IL21 was significantly increased upon stimulation in children. Both transcription factors, cMAF and Bcl6, were significantly increased upon stimulation in children only.

      (19) Figure 8, the definition for high and low PfGARP antibody titers seems rather arbitrary. Are these associations still significant when attempting a regular correlation analysis between Ab values (i.e. Net MFI) and different cTfh subsets?

      Yes, the definition of high and low PfGARP antibody levels is arbitrary, but the antibody data (figure 1b) were naturally bimodal. Therefore, as a sub-analysis, we assessed the association between PfGARP antibody levels and cT<sub>FH</sub> subsets (see Author response image 4). We checked the correlation between the abundance of the meta-clusters and the levels of anti-PfGARP and anti-PfSEA IgG after PfGARP and PfSEA stimulation. We also checked the correlation between the MFI of Bcl6 and cMAF after stimulation (PfGARP or PfSEA-1A minus unstimulated) in each meta-cluster and the levels of anti-PfGARP and anti-PfSEA IgG. However, we believe that, because of our small sample size, these results are not robust enough and we risk over-interpreting the data. Therefore, we chose not to include this analysis in the manuscript.

      Author response image 4.

      (20) The comprehensive 21-plex panel that authors used in this study could generate insights on additional immune cells beyond cTfh (e.g. additional CD4 T cell subsets, CD8 T cells, CD19 B cells). It is not clear why the authors limited their analysis to cTfh only.

      The primary goal of the study was to assess the cT<sub>FH</sub> response to malaria vaccine candidates. However, we were able to assess IFNg expression in CD8 T cells upon stimulation using manual gating, as shown in supplemental figure 15. Without additional markers to more clearly define other CD4 T cell or B cell subsets, we do not believe this dataset would characterize antigen-specific responses to malaria antigens in enough depth to yield new insight.

      (21) Minor point, the punctuation should be revised throughout the manuscript.

      Punctuation was revised throughout the manuscript by our departmental scientific writer, Dr. Trombly, as requested by the reviewer.

    1. eLife Assessment

      Understanding the bacterial growth mechanisms that are crucial for maintaining cellular viability could uncover novel drug targets, particularly in bacterial pathogens. In this important study, Kapoor et al. investigate the role of Wag31 in lipid and peptidoglycan biosynthesis in mycobacteria. A detailed analysis of Wag31 domain architecture revealed a role in membrane tethering. More specifically, the N-terminal and C-terminal domains appeared to have distinct functional roles. The data presented are solid and support the conclusions made. This study will be of broad interest to microbiologists and molecular biologists.

    2. Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functions of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      (2) The pulldown assay results are interesting, but the links are tentative.

      (3) The authors may wish to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      In response to the above reviews, the authors have made the required changes in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the roles of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence.

      (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction, was convincingly demonstrated.

      Weakness:

      (1) Interactome analysis with truncated versions of the proteins could not be performed in M. smegmatis due to protein instability.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The authors did not address some of the comments. The following concerns should be addressed.

      • As far as I can tell, the authors did not address my prior comment on Line 270 (Line 280 in the revised manuscript): the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity", requires additional proof. Please indicate the page and line numbers in the revised manuscript so that I can identify the specific changes the authors made.

      • Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 with Msm lysate expressing Wag31 interactors such as MurG, it is possible that the interactions are not direct. The authors acknowledged that this is a valid point and indicated that they "will describe this caveat in the revised manuscript". I have difficulty finding where this revision was made. Please indicate the page and line numbers.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functions of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have clarified this in the discussion of the revised manuscript. The lipid classes affected by Wag31 depletion differ from those affected by its overexpression. Wag31 is an adaptor protein that interacts with proteins of the ACCase complex, which synthesize fatty acid precursors (Meniche et al., 2014; Xu et al., 2014), and regulates their activity (Habibi Arejan et al., 2022).

      The different effects on lipid homeostasis could be attributed to changes in the stoichiometry of these Wag31 interactions. While Wag31 depletion would prevent such interactions from occurring and might affect lipid synthesis that directly depends on Wag31-partner interactions, its overexpression would lead to promiscuous interactions and altered stoichiometry of native interactions, which would ultimately modulate lipid synthesis pathways.

      (2) The pulldown assays results are interesting, but links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through immunoprecipitation of FLAG-Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicate, and interactors that appeared in all replicates but not in the control were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, using a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two established interactors of Wag31, AccD5 and Rne, were among the top five hits, supporting the validity of our data.

      As mentioned in line 139 of the previous version of the manuscript, we agree that the interactions can be either direct or mediated by a third partner. The fact that we obtained known interactors of Wag31 leads us to believe these interactions are genuine. Moreover, for validation, we performed pull-down experiments by mixing E. coli lysates expressing full-length or truncated His-Wag31 with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions for these pull-down assays were stringent: the wash buffer contained 1% Triton X-100, intended to eliminate non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without performing the experiment with purified proteins. As mentioned above, this caveat was stated in the previous version of the manuscript.

      (3) The authors may perhaps like to rephrase claims of effects lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is anchored to the membrane via cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids into intracellular lipid inclusions but also changes the levels of various lipid classes. Advances in spatial biology underscore the importance of the native localization of biological molecules for maintaining a steady state in the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Public review):

      Summary:

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the roles of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence.

      (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction, was convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions. However, the pulldown experiments lack a valid negative control.

      We thank the reviewer for the comment. We have included two non-interactors of Wag31, MmpL4 and MmpS5, which were not identified in our interactome dataset, as negative controls in the experiment. As shown in Figure S3, we performed His pull-down experiments with each of them independently twice, each time with a positive control (Msm2092, a known interactor of Wag31). Revised Fig. S3b shows E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing FLAG-tagged MmpL4, MmpS5 or Msm2092 (revised Fig. S3c). The mixed lysates were pulled down with cobalt beads that bind the His-tagged protein and analysed by Western blot, probing with an anti-FLAG antibody (revised Fig. S3d). These data confirm that the interactions validated through the pull-down assay were specific.

      (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

      We thank the reviewer for the comment. Wag31<sub>Msm</sub> is a 272-amino-acid protein. The N-terminal region of Wag31, which houses the DivIVA domain, comprises the first 60 amino acids. Previously, we attempted to express the N-terminal (60 aa) and C-terminal (212 aa) truncated proteins in various mycobacterial shuttle vectors to perform MS/MS experiments. Despite numerous efforts, neither could be expressed, with an N- or C-terminal FLAG tag or without a tag, in episomal or integrative vectors, owing to protein instability. Eventually, we successfully expressed the C-terminal region of Wag31 with N- and C-terminal hexa-His tags. However, this expression was not sufficient or stable enough for us to perform Ni<sup>2+</sup>-affinity pull-down experiments for mass spectrometry. The N-terminal region of Wag31 could not be expressed in M. smegmatis even with N- and C-terminal hexa-His tags.

      To rule out a role for the N-terminal region in mediating protein-protein interactions, we cloned the N-terminal region of Wag31, which comprises the DivIVA domain, into the pET28b vector (Fig. 7a revised). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> (Wag31<sub>∆N</sub> in the revised manuscript) were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7e-g). Thus, we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing FLAG-MurG, -SepIVA, -Msm2092 or -AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in revised Fig. 7d, His-Wag31 bound all four interactors whereas His-Wag31<sub>∆C</sub> did not, strengthening the conclusion that interactions of Wag31 with other proteins are mediated by its C-terminal region. However, we cannot exclude the possibility that other interactors bind the N-terminal region of Wag31. Unfortunately, owing to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we are unable to perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • The authors use 10-N-nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind various anionic phospholipids, how do the authors know that they are specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in the mycobacterial plasma membrane.

      We thank the reviewer for the comment. Despite its promiscuous binding to other anionic phospholipids, 10-N-nonyl-acridine orange is widely used to stain cardiolipin and determine its localisation in bacterial cells and in eukaryotic mitochondria (Garcia Fernandez et al., 2004; Mileykovskaya & Dowhan, 2000; Renner & Weibel, 2011). This is because it has a higher affinity for cardiolipin than for other anionic phospholipids: the association constant is 2 × 10<sup>6</sup> M<sup>−1</sup> for cardiolipin, compared with 7 × 10<sup>4</sup> M<sup>−1</sup> for phosphatidylserine and phosphatidylinositol (Petit et al., 1992), a roughly 30-fold difference. Additionally, no other stain for detecting cardiolipin is yet available. Our protein-lipid binding assays suggest that Wag31 preferentially binds cardiolipin over other anionic phospholipids (Fig. 4b). Hence, it is likely that most of the redistribution of NAO fluorescence we observe reflects cardiolipin mislocalization due to altered Wag31 levels, with a smaller contribution coming indirectly from other anionic phospholipids displaced from the membrane by the loss of membrane integrity and the cell shape changes caused by altered Wag31 levels.

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements or provide additional data to support their claims.

      We agree with the reviewer that the N-terminal region may have another function that contributes to sustaining mycobacterial physiology and survival. We will revise our statements in the paper to reflect the data. The results suggest that the tethering activity of the N-terminal region may contribute to mycobacterial morphology and survival; however, additional functions of this region cannot be ruled out. Similarly, the maintenance of lipid homeostasis by Wag31 may be associated with its tethering activity, although other mechanisms could also contribute to this process.

      • The authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et al. have shown that MurG and GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar, while Wag31 is located at the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of the protein-protein interactions shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 with Msm lysate expressing Wag31 interactors such as MurG, it is possible that the interactions are not direct. The authors should interpret their data more cautiously. If the authors cannot provide additional data and sufficient justification, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

      In the literature, MurG and GlfT2 have been shown to have polar localisation (Freeman et al., 2023; Hayashi et al., 2016; Kado et al., 2023), and two groups have reported a slightly sub-polar localisation of MurG (García-Heredia et al., 2021; Meniche et al., 2014). Additionally, Freeman et al. (2023) showed that SepIVA is a spatiotemporal regulator of MurG. MS/MS analysis of Wag31 immunoprecipitates identified both MurG and SepIVA as interactors of Wag31 (Fig. 3). Given that Wag31 also displays polar localisation, it is likely that it associates with polar MurG. However, since a sub-polar localisation of MurG has also been reported, it is possible that they do not interact directly and that another protein mediates their interaction. Based on the above, we will modify the model proposed in Fig. 8.

      To validate these interactions, we performed pull-down experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were stringent: the wash buffer contained 1% Triton X-100, which is expected to minimize non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and repeating the experiment. We will describe this caveat in the revised manuscript and propose a model that reflects the results we obtained.

      References:

      Freeman, A. H., Tembiwa, K., Brenner, J. R., Chase, M. R., Fortune, S. M., Morita, Y. S., & Boutte, C. C. (2023). Arginine methylation sites on SepIVA help balance elongation and septation in Mycobacterium smegmatis. Mol Microbiol, 119(2), 208-223. https://doi.org/10.1111/mmi.15006

      Garcia Fernandez, M. I., Ceccarelli, D., & Muscatello, U. (2004). Use of the fluorescent dye 10-N-nonyl acridine orange in quantitative and location assays of cardiolipin: a study on different experimental models. Anal Biochem, 328(2), 174-180. https://doi.org/10.1016/j.ab.2004.01.020

      García-Heredia, A., Kado, T., Sein, C. E., Puffal, J., Osman, S. H., Judd, J., Gray, T. A., Morita, Y. S., & Siegrist, M. S. (2021). Membrane-partitioned cell wall synthesis in mycobacteria. eLife, 10. https://doi.org/10.7554/eLife.60263

      Habibi Arejan, N., Ensinck, D., Diacovich, L., Patel, P. B., Quintanilla, S. Y., Emami Saleh, A., Gramajo, H., & Boutte, C. C. (2022). Polar protein Wag31 both activates and inhibits cell wall metabolism at the poles and septum. Front Microbiol, 13, 1085918. https://doi.org/10.3389/fmicb.2022.1085918

      Hayashi, J. M., Luo, C. Y., Mayfield, J. A., Hsu, T., Fukuda, T., Walfield, A. L., Giffen, S. R., Leszyk, J. D., Baer, C. E., Bennion, O. T., Madduri, A., Shaffer, S. A., Aldridge, B. B., Sassetti, C. M., Sandler, S. J., Kinoshita, T., Moody, D. B., & Morita, Y. S. (2016). Spatially distinct and metabolically active membrane domain in mycobacteria. Proc Natl Acad Sci U S A, 113(19), 5400-5405. https://doi.org/10.1073/pnas.1525165113

      Kado, T., Akbary, Z., Motooka, D., Sparks, I. L., Melzer, E. S., Nakamura, S., Rojas, E. R., Morita, Y. S., & Siegrist, M. S. (2023). A cell wall synthase accelerates plasma membrane partitioning in mycobacteria. eLife, 12, e81924. https://doi.org/10.7554/eLife.81924

      Meniche, X., Otten, R., Siegrist, M. S., Baer, C. E., Murphy, K. C., Bertozzi, C. R., & Sassetti, C. M. (2014). Subpolar addition of new cell wall is directed by DivIVA in mycobacteria. Proc Natl Acad Sci U S A, 111(31), E3243-3251. https://doi.org/10.1073/pnas.1402158111

      Mileykovskaya, E., & Dowhan, W. (2000). Visualization of phospholipid domains in Escherichia coli by using the cardiolipin-specific fluorescent dye 10-N-nonyl acridine orange. J Bacteriol, 182(4), 1172-1175. https://doi.org/10.1128/JB.182.4.1172-1175.2000

      Petit, J. M., Maftah, A., Ratinaud, M. H., & Julien, R. (1992). 10N-nonyl acridine orange interacts with cardiolipin and allows the quantification of this phospholipid in isolated mitochondria. Eur J Biochem, 209(1), 267-273. https://doi.org/10.1111/j.1432-1033.1992.tb17285.x

      Renner, L. D., & Weibel, D. B. (2011). Cardiolipin microdomains localize to negatively curved regions of Escherichia coli membranes. Proc Natl Acad Sci U S A, 108(15), 6264-6269. https://doi.org/10.1073/pnas.1015757108

      Schägger, H. (2006). Tricine-SDS-PAGE. Nat Protoc, 1(1), 16-22. https://doi.org/10.1038/nprot.2006.4

      Xu, W. X., Zhang, L., Mai, J. T., Peng, R. C., Yang, E. Z., Peng, C., & Wang, H. H. (2014). The Wag31 protein interacts with AccA3 and coordinates cell wall lipid permeability and lipophilic drug resistance in Mycobacterium smegmatis. Biochem Biophys Res Commun, 448(3), 255-260. https://doi.org/10.1016/j.bbrc.2014.04.116

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on the levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have included a clarification for this in the discussion section.

      (2) The pulldown assays results are interesting, but the links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through immunoprecipitation of FLAG-tagged Wag31 complemented at an integrative locus in a Wag31 mutant background, to avoid overexpression artifacts. We used Msm::gfp, expressing an integrative copy (at the L5 locus) of FLAG-GFP, as a control to subtract non-specific interactions. The experiment was performed in biological triplicate, and interactors that appeared in all replicates were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two established interactors of Wag31, AccD5 and Rne, were among the top five hits, thus validating our data.
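The filtering steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes inclusive thresholds (PSM ≥ 18 and unique peptides ≥ 5, required in every replicate), treats GFP-control subtraction as discarding any protein detected in the Msm::gfp pull-down, and ranks proteins by their lowest PSM across replicates. The `Hit` record and `filter_interactome` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One protein identification in one pull-down (hypothetical record layout)."""
    protein: str
    psm: int              # peptide-spectrum matches
    unique_peptides: int


def filter_interactome(replicates, gfp_control, min_psm=18, min_unique=5, top_n=25):
    """Return the top-ranked proteins that pass every filter (hypothetical helper).

    replicates: list of lists of Hit, one list per biological replicate.
    gfp_control: list of Hit from the FLAG-GFP control pull-down.
    """
    # Keep only proteins identified in every biological replicate.
    common = set.intersection(*({h.protein for h in rep} for rep in replicates))
    # Subtract proteins that also appear in the GFP control (assumed non-specific).
    common -= {h.protein for h in gfp_control}

    ranked = []
    for protein in common:
        hits = [next(h for h in rep if h.protein == protein) for rep in replicates]
        # Assumed: both cut-offs must be met in every replicate.
        if all(h.psm >= min_psm and h.unique_peptides >= min_unique for h in hits):
            ranked.append((min(h.psm for h in hits), protein))

    # Highest minimum PSM first; keep the top_n entries.
    ranked.sort(reverse=True)
    return [protein for _, protein in ranked[:top_n]]
```

Because control subtraction and ranking are separate steps, a different ranking criterion (for example, summed PSMs) could be swapped in without touching the replicate and control filters.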

      Though we agree that the interactions can be either direct or mediated by a third partner, the fact that we recovered known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pull-down experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were stringent: the wash buffer contained 1% Triton X-100, which is expected to minimize non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript.

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in the delocalisation of lipids into intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both types of change observed in lipid metabolism.

      Reviewer #2 (Recommendations for the authors):

      I recommend the following experiments to strengthen the data presented:

      (1) Include a non-interacting FLAG-tagged protein as a negative control in the pull-down experiment to strengthen this data.

      We thank the reviewer for the comment. As suggested, we have included non-interacting FLAG-tagged proteins as negative controls in the pull-down experiment. We chose MmpL4 and MmpS5, which were not found in the Wag31 interactome data. We performed pull-down experiments with both of them and included a Wag31 interactor, Msm2092, as a positive control. Fig. S3b (revised) shows E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing FLAG-tagged MmpL4, MmpS5 or Msm2092 (Fig. S3c, revised). The mixed lysates were pulled down with cobalt beads that bind the His-tagged protein and analysed by Western blotting with an anti-FLAG antibody. The pull-down experiments were performed independently twice, each time with Msm2092 as the positive control (Fig. S3d, revised).

      (2) Perform the pull-down experiments using only the Wag31 N-terminus to rule out any role that it may have in the protein-protein interactions.

      We thank the reviewer for the comment. To rule out a role for the N-terminal region of Wag31 in mediating protein-protein interactions, we cloned the N-terminal region of Wag31, which comprises the DivIVA domain, into the pET28b vector (Fig. 7a revised). The truncated protein, hereafter called Wag31<sub>∆C</sub>, carrying 6×His tags at both termini, was expressed in E. coli and then mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> or Wag31<sub>∆N</sub> were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7 previous), so we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing FLAG-tagged MurG, SepIVA, Msm2092 or AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in Fig. 7d revised, His-Wag31 bound all four interactors whereas His-Wag31<sub>∆C</sub> did not, strengthening the conclusion that the interactions of Wag31 with other proteins are mediated by its C-terminal region. However, we can’t exclude the possibility that other proteins bind the N-terminal region of Wag31. Unfortunately, due to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we couldn’t perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Minor comments:

      - Please check the legend of Fig. 1g, it appears to be labelled incorrectly.

      We have checked it, and it is correct. Fig. 1g shows the percentages of cells of the three strains, i.e., Msm+ATc, Δwag31-ATc and Δwag31+ATc, displaying rod, round or bulged morphology.

      - For MS/MS analysis, a GFP control is mentioned but it is not indicated how this was incorporated in the data analysis. This information should be added.

      We have incorporated that in the revised methodology.

      - The information presented in Fig. 3a, e and f could be combined in one table.

      We appreciate the reviewer's suggestion, but we prefer a pictorial representation of the data. It allows readers to take in the information in parts, make quicker comparisons and understand trends easily.

      - Fig. 4c Wag31K20A appears smaller in size than the wild-type protein - why is this the case? Is this not a single amino acid substitution?

      Though K20A is a single amino acid substitution, it alters the mobility of Wag31 on SDS-PAGE gels. Sequence analysis of the plasmid expressing Wag31<sub>K20A</sub> doesn’t show any mutations other than the desired K20A. The change in mobility could be due to a change in the conformation of Wag31<sub>K20A</sub>, a change in its SDS binding, or both, either of which would alter its electrophoretic mobility.

      - Please clarify what is contained in the first panel of fig 4e. compared to what is in the second panel.

      The first panel shows CL-DiI-liposomes before incubation with Wag31-GFP, and the second panel shows CL-DiI-liposomes after incubation with Wag31-GFP. The third panel shows the same mixture in the green channel, to examine the localisation of Wag31-GFP in the liposome-protein mix. The fourth panel shows the merge of the second and third panels.

      - The data in Fig 6d suggests higher levels of CL in the ∆wag31 compared to wild-type - how do the authors reconcile this with the MS data in Fig. 2g showing lower CL levels?

      Fig. 6d represents the distribution of CL localisation in the tested mycobacterial strains, whereas Fig. 2g shows the absolute levels of CL in the various strains. We place greater confidence in the lipidomics data, which suggest downregulation of CL species. NAO staining and microscopy serve only to study the localization of CL along the cell and cannot be used to reliably quantify CL levels. Staining with a probe such as NAO depends on factors such as the hydrophobicity and permeability of the cell wall, which we expect to be severely altered in the Wag31 mutant. Therefore, the increased NAO staining seen in the Wag31 mutant could simply reflect increased uptake of the dye rather than higher absolute levels of CL. The specificity of staining and localization, however, can be expected to be unaltered.

      Reviewer #3 (Recommendations for the authors):

      Following are suggestions for improving the writing and presentation.

      • Figure 1, the meaning of the yellow arrows present in f and h should be mentioned in the figure legend.

      We have incorporated that in the revised legend. In Fig. 1f, the yellow arrowhead marks the bulged-pole morphology, whereas in Fig. 1h it indicates intracellular lipid inclusions.

      • Figure 7 legend refers to panels g, h, and i. However, Figure 7 only has panels a-c. The legend lacks a description of panel c.

      We have corrected the typos and the legend.

      • Figure S1, F2-R2 and F3-R3 expected sizes should be stated in the legend of the figure.

      We have updated the legends.

      • Figure S5, is this the same figure as 5e? If so, there is no need for this figure.

      We have removed Fig. S5.

      • Methods need to be written more carefully with enough details. I listed some of the concerns below.

      Detailed methodology was previously provided in the supplementary material and now we have moved it to the materials and methods in the revised manuscript.

      • Line 392, provide more details on western blotting. What is the secondary antibody? What image documentation system was used?

      We have updated the methodology.

      • Line 400, while the methods may be the same as the reference 64, authors should still provide key details such as the way samples were fixed and processed for SEM and TEM.

      We have provided a detailed description of the same in methodology in the revised version.

      • Line 437, how do authors calculate the concentration of liposome to be 10 µM? Do they possibly mean the concentration of phospholipids used to make the liposomes?

      Yes, this is the concentration of total lipids used to make the liposomes. 1 μM Wag31 or its mutants was mixed with 100-nm extruded liposomes containing 10 μM total lipid in separate Eppendorf tubes.

      • Supplemental Line 9, "turns of" should read "turns off".

      We have edited this.

      • Supplemental Line 13, define LHS and RHS.

      LHS (left-hand sequence) and RHS (right-hand sequence) refer to the upstream and downstream flanking regions of the gene of interest.

      • Supplemental Line 20, indicate the manufacturer of the microscope and type of the objective lens.

      We have added these details now.

      • Supplemental Line 31, define MeOH, or use a chemical formula like chloroform.

      MeOH is methanol. We have provided a chemical formula in the revised version.

      • Supplemental Line 53, indicate the concentration of trypsin.

      We have included that in the revised version.

      • Supplemental Line 72, g is not a unit. "30,000 g" should be "30,000x g".

      We have revised this in the manuscript.

      • Supplemental Line 114, provide more details on western blotting. What is the manufacturer of the anti-FLAG antibody? What is the secondary antibody? How was the antibody binding visualized? What image documentation system was used?

      We have provided these details in the revised version.

    1. eLife Assessment

      This important study reports a reanalysis of one experiment of a previously published report to characterize the dynamics of neural population codes during visual working memory in the presence of distracting information. This paper presents solid evidence that working memory representations are dynamic and distinct from sensory representations of intervening distractions. This research will be of interest to cognitive neuroscientists working on the neural bases of visual perception and memory.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors re-analyzed a public dataset (Rademaker et al, 2019, Nature Neuroscience) which includes fMRI and behavioral data recorded while participants held an oriented grating in visual working memory (WM) and performed a delayed recall task at the end of an extended delay period. In that experiment, participants were pre-cued on each trial as to whether there would be a distracting visual stimulus presented during the delay period (filtered noise or randomly-oriented grating). In this manuscript, the authors focused on identifying whether the neural code in retinotopic cortex for remembered orientation was 'stable' over the delay period, such that the format of the code remained the same, or whether the code was dynamic, such that information was present, but encoded in an alternative format. They identify some timepoints - especially towards the beginning/end of the delay - where the multivariate activation pattern fails to generalize to other timepoints, and interpret this as evidence for a dynamic code. Additionally, the authors compare the representational format of remembered orientation in the presence vs absence of a distracting stimulus, averaged over the delay period. This analysis suggested a 'rotation' of the representational subspace between distracting orientations and remembered orientations, which may help preserve simultaneous representations of both remembered and viewed stimuli. Intriguingly, this rotation was a bit smaller for Expt 2, in which the orientation distractor had a greater impact on participants' working memory recall performance, suggesting that more separation between subspaces is critical for preserving intact working memory representations.

      Strengths:

      (1) Direct comparisons of coding subspaces/manifolds between timepoints, task conditions, and experiments is an innovative and useful approach for understanding how neural representations are transformed to support cognition

      (2) Re-use of an existing dataset goes substantially beyond the authors' previous findings by comparing the geometry of representational spaces between conditions and timepoints, and by looking explicitly for dynamic neural representations

      (3) Simulations testing whether dynamic codes can be explained purely by changes in data SNR are an important contribution, as this rules out a category of explanations for the dynamic coding results observed

      Weaknesses:

      (1) Primary evidence for 'dynamic coding', especially in early visual cortex, appears to be related to the transition between encoding/maintenance and maintenance/recall, but the delay period representations seem overall stable, consistent with some previous findings. However, given the simulation results, the general result that representations may change in their format appears solid, though the contribution of different trial phases remains important for considering the overall result.

      (2) Converting a continuous decoding metric (angular error) to "% decoding accuracy" serves to obfuscate the units of the actual results. Decoding precision (e.g., sd of decoding error histogram) would be more interpretable and better related to both the previous study and behavioral measures of WM performance.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Degutis and colleagues addressed an interesting issue related to the concurrent coding of sensory percepts and visual working memory contents in visual cortices. They used generalization analyses to test whether working memory representations change over time, diverge from sensory percepts, and vary across distraction conditions. Temporal generalization analysis demonstrated that off-diagonal decoding accuracies were lower than on-diagonal decoding accuracies, regardless of the presence of intervening distractions, implying that working memory representations can change over time. They further found that the coding space for working memory contents showed subtle but statistically significant changes over time, potentially explaining the impaired off-diagonal decoding performance. The neural coding of sensory distractions instead remained largely stable. Generalization analyses showed that target and distractor codes overlapped but were not identical. Cross-condition decoding yielded lower accuracies than within-condition decoding. Finally, within-condition decoding revealed more reliable working memory representations in the condition with intervening random noise than cross-condition decoding using a classifier trained on data from the no-distraction condition, indicating a change in the VWM format between the noise-distractor and no-distractor trials.

      Strengths:

      This paper demonstrates a clever use of generalization analysis to show changes in the neural codes of working memory contents across time and distraction conditions. It provides some insights into the differences between representations of working memory and sensory percepts, and how they can potentially coexist in overlapping brain regions.

      Comments on revisions:

      I appreciate the authors' efforts in addressing my previous concerns. The inclusion of additional analyses and data has strengthened the paper. I have no further concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Weaknesses:

      (1) Only Experiment 1 of Rademaker et al (2019) is reanalyzed. The previous study included another experiment (Expt 2) using different types of distractors which did result in distractor-related costs to neural and behavioral measures of working memory. The Rademaker et al (2019) study uses these two results to conclude that neural WM representations are protected from distraction when distraction does not impact behavior, but conditions that do impact behavior also impact neural WM representations. Considering this previous result is critical for relating the present manuscript's results to the previous findings, it seems necessary to address Experiment 2's data in the present work

      We thank the reviewer for the proposal to analyze Experiment 2, where subjects completed the same type of visual working memory task but instead had either a flashing orientation distractor or a naturalistic (gazebo or face) distractor present during two-thirds of the trials. As the reviewer points out, unlike in Experiment 1, these two conditions in Experiment 2 had a behavioral impact on recall accuracy when compared to the blank delay. We have now run the temporal cross-decoding analysis, temporally-stable neural subspace analysis, and condition cross-decoding analysis in Experiment 2. The results from the stable subspace analysis are presented in Figure 3, while the results from the temporal cross-decoding analysis and condition cross-decoding analysis are presented in the Supplementary Data.

      First, we are unable to draw strong conclusions from the temporal cross-decoding analysis, as the decoding accuracies across time in Experiment 2 are much lower than in Experiment 1. In some ROIs in the naturalistic distractor condition, some diagonal elements are not part of the above-chance decoding cluster, making it difficult to draw any conclusions regarding dynamic clusters. We do see some dynamic coding in the naturalistic condition in V3, where the off-diagonals do not show above-chance decoding. Since temporal cross-decoding yields low accuracies, we do not examine the dynamics of neural subspaces across time.

      We do, however, run the stable subspace analysis on the flashing orientation distractor condition. Just like in Experiment 1, we examine temporally stable target and distractor subspaces. When projecting the distractor onto the working memory target subspace, we see a higher overlap between the two than in Experiment 1. A similar pattern is also seen when projecting the target onto the distractor subspace. We still see an above-chance principal angle between the target and distractor subspaces; however, this angle is qualitatively smaller than in Experiment 1. This suggests that the degree of separation between the two neural subspaces is related to behavioral performance during recall.
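The principal-angle comparison mentioned above can be illustrated with a short NumPy sketch. It assumes each subspace is spanned by the first two principal components of a (conditions × voxels) matrix of condition-averaged patterns, and it computes principal angles from the singular values of the product of the two orthonormal bases, which are the cosines of the angles. The function names are hypothetical, and the permutation procedure needed to establish an above-chance angle is not shown.

```python
import numpy as np


def pca_basis(patterns, k=2):
    """Orthonormal (voxels x k) basis for the top-k PCs of a (conditions x voxels) matrix."""
    centered = patterns - patterns.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions in voxel space, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def principal_angles_deg(basis_a, basis_b):
    """Principal angles in degrees (smallest first) between two orthonormal bases."""
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
```

An angle near 0° indicates overlapping subspaces and an angle near 90° indicates orthogonal ones; SciPy's `scipy.linalg.subspace_angles` offers a comparable routine that returns radians.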

      (2) Primary evidence for 'dynamic coding', especially in the early visual cortex, appears to be related to the transition between encoding/maintenance and maintenance/recall, but the delay period representations seem overall stable, consistent with previous findings

      We agree with the reviewer that we primarily see dynamic coding at the transition between encoding and maintenance and at the end of the maintenance period, implying that WM representations are stable during the delay in most ROIs. The only place where we argue that we might see more dynamic coding during the delay itself is in V1 during the noise distractor trials in Experiment 1.

      (3) Dynamicism index used in Figure 1f quantifies the proportion of off-diagonal cells with significant differences in decoding performance from the diagonal cell. It's unclear why the proportion of time points is the best metric, rather than something like a change in decoding accuracy. This is addressed in the subsequent analysis considering coding subspaces, but the utility of the Figure 1f analysis remains weakly justified.

      We agree that other metrics can also provide a summary of dynamics; here, the dynamicism index just acts as a summary visualizing the dynamic elements. It offers an intuitive way to visualize peaks and troughs of the dynamic code across the extent of the trial.
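A dynamicism index of the kind described above can be computed from a matrix of significance flags. This sketch assumes the statistical test has already produced a boolean (time × time) matrix in which `dynamic[i, j]` marks an off-diagonal cell whose accuracy differs significantly from the diagonal; the function name and the row-wise normalisation by T − 1 are assumptions, not the authors' definition.

```python
import numpy as np


def dynamicism_index(dynamic):
    """Fraction of off-diagonal cells flagged as dynamic, per training time point.

    dynamic: (T x T) boolean array; dynamic[i, j] is True when the accuracy for
    training time i and testing time j was judged significantly different from
    the diagonal (the statistical test itself is not shown here).
    """
    dynamic = np.asarray(dynamic, dtype=bool)
    t = dynamic.shape[0]
    off_diagonal = ~np.eye(t, dtype=bool)
    # Ignore the diagonal, count flagged cells per row, normalise by T - 1.
    return (dynamic & off_diagonal).sum(axis=1) / (t - 1)
```

Plotting this vector against time gives the peaks and troughs of dynamic coding across the trial.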

      (4) There is no report of how much total variance is explained by the two PCs defining the subspaces of interest in each condition, and timepoint. It could be the case that the first two principal components in one condition (e.g., sensory distractor) explain less variance than the first two principal components of another condition.

      We thank the reviewer for this comment. We have now included the percentage of variance explained by the two PCs in both the temporally-stable target and distractor subspace analysis and the dynamic subspace analysis. The percentage of variance explained is comparable across analyses: the first PC explains 43-50% and the second 28-37%. The values for the PCs within each analysis (dynamic no-distractor, orientation and noise distractor; temporally-stable target and distractor) are even closer in range (Figure 2c and 3d).
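Percent variance explained per principal component follows directly from the singular values of the centred data. A minimal sketch, assuming PCA is run on a centred (observations × voxels) matrix; `variance_explained` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def variance_explained(patterns, k=2):
    """Percentage of total variance captured by each of the first k principal components."""
    centered = patterns - patterns.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the variance along each PC.
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    return 100.0 * variances[:k] / variances.sum()
```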

      (5) Converting a continuous decoding metric (angular error) to "% decoding accuracy" serves to obfuscate the units of the actual results. Decoding precision (e.g., sd of decoding error histogram) would be more interpretable and better related to both the previous study and behavioral measures of WM performance.

      We thank the reviewer for the comments. FCA is a linear function of the angular error that uses the following equation:

      FCA = (1 − |angular error| / 180°) × 100%

      We think that the FCA does not obfuscate the results, but instead provides an intuitive scale where 0% accuracy corresponds to a 180° error, 50% to a 90° error and so on. This also makes it easy to reverse-calculate the absolute error if need be. Our lab has previously used this method in other neuroimaging papers with continuous variables (Barbieri et al. 2023, Weber et al. 2024).
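The linear mapping described above (0% at 180°, 50% at 90°) and its inverse can be written as two small functions. This sketch assumes errors are already expressed on a ±180° circular scale; the function names are hypothetical.

```python
def fca(angular_error_deg):
    """Accuracy in % as a linear function of angular error: (1 - |error| / 180) * 100."""
    error = abs(angular_error_deg)
    if error > 180:
        raise ValueError("angular error must lie within [-180, 180] degrees")
    return (1.0 - error / 180.0) * 100.0


def error_from_fca(accuracy_pct):
    """Invert the mapping to recover the absolute angular error in degrees."""
    return (1.0 - accuracy_pct / 100.0) * 180.0
```

For example, `fca(-45)` gives 75.0, and `error_from_fca(75.0)` recovers 45.0.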

      We do, however, agree that “% decoding accuracy” does not provide an accurate reflection of the metric used. We have thus now changed “% decoding accuracy” to “Accuracy (% FCA)”.

      (6) This report does not make use of behavioral performance data in the Rademaker et al (2019) dataset.

      We have now analyzed Experiment 2 which, as previously mentioned by the reviewer and unlike Experiment 1, showed a decrease in recall accuracy during the two distractor conditions. We address the results from Experiment 2 in a previous response (please see Weakness 1).

      We do not, however, relate single subject behavioral performance to neural measurements, as we do not think there is enough power to do so with a small number of subjects in both Experiment 1 and 2. 

      (7) Given there were observed differences between individual retinotopic ROIs in the temporal cross-decoding analyses shown in Figure 1, the lack of data presented for the subspace analyses for the corresponding individual ROIs is a weakness

      We have now included an additional supplementary figure that shows individual plots of each ROI for the temporally stable subspace analysis for both Experiment 1 and Experiment 2 (Supplementary Figure 5). 

      Reviewer #1 (Recommendations For The Authors):

      (1) Is there any relationship between stable/dynamic coding properties and aspects of behavioral performance? This seems like a major missed opportunity to better understand the behavioral relevance or importance of the proposed dynamic and orthogonal coding schemes. For example, is it the case that participants who have more orthogonal coding subspaces between orientation distractor and remembered orientation show less of a behavioral consequence to distracting orientations? Less induced bias? I know these differences weren't significant at the group level in the original study, but maybe individual variability in the metrics of this study can explain differences in performance between participants in the reported dataset

      As mentioned in the previous response, we do not run individual correlations between dynamic or orthogonal coding metrics and behavioral performance, because of the small number of subjects in both experiments. We believe that for a brain-behavior correlation between average behavioral error of subjects and an average brain measure, we would need a larger sample size.  

      (2) The voxel selection procedure differs from the original study. The authors should add additional detail about the number of voxels included in their analyses, and how this number of voxels compares to that used in the original study.

      We have now added a figure summarizing the number of voxels selected across participants. We do select fewer voxels than Rademaker et al. 2019 (see their Supplementary Tables 9 and 10 and our Supplementary Figure 8). For example, we have ~500 voxels on average in V1 in Experiment 1, while the original study had ~1000. As mentioned in the methods, we aimed to select voxels that reliably responded to both the perception localizer conditions and the working memory trials.

      (3) Lines 428-436 specify details about how data is rescaled prior to decoding. The procedure seems to estimate rescaling factors according to some aspect of the training data, and then apply this rescaling to the training and testing data. Is there a possibility of leakage here? That is - do aspects of the training data impact aspects of the testing data, and could a decoder pick up on such leakage to change decoding? It seems this is performed for each training/testing timepoint pair, and so the temporal unfolding of results may depend on this analysis choice.

      Thank you for the suggestion. To prevent data leakage, the mean and standard deviation are computed exclusively from the training set. These scaling parameters are then applied to the test set, ensuring that no information from the test set influences the training process. This transformation simply adjusts the test set to the same scale as the training data, without exposing the model to unseen test data during training.
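      As an illustration of this train-only scaling, here is a minimal Python/NumPy sketch. It assumes per-voxel z-scoring of a trials × voxels array; the function name `standardize_train_test` is hypothetical and not taken from the authors' analysis code.

```python
import numpy as np

def standardize_train_test(train, test):
    """Z-score each voxel (column) using statistics from the training set only.

    train, test: arrays of shape (n_trials, n_voxels).
    """
    mu = train.mean(axis=0)          # per-voxel mean, training trials only
    sd = train.std(axis=0)           # per-voxel SD, training trials only
    sd = np.where(sd == 0, 1.0, sd)  # avoid division by zero for constant voxels
    # The same training-derived parameters rescale both sets; the test set
    # never contributes to mu or sd.
    return (train - mu) / sd, (test - mu) / sd
```

      Because `mu` and `sd` depend only on `train`, replacing the test trials leaves the scaled training data unchanged, which is the property the response relies on.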

      (4) Figure 1d, V1: the 'dynamics' look somewhat asymmetric. Could the authors comment on this detail of the results? Why would we expect a dynamic cluster on one side of the diagonal but not the other? Given that this region and condition provide the primary evidence for a dynamic code that is not related to the beginning/end of the delay (see other comments), figuring this out is particularly important.

      We thank the reviewer for this question. We think that this is just due to small numerical differences in the upper and lower triangles of the matrix, rather than a neuroscientifically interesting effect. However, this is only a speculative observation.

      (5) I think it's important to address the issue I raised in "weaknesses" about variance explained by the top N principal components in each condition. What are we supposed to learn from data projected into subspaces fit to different conditions if the subspaces themselves are differently useful?

      Thank you; this is now addressed in our response to an earlier comment (please see Weakness 4).

      Reviewer #2:

      Weaknesses:

      (1) An alternative interpretation of the temporal dynamic pattern is that working memory representations become less reliable over time. As shown by the authors in Figure 1c and Figure 4a, the on-diagonal decoding accuracy generally decreased over time. This implies that the signal-to-noise ratio was decreasing over time. Classifiers trained with data of relatively higher SNR and lower SNR may rely on different features, leading to poor generalization performance. This issue should be addressed in the paper.

      We thank the reviewer for raising this issue and we have now run three simulations that aim to address whether a changing SNR across time might create dynamic clusters. 

      In the first simulation, we created a dataset of 200 voxels with sine or cosine response functions to orientations from 1° to 180°, the same orientations as the remembered target. A circular shift was applied to each voxel to vary its preferred (maximal) response. We then assessed decoding performance under different SNR conditions during training and testing. In each of seven iterations, we selected 108 responses (out of 180) for training and 108 for testing; the selected trials differed across iterations to increase variability. Random white noise was added to the data, with its amplitude scaled independently for the training and test data to reach the specified SNR levels. We then used the same pSVR decoder as in the temporal cross-decoding analysis for training and testing.
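      The first simulation can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: tuning uses the doubled orientation angle so that responses repeat every 180°, the noise SD equals the signal SD divided by the SNR, and a ridge regression onto (cos 2θ, sin 2θ) stands in for the pSVR decoder. All function names are hypothetical.

```python
import numpy as np

def make_voxel_responses(n_voxels=200, seed=0):
    """Noise-free responses (180 orientations x n_voxels) with sine/cosine tuning."""
    rng = np.random.default_rng(seed)
    ori = np.arange(1, 181)                      # orientations in degrees
    shift = rng.integers(0, 180, size=n_voxels)  # circular shift of preferred orientation
    use_sin = rng.random(n_voxels) < 0.5         # roughly half sine-, half cosine-tuned
    phase = np.deg2rad(2 * (ori[:, None] - shift[None, :]))  # doubled angle: 180-degree period
    return ori, np.where(use_sin[None, :], np.sin(phase), np.cos(phase))

def add_noise(signal, snr, rng):
    """Add white noise whose SD is the signal SD divided by the requested SNR."""
    return signal + rng.normal(0.0, signal.std() / snr, size=signal.shape)

def fit_decoder(x, ori, ridge=1.0):
    """Ridge regression from voxels to (cos 2θ, sin 2θ); a stand-in for pSVR."""
    y = np.column_stack([np.cos(np.deg2rad(2 * ori)), np.sin(np.deg2rad(2 * ori))])
    return np.linalg.solve(x.T @ x + ridge * np.eye(x.shape[1]), x.T @ y)

def decode_accuracy(w, x, ori):
    """Mean cos(2 * error): 1 = perfect, about 0 = chance."""
    pred = x @ w
    pred_ori = np.rad2deg(np.arctan2(pred[:, 1], pred[:, 0])) / 2
    return float(np.mean(np.cos(np.deg2rad(2 * (pred_ori - ori)))))

def snr_cross_decoding(snrs, n_iter=7, n_trials=108, seed=0):
    """Rows: training SNR; columns: test SNR; averaged over iterations."""
    rng = np.random.default_rng(seed)
    ori, clean = make_voxel_responses(seed=seed)
    acc = np.zeros((len(snrs), len(snrs)))
    for _ in range(n_iter):
        tr = rng.choice(len(ori), n_trials, replace=False)  # new trials each iteration
        te = rng.choice(len(ori), n_trials, replace=False)
        for i, s_train in enumerate(snrs):
            w = fit_decoder(add_noise(clean[tr], s_train, rng), ori[tr])
            for j, s_test in enumerate(snrs):
                acc[i, j] += decode_accuracy(w, add_noise(clean[te], s_test, rng), ori[te])
    return acc / n_iter

def dynamic_elements(acc):
    """Flag off-diagonal cells lower than both corresponding diagonal cells."""
    d = np.diag(acc)
    return acc < np.minimum(d[:, None], d[None, :])
```

      Here a "dynamic" element is flagged, as an assumption following a common definition, when an off-diagonal cell is lower than both of its on-diagonal counterparts; `dynamic_elements(snr_cross_decoding([...]))` applies that check to the toy matrix.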

      The second and third simulations more directly address whether increased noise levels would lead the decoder to rely on different features of the no-distractor and noise-distractor data. For these simulations, we used empirical data from primary visual cortex (V1, where dynamic coding was observed in the noise-distractor trials) from the no-distractor and noise-distractor conditions, respectively. Data from 5.6–8.8 seconds after stimulus onset were averaged across five TRs. As in the first simulation, SNR was systematically manipulated by adding white noise. In addition, to test whether an initial decrease in SNR followed by an increase would produce dynamic coding clusters, we first increased and then decreased the amplitude of the added noise. The same pSVR decoder was used to train and test on data with different levels of added noise.

      The SNR cross-decoding matrices show no dynamic elements, because decoding accuracy depends primarily on the training data rather than the test data. As a result, some off-diagonal values in the decoding matrix are higher, rather than lower, than the corresponding on-diagonal elements.

      We have now added a Methods section explaining the simulations in more detail and Supplementary Figure 9 showing the SNR cross-decoding matrices. 

      (2) The paper tests against a strong version of stable coding, where neural spaces representing WM contents must remain identical over time. In this version, any changes in the neural space will be evidence of dynamic coding. As the paper acknowledges, there is already ample evidence arguing against this possibility. However, the evidence provided here (dynamic coding cluster, angle between coding spaces) is not as strong as what prior studies have shown for meaningful transformations in neural coding. For instance, the principal angle between coding spaces over time was smaller than 8 degrees, and around 7 degrees between sensory distractors and WM contents. This suggests that the coding space for WM was largely overlapping across time and with that for sensory distractors. Therefore, the major conclusion that working memory contents are dynamically coded is not well-supported by the presented results.

      We thank the reviewer for this comment. The principal angles we calculate are above-baseline, meaning that we subtract the within-subspace principal angles from the between-subspace principal angles and take the average. Thus a 7 degree difference does not imply that there are only 7 degrees separating e.g. the sensory distractor from the target; it just indicates that the separation is 7 degrees above chance. 
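      A minimal NumPy sketch of this above-baseline measure follows. It assumes each coding subspace is given as a voxels × k basis matrix and that the within-subspace baseline is computed between two independent estimates (e.g. split halves) of the same condition's subspace; both assumptions and the function names are illustrative, not taken from the manuscript.

```python
import numpy as np

def principal_angles_deg(a, b):
    """Principal angles (degrees) between the column spaces of a and b."""
    qa, _ = np.linalg.qr(a)  # orthonormal basis for span(a)
    qb, _ = np.linalg.qr(b)  # orthonormal basis for span(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.rad2deg(np.arccos(np.clip(cosines, -1.0, 1.0)))

def above_baseline_angle(basis_a, basis_b, basis_a_half1, basis_a_half2):
    """Mean of (between-subspace angles - within-subspace angles)."""
    between = principal_angles_deg(basis_a, basis_b)
    within = principal_angles_deg(basis_a_half1, basis_a_half2)
    return float(np.mean(between - within))
```

      Under this definition, a value of 7° means the between-condition angles exceed the within-condition baseline by 7° on average, not that the subspaces are only 7° apart in absolute terms.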

      (3) Relatedly, the main conclusions, such as "VWM code in several visual regions did not generalize well between different time points" and "VWM and feature-matching sensory distractors are encoded in separable coding spaces", are somewhat subjective given that cross-condition generalization analyses consistently showed above-chance performance. These results could be interpreted as evidence of stable coding. The authors should use more objective descriptions, such as 'temporal generalization decoding showed reduced decoding accuracy in off-diagonals compared to on-diagonals.'

      Thank you, we agree that our previous claims might have been too strong. We have now toned down our statements in the Abstract and use “did not fully generalize” and “VWM and feature-matching sensory distractors are encoded in coding spaces that do not fully overlap.”

      Reviewer #2 (Recommendations For The Authors):

      Weakness 1 can potentially be addressed with data simulations that fix the signal pattern, vary the noise pattern, and perform the same temporal generalization analysis to test whether changes in SNR can lead to seemingly dynamic coding formats.

      Thank you for the great suggestion. We have now run the suggested simulations. Please see above (response to Weakness 1).

      There are mismatches in the statistical symbols shown in Figure 4 and Supplementary Table 2. It seems that there was a swap between the symbols for the noise between-condition and noise within-condition.

      Thank you, this has now been fixed.

    1. eLife Assessment

      This study describes an improved adaptive sampling approach, multiple-walker Supervised Molecular Dynamics (mwSuMD), and its application to G protein-coupled receptors (GPCRs), which are the most abundant membrane proteins and key targets for drug discovery. The manuscript provides solid evidence that the mwSuMD approach can assist in the sampling of complex binding processes, leading to useful findings for GPCR activity, including resolution of interactions not seen experimentally. The method has the potential to have broad applicability in structural biology and pharmacology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

    3. Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original, also thanks to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, atomic-level knowledge of binding sites/interfaces and conformational states is needed to define the supervised metrics, and the higher the resolution of such metrics, the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

    4. Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally, for instance through an in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance over the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of previously reported structures.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

      We thank the Reviewer for the positive comment on the manuscript and the proposed methods.

      Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original also thanks to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, atomic-level knowledge of binding sites/interfaces and conformational states is needed to define the supervised metrics, and the higher the resolution of such metrics, the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

      We thank the Reviewer for the positive comment on the revised manuscript and mwSuMD. We agree that the choice of supervised metrics is user- and system-dependent. We aim to improve this aspect in the future with the aid of interpretable machine learning.

      Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally, for instance through an in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance over the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of previously reported structures.

      The methodological focus of our study, mwSuMD, is an enhancement of supervised molecular dynamics that allows two metrics to be supervised at the same time and uses a score, rather than a tabù-like algorithm, to guide the simulation. Further changes are the seeding of parallel short replicas (walkers) rather than a series of short simulations, and the software implementation on different MD engines (e.g. Acemd, OpenMM, NAMD, Gromacs).
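      To make the contrast with a tabù-like scheme concrete, here is a schematic Python sketch of score-based walker selection. It is not the mwSuMD implementation: `run_walker` and `score` are hypothetical placeholders for launching one short MD replica and for the mwSuMD score defined in the Methods, which is not reproduced here.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")

def run_batches(
    initial: State,
    n_batches: int,
    n_walkers: int,
    run_walker: Callable[[State, int], State],  # hypothetical: runs one short replica
    score: Callable[[State], float],            # hypothetical: supervision score
) -> List[State]:
    """Seed parallel walkers from the current state and continue from the best one.

    No batch is rejected as in a tabu-like scheme: the highest-scoring walker
    always becomes the starting point of the next batch.
    """
    state = initial
    history = [state]
    for _ in range(n_batches):
        walkers = [run_walker(state, k) for k in range(n_walkers)]
        state = max(walkers, key=score)
        history.append(state)
    return history
```

      In a toy run where the "state" is a distance to a target and the score rewards smaller distances, each batch simply continues from the walker that got closest.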

      We agree with the Reviewer that experimental validation of the findings would be advisable, as for any computational prediction. We are confident that future studies from our group employing mwSuMD will inform mutagenesis and BRET-based experiments.

      Reviewer #2 (Recommendations for the authors):

      As for GLP1R, I remain convinced that 7LCI would have been better than 7LCJ as a reference for all simulations, also because 7LCI holds a slightly more complete ECD.

      We agree that 7LCI would have been a better starting point than 7LCJ for the simulations because it includes the stalk region, unlike 7LCJ. However, we do not think this would have influenced the output, because the stalk is the most flexible segment of GLP1R and its initial conformation is usually not retained during MD simulations.

      Please correct everywhere the definition of the 6LN2 structure of GLP1R as ligand-free or apo, because that structure is in fact bound to a negative allosteric modulator docked on the cytosolic end of helix 6.

      We thank the reviewer for this correction. The text has been modified accordingly.

      As for the beta2-AR, the "full-length" AlphaFold model downloaded from the GPCRdb is not an intermediate active state because it is very similar to the receptor in the 3SN6 complex with Gs. Please, eliminate the inappropriate and speculative adjective "intermediate".

      We have changed “intermediate” to “not fully active”, which is less speculative since full activation can be achieved only in the presence of the G protein.

      Incidentally, in that model, the C-tail, eliminated by the authors, is completely wrong and occupies the G protein binding site. It is not clear to me why the authors preferred to use an AlphaFold model as input for the simulations rather than a high-resolution structure, e.g. 4LDO. Perhaps the reason is that all ICL regions, including ICL3, were modeled by AlphaFold, even if with low confidence. I disagree with that choice.

      We understand the reviewer’s point of view. Had we simulated an “equilibrium” receptor-ligand complex, we would have made the same choice. However, the conformational changes occurring during G protein binding are so substantial that the starting conformation of the receptor becomes almost irrelevant, as long as a sensible structure is used.

      Reviewer #3 (Recommendations for the authors):

      The revised version of the manuscript is more concise, focusing only on two systems. However, the authors have responded superficially to the reviewers' comments, merely deleting sections of text, making minor corrections, or adding small additions to the text. In particular, the authors have not addressed the main critical points raised by both Reviewer 2 and Reviewer 3. 

      For example, the RMSD values for the binding of PF06882961 to GLP-1R remain high, raising doubts about the predictive capabilities of the method, at least for this type of system.

      What is the RMSD of the ligand relative to the experimental pose obtained in the simulations? This value must be included in the text.

      We have added this information about the PF06882961 RMSD to the text, which on page 6 now reads “We simulated the binding of PF06882961, reaching an RMSD to its bound conformation in 7LCJ of 3.79 ± 0.83 Å (computed on the second half of the merged trajectory, superimposing on GLP-1R Cα atoms of TMD residues 150 to 390), using multistep supervision on different system metrics (Figure 2) to model the structural hallmark of GLP-1R activation (Video S5, Video S6).”
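      The RMSD protocol quoted here (superimpose on receptor Cα atoms, then measure ligand RMSD without refitting, over the second half of the trajectory) can be sketched in NumPy with a Kabsch fit. Coordinate arrays are assumed to be pre-extracted in matching atom order (e.g. Cα atoms of TMD residues 150 to 390, and ligand atoms); the function names are hypothetical, and reading "±" as the standard deviation across frames is an assumption.

```python
import numpy as np

def kabsch(mobile, reference):
    """Optimal rotation (Kabsch) mapping centred `mobile` onto centred `reference`."""
    mc, rc = mobile.mean(axis=0), reference.mean(axis=0)
    h = (mobile - mc).T @ (reference - rc)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, mc, rc

def ligand_rmsd(frame_ca, frame_lig, ref_ca, ref_lig):
    """Fit on receptor C-alpha atoms, then ligand RMSD with no further fitting."""
    rot, mc, rc = kabsch(frame_ca, ref_ca)
    moved = (frame_lig - mc) @ rot.T + rc  # apply the receptor-based transform
    return float(np.sqrt(np.mean(np.sum((moved - ref_lig) ** 2, axis=1))))

def rmsd_second_half(ca_traj, lig_traj, ref_ca, ref_lig):
    """Mean and SD of ligand RMSD over the second half of a trajectory."""
    start = len(ca_traj) // 2
    vals = [ligand_rmsd(c, l, ref_ca, ref_lig)
            for c, l in zip(ca_traj[start:], lig_traj[start:])]
    return float(np.mean(vals)), float(np.std(vals))
```

      Fitting on the receptor rather than on the ligand is what makes the value a measure of binding-pose accuracy: a ligand displaced within the pocket keeps its displacement after superposition.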

      Similarly, the activation mechanism of GLP-1R is only partially simulated.

      Furthermore, it is not particularly meaningful to justify the high RMSD values of the SuMD simulations for the binding of Gs to GLP-1R by comparing them with those reported under unbiased MD conditions. "Replica 2, in particular, well reproduced the cryo-EM GLP-1R complex as suggested by RMSDs to 7LCI of 7.59 ± 1.58 Å, 12.15 ± 2.13 Å, and 13.73 ± 2.24 Å for Gα, Gβ, and Gγ respectively. Such values are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1 (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R."

      Without delving into the accuracy of the various calculations, the authors should acknowledge that comparing protein structures with such high RMSD values has no meaningful significance in terms of convergence toward the same three-dimensional structure.

      The text has been edited to accommodate the reviewer’s suggestion while still giving readers a measure of the high flexibility of Gs bound to GLP-1R. It now reads “Such values do not support convergence with the static experimental structure but are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1 (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R.”

      Have the authors simulated the binding of the Gs protein using the experimentally active structure of GLP-1R in complex with the ligand PF06882961 (PDB ID 7LCJ)? Such a simulation would be useful to assess the quality of the binding simulation of Gs to the GLP1R/PF06882961 complex obtained from the previous SuMD.

      We considered performing the Gs binding simulation to the active structure of GLP-1R.

      However, the fully active state of GLP-1R (and other class B receptors), as reported in 7LCJ, depends on the presence of Gs and can be reached only upon effector coupling. Since it is unlikely that the unbound receptor is already in the fully active state, we reasoned that using it as a starting point for Gs binding simulations would have introduced an artifact.

      An example of the insufficient depth of the authors' replies can be seen in their response: "We note that among the suggested references, only Mafi et al report about a simulated G protein (in a pre-formed complex) and none of the work sampled TM6 rotation without input of energy."

      This statement is inaccurate. For instance, D'Amore et al. (Chem 2024, doi: 10.1016/j.chempr.2024.08.004) simulated Gs coupling to A2A as well as TM6 rotation, as did Maria-Solano and Choi (eLife 2023, doi: 10.7554/eLife.90773.1). The former employed path collective variables metadynamics, which is not cited in the introduction or the discussion, despite its relevance to the methodologies mentioned.

      Respectfully, our previous reply is correct, as all of the mentioned articles used enhanced (energy-biased) approaches, so the claim “none of the work sampled TM6 rotation without input of energy” stands. The reference to D’Amore et al. (published after the previous round of reviews of this manuscript) has been added to the introduction; we thank the reviewer for pointing it out. 

      Additionally, SuMD employs a tabu algorithm that applies geometric supervision to the simulation, an alternative way of enhancing sampling to what the authors call "input of energy" techniques. A fair discussion should clearly acknowledge this aspect of the SuMD methodology.

      We have now specified in the Methods that a tabù-like algorithm is part of SuMD, which, despite being the parent technique of mwSuMD, is not the focus of the present work, and we provide extended references for readers interested in SuMD. mwSuMD, on the other hand, does not use a tabù-like algorithm; instead, it uses a score to select the best walker in each batch, from which the simulation continues, as described in the Methods.

    1. eLife Assessment

      In Plasmodium male gametocytes, rapid nuclear division occurs with an intact nuclear envelope, requiring precise coordination between nuclear and cytoplasmic events to ensure proper packaging of each nucleus into a developing gamete. This valuable study characterizes two proteins involved in the formation of Plasmodium berghei male gametes. By integrating live-cell imaging, ultrastructural expansion microscopy, and proteomics, this study convincingly identifies SUN1 and its interaction partner ALLAN as crucial nuclear envelope components in male gametogenesis. A role for SUN1 in membrane dynamics and lipid metabolism is less well supported. The results are of interest for general cell biologists working on unusual mitosis pathways.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Activated male Plasmodium gametocytes undergo very rapid nuclear division, while keeping the nuclear envelope intact. There is interest in how events inside the nucleus are co-ordinated with events in the parasite cytoplasm, to ensure that each nucleus is packaged into a nascent male gamete.

      This manuscript by Zeeshan et al describes the organisation of a nuclear membrane bridging protein, SUN1, during nuclear division. SUN1 is expected from studies in other organisms to be a component of a bridging complex (LINC) that connects the inner nuclear membrane to the outer nuclear membrane, and from there to the cytoplasmic microtubule-organising centres, the centrosome and the basal body.

      The authors show that knockout of the SUN1 in gametocytes leads to severe disruption of the mitotic spindle and failure of the basal bodies to segregate. The authors show convincingly that functional SUN1 is required for male gamete formation and subsequent oocyst development.

      The authors identified several SUN1-interacting proteins, thus providing information about the nuclear membrane bridging machinery.

      Strengths:

      The authors have used state-of-the-art imaging, genetic manipulation and immunoprecipitation approaches.

      Weaknesses:

      Technical limitations of some of the methods used make it difficult to interpret some of the micrographs.

      From studies in other organisms, a protein called KASH is a critical component of the bridging complex (LINC). That is, KASH links SUN1 to the outer nuclear membrane. The authors undertook a gene sequence analysis that reveals that Plasmodium lacks a KASH homologue. Thus, further work is needed to identify the functional equivalent of KASH, to understand the bridging machinery in Plasmodium.

      Comments on revised version:

      The authors have addressed the comments and suggestions that I provided as part of a Review Commons assessment.

    3. Reviewer #2 (Public review):

      Zeeshan et al. investigate the function of the protein SUN1, a proposed nuclear envelope protein linking nuclear and cytoplasmic cytoskeleton, during the rapid male gametogenesis of the rodent malaria parasite Plasmodium berghei. They reveal that SUN1 localises to the nuclear envelope (NE) in male and female gametes and show that the male NE has unexpectedly high dynamics during the rapid process of gametogenesis. Using expansion microscopy, the authors find that SUN1 is enriched at the neck of the bipartite MTOC that links the intranuclear spindle to the basal bodies of the cytoplasmic axonemes. Upon deletion of SUN1, the basal bodies of the eight axonemes fail to segregate, no spindle is formed, and emerging gametes are anucleated, leading to a complete block in transmission. By interactomics the authors identify a divergent allantoicase-like protein, ALLAN, as a main interaction partner of SUN1 and further show that ALLAN deletion largely phenocopies the effect of SUN1.

      Overall, the authors use an extensive array of fluorescence and electron microscopy techniques as well as interactomics to convincingly demonstrate that SUN1 and ALLAN play a role in maintaining the structural integrity of the bipartite MTOC during the rapid rounds of endomitosis in male gametogenesis.

      Two suggestions for improvement of the work remain:

      (1) Lipidomic analysis of WT and SUN1-knockout gametocytes before and after activation revealed only minor changes in some lipid species. Without statistical analysis, it remains unclear whether these changes are statistically significant rather than the result of expected biological variability. While the authors clearly toned down their conclusions in the revised manuscript, some phrasings in the results and the discussion still suggest that gametocyte activation and/or SUN1 knockout affects lipid composition. Similarly, some phrases suggest that SUN1 is responsible for the observed loops and folds in the NE and that SUN1 KO affects NE dynamics. Currently, I do not think that the data support these statements.

      (2) It is interesting to note that ALLAN has a much more specific localisation to basal bodies than SUN1, which is located to the entire nuclear envelope. Knock out of ALLAN also exhibits a milder (but still striking) phenotype than knockout of SUN1. These observations suggest that SUN1 has additional roles in male gametogenesis besides its interaction with ALLAN, which could be discussed a bit more.

      This study uses extensive microscopy and genetics to characterise an unusual SUN1-ALLAN complex, thus providing new insights into the molecular events during Plasmodium male gametogenesis, especially how the intranuclear events (spindle formation and mitosis) are linked to the cytoplasmic separation of the axonemes. The characterisation of the mutants reveals an interesting phenotype, showing that SUN1 and ALLAN are localised to and maintain the neck region of the bipartite MTOC. The authors here confirm and expand the previous knowledge about SUN1 in P. berghei, adding more detail to its localisation and dynamics, and further characterise the interaction partner ALLAN. Given the evolutionary divergence of Plasmodium, these results are interesting not only for parasitologists, but also for more general cell biologists.

    4. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Minor comments:

      In the results section (lines 498-499), the authors describe free kinetochores in many cells without associated spindle microtubules. However, some nuclei appear to have kinetochores, as presented in Figure 6. Could the authors clarify how this conclusion was derived using transmission electron microscopy (TEM) without serial sectioning, as this is not explicitly mentioned in the materials and methods?

      We observed free kinetochores in the ALLAN-KO parasites with no associated spindle microtubules (see Fig. 6Gh), while kinetochores are attached to spindle microtubules in WT-GFP cells (see Fig. 6Gc). To provide further evidence we analysed additional images and found that ALLAN-KO cells have free kinetochores in the centre of nucleus, unattached to spindle microtubules. We provide some more images clearly showing free kinetochores in these cells (new supplementary Fig. S11).

      However, in the ALLAN mutant, this difference is not absolute: in a search of over 50 cells, one example of a cell with a “normal” nuclear spindle and attached kinetochores was observed.

      The use of serial sectioning has limitations for examining small structures like kinetochores in whole cells. The limitations of the various techniques (for example, SBF-SEM vs tomography) are highlighted in our previous study (Hair et al 2022; PMID: 38092766), and we consider that examining a population of randomly sectioned cells provides a better understanding of the overall incidence of specific features.

      Discussion Section:

      Could the authors expand on why SUN1 and ALLAN are not required during asexual replication, even though they play essential roles during male gametogenesis?

      We observed no phenotype in asexual blood stage parasites associated with the sun1 and allan gene deletions. Several other Plasmodium berghei gene knockout parasites with a phenotype in sexual stages, for example CDPK4 (PMID: 15137943), SRPK (PMID: 20951971), PPKL (PMID: 23028336) and kinesin-5 (PMID: 33154955) have no phenotype in blood stages, so perhaps this is not surprising. One explanation may be the substantial differences in the mode of cell division between these two stages. Asexual blood stages produce new progeny (merozoites) over 24 hours with closed mitosis and asynchronous karyokinesis during schizogony, while male gametogenesis is a rapid process, completed within 15 min to produce eight flagellated gametes. During male gametogenesis the nuclear envelope must expand to accommodate the increased DNA content (from 1N to 8N) before cytokinesis. Furthermore, male gametogenesis is the only stage of the life cycle to make flagella, and axonemes must be assembled in the cytoplasm to produce the flagellated motile male gametes at the end of the process. Thus, these two stages of parasite development have some very different and specific features.

      Lines 611-613 states: "These loops serve as structural hubs for spindle assembly and kinetochore attachment at the nuclear MTOC, separating nuclear and cytoplasmic compartments." Could the authors elaborate on the evidence supporting this statement?

      We observed the loops/folds in the nuclear envelope (NE) as revealed by SUN1-GFP and 3D TEM images during male gametogenesis. These folds/loops occur mainly in the vicinity of the nuclear MTOC where the spindles are assembled (as visualised by EB1 fluorescence) and attached to kinetochores (as visualised by NDC80 fluorescence). These loops/folds may form due to the contraction of the spindle pole back to the nuclear periphery, inducing distortion of the NE. Since there is no physical segregation of chromosomes during the three rounds of mitosis (DNA increasing from 1N to 8N), we suggest that these folds provide additional space for spindle and kinetochore dynamics within an intact NE to maintain separation from the cytoplasm (as shown by location of kinesin-8B).

      In lines 621-622, the authors suggest that ALLAN may have a broader role in NE remodelling across the parasite's lifecycle. Could they reflect on or remind readers of the finding that ALLAN is not essential during the asexual stage?

      ALLAN-GFP is expressed throughout the parasite life cycle but, as the reviewer points out, its functional role is most evident during male gametogenesis. This does not mean that it has no role at other stages of the life cycle, even if there is no obvious phenotype following deletion of the gene during the asexual blood stage. The fact that ALLAN is not essential during the asexual blood stage is noted in lines 628-629.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Introduction

      Line 63: The authors state: "NE is integral to mitosis, supporting spindle formation, kinetochore attachment, and chromosome segregation..". Seemingly at odds with this, they also say (Line 69) that 'open' mitosis is "characterized by complete NE disassembly".

      The authors could better explain the ideas presented in the review they cite by Dey and Baum, which points out that truly 'open' and 'closed' topologies may not exist and that, even in 'open' mitosis, remnants of the NE may help support the mitotic spindle.

      We have modified the sentence in which we discuss current opinions about ‘open’ and ‘closed’ mitosis. It is now thought that the NE is not completely disassembled during open mitosis, nor does it remain completely intact during closed mitosis. In fact, the NE plays a critical role in MTOC organisation and spindle dynamics across the different modes of mitosis. Please see the modified lines 64-71.

      Results

      Fig 7 is the final figure but would be more useful up front.

      We have provided a new introductory figure (Fig 1) showing a schematic of conventional /canonical LINC complexes and evidence of SUN protein functions in model eukaryotes and compare them to what is known in apicomplexans.

      Fig 1D. The authors generated C-terminally GFP-tagged SUN1 transfectants and used ultrastructure expansion microscopy (U-ExM) and structured illumination microscopy (SIM) to examine SUN1-GFP in male gametocytes post-activation. The immuno-labelling of SUN1-GFP in these fixed cells appears very different from the live cell images of SUN1-GFP. The labelling profile comprises distinct punctate structures (particularly in the U-ExM images), suggesting that the paraformaldehyde fixation process, followed by the addition of the primary and secondary antibodies, has caused the SUN1-GFP signal to coalesce into particular regions within the NE.

      We agree with the reviewer. Fixation with paraformaldehyde (PFA) results in a coalescence of the SUN1-GFP signal. We have also tried methanol fixation (see new Fig. S2), but a similar problem was encountered.

      Given these fixation issues, the suggestion that the SUN1-GFP signal is concentrated at the BB/ nuclear MTOC and "enriched near spindle poles" needs further support.

      These statements seem at odds with the live cell imaging data, in which SUN1-GFP seems evenly distributed around the nuclear periphery. Can the observation be quantitated by calculating the percentage of BB/ nuclear MTOC structures with associated SUN1-GFP puncta? If not, I am not convinced these data help understand the molecular events.

      We agree with the reviewer that whilst live cell imaging showed an even distribution of SUN1-GFP signal, after fixation with either PFA or methanol SUN1-GFP puncta are observed in addition to the peripheral signal around the stained DNA (Hoechst) (see Fig. S2; puncta are indicated by arrows). These SUN1-GFP labelled puncta were observed at the junction of the nuclear MTOC and the basal body (Fig. 2F). Quantification of the distribution showed that these SUN1-GFP puncta are associated with the nuclear MTOC in more than 90% of cells (18 cells examined). Live cell imaging of the dual labelled parasites SUN1xkinesin-8B (Fig. 2H) and SUN1xEB1 (Fig. 2I) provides further support for the association of SUN1-GFP puncta with the BB (kinesin-8B)/nuclear MTOC (EB1).

      The authors then generated dual transfectants and examined the relative locations of different markers in live cells. These data are more informative.

      The authors state; " ..SUN1-GFP marked the NE with strong signals located near the nuclear MTOCs situated between the BB tetrads". The nuclear MTOCs are not labelled in this experiment. The SUN1-GFP signal between the kinesin-8B puncta is evident as small puncta on regions of NE distortion. I would prefer to not describe this signal as "strong". The signal is stronger in other regions of the NE.

      We have modified the sentence on line 213 to accommodate this suggestion.

      Line 219. The authors state; "..SUN1-GFP is partially colocalized with spindle poles as indicated by EB1,.. it shows no overlap with kinetochores (NDC80)." The authors should provide an analysis of the level of overlap at a pixel by pixel level to support this statement.

      We now provide the overlap at a pixel-by-pixel level for representative images, and we have quantified more cells (n>30), as documented in the new Fig. S4A. We have also modified the sentence on line 219 to reflect these additions.
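      The response does not state which overlap metric was used. As an illustration only, a common choice for pixel-by-pixel colocalization is Pearson's correlation coefficient, often reported alongside Manders' coefficients. The minimal sketch below assumes two background-subtracted, registered channels of equal size supplied as nested lists; the function names and toy intensities are hypothetical and do not represent the authors' analysis pipeline.

```python
from math import sqrt


def flatten(channel):
    """Flatten a 2D image (list of rows) into one list of pixel intensities."""
    return [value for row in channel for value in row]


def pearson_colocalization(channel_a, channel_b):
    """Pearson's correlation coefficient between two registered channels.

    Returns a value in [-1, 1]: 1 means the intensities rise and fall
    together pixel by pixel, 0 means no linear relationship.
    """
    a, b = flatten(channel_a), flatten(channel_b)
    if len(a) != len(b) or not a:
        raise ValueError("channels must be non-empty and the same size")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_a * var_b)


def manders_m1(channel_a, channel_b, threshold_b):
    """Fraction of channel A intensity lying in pixels where channel B > threshold."""
    a, b = flatten(channel_a), flatten(channel_b)
    total_a = sum(a)
    if total_a == 0:
        raise ValueError("channel A has no signal")
    return sum(x for x, y in zip(a, b) if y > threshold_b) / total_a


# Hypothetical toy intensities, for illustration only.
sun1 = [[0, 8, 2], [1, 9, 0]]
eb1 = [[0, 6, 7], [0, 5, 0]]
print(pearson_colocalization(sun1, eb1))
print(manders_m1(sun1, eb1, threshold_b=0))  # prints 0.95
```

      Pearson's coefficient asks whether the two intensities co-vary across all pixels, whereas Manders' M1 asks what fraction of one signal (here, the hypothetical SUN1 channel) falls where the other is present; because M1 depends on the chosen threshold, that threshold would need to be reported with the result.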

      The SUN1 construct is C-terminally GFP-tagged. By analogy with human SUN1, the C-terminal SUN domain is expected to be in the NE lumen. That is in a different compartment to EB1, which is located in the nuclear lumen (on the spindle). Thus, the overlap of signal is expected to be minimal.

      We agree with the reviewer that the overlap between EB1 and Sun1 signals is expected to be minimal. We have quantified the data and included it in Supplementary Fig. S4A.

      Similarly, given that EB1 and NDC80 are known to occupy overlapping locations on the spindle, it seems unlikely that SUN1 can overlap with one and not the other.

      We agree with the reviewer’s analysis that EB1 and NDC80 occupy overlapping locations on the spindle, although the NDC80 signal does not extend as far towards the ends of the spindle (see Author response image 1A), as shown in our previous study comparing the locations of two spindle proteins, ARK2 and EB1, with that of NDC80 (Zeeshan et al, 2022; PMID: 37704606). In the present study we observed that SUN1-GFP partially overlaps with EB1 at the ends of the spindle, but not with NDC80. Please see Author response image 1B.

      Author response image 1.

      I note on Line 609, the authors state "Our study demonstrates that SUN1 is primarily localized to the nuclear side of the NE.." As per Fig 7D, and as discussed above, the bulk of the protein, including the SUN1 domain, is located in the space between the INM and the ONM.

      We appreciate the reviewer’s correction; we have now modified the sentence on line 617 to indicate that the protein is largely localized in the space between the INM and the ONM.

      Interestingly, as the authors point out, nuclear membrane loops are evident around EB1 and NDC80 focal regions. The data suggests that the contraction of the spindle pole back to the nuclear periphery induces distortion of the NE.

      We agree with the reviewer’s suggestion that the data indicate that contraction of spindle poles back to the nuclear periphery may induce distortion of the NE.

      The authors should discuss further the overlap of findings of this study with those of a recent manuscript (https://doi.org/10.1016/j.cels.2024.10.008). That Sayers et al. study identified a complex of SUN1 and ALLC1 as essential for male fertility in P. berghei. Sayers et al. also provide evidence that this complex participates in the linkage of the MTOC to the NE and is needed for correct mitotic spindle formation during male gametogenesis.

      We thank the reviewer for this suggestion. The study by Sayers et al. (2024) was published while our manuscript was in preparation. It was interesting to see that these complementary studies have similar findings about the role of SUN1 and the novel SUN1-ALLAN complex. Our study provides a more detailed analysis of SUN1 by both expansion microscopy and TEM, and includes additional studies on the role of ALLAN. We discuss the overlap in the findings of the two studies in lines 590-605.

      While the work is interesting, the conclusions may need to be tempered. The authors suggest that, in the absence of KASH-domain proteins, the SUN1-ALLAN complex forms a non-canonical LINC complex (that is, a connection across the NE) that "achieves precise nuclear and cytoskeletal coordination".

      We have toned down the wording of this conclusion in lines 665-677.

      In other organisms, KASH interacts with the C-terminal domain on SUN1, which as mentioned above is located between the INM and ONM. By contrast, ALLAN interacts with the N-terminal domain of SUN1, which is located in the nuclear lumen. The SUN1-ALLAN interaction is clearly of interest, and ALLAN might replace some of the roles of lamins. However, the protein that functionally replaces KASH (i.e. links SUN1 to the ONM) remains unidentified.

      We agree with the reviewer, and future studies will need to focus on identifying the KASH replacement that links SUN1 to the ONM.

      It may also be premature to suggest that the SUN1-ALLAN complex is a promising target for blocking malaria transmission. How would it be targeted?

      We have deleted the sentence that raised this suggestion.

      While the above datasets are interesting and internally consistent, there are two other aspects of the manuscript that need further development before they can usefully contribute to the molecular story.

      The authors undertook a transcriptomic analysis of Δsun1 and WT gametocytes, at 8 and 30 min post-activation, revealing moderate changes (~2-fold change) in different genes. GO-based analysis suggested up-regulation of genes involved in lipid metabolism. Given the modest changes, it may not be correct to conclude that "lipid metabolism and microtubule function may be critical functions for gametogenesis that can be perturbed by sun1 deletion." These changes may simply be a consequence of the stalled male gametocyte development.

      Following the reviewer’s suggestion we have moved these data to the supplementary information (Fig. S5D-I) and toned down their discussion in the results and discussion sections.

      The authors have then undertaken a detailed lipid analysis of the Δsun1 and WT gametocytes, before and after activation. Substantial changes in lipid metabolites might not be expected in such a short period of time. And indeed, the changes appear minimal. Similarly, there are only minor changes in a few lipid sub-classes between Δsun1 and WT gametocytes. In my opinion, the data are not sufficient to support the authors conclusion that "SUN1 plays a crucial role, linking lipid metabolism to NE remodelling and gamete formation."

      In agreement with the reviewer’s comments, we have moved these data to supplementary information (Fig. S6) and substantially toned down the conclusions based on these findings.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Major comments:

      My main concern with this manuscript is that the authors conclude not only that SUN1 is important for spindle formation and basal body segregation, but also that it influences lipid metabolism and NE dynamics. I don't think the data support this conclusion, for several reasons listed below. I would suggest removing this claim from the manuscript, or at least toning it down unless more supporting data are provided, in particular showing any change in NE dynamics in the SUN1-KO. Instead, I would recommend focusing on the more interesting role of SUN1-ALLAN in bipartite MTOC organisation, which likely explains all observed phenotypes (including those in later stages of the parasite life cycle). In addition, some aspects of the knockout phenotype should be quantified in more depth.

      In more detail:

      - The lipidomics analysis is clearly the weakest point of the manuscript: The authors state that there are significant changes in some lipid populations between WT and sun1-KO, and between activated and non-activated cells, yet no statistical analysis is shown and the error bars are quite high compared to only minor changes in the means. For some discussed lipids, the result text does not match the graphs, e.g. PA, where the increase upon activation is more pronounced in the SUN1-KO vs WT (contrary to the text), or MAG, which is reduced in the SUN1-KO vs WT (contrary to the text). I don't see the discussed changes in arachidonic acid levels and myristic acid levels in the data either. Even if the authors find after analysis some statistically significant differences between some groups, they should carefully discuss the biological significance of these differences. As it is, I do not think the presented data warrants the conclusion that deletion of SUN1 changes lipid homeostasis, but rather shows that overall lipid homeostasis is not majorly affected by gametogenesis or SUN1 deletion. As a minor comment, if you decide to keep the lipidomics analysis in the manuscript, please state how many replicates were done.

      As detailed above we have moved the lipidomics data to supplementary information (Fig. S6) and substantially toned down the discussion of these data in the results and discussion sections.

      - I can't quite follow the logic of why the authors performed transcriptomic analysis of the SUN1-KO and how they chose their time points. Their data up to this point indicate that SUN1 has a structural or coordinating role in the bipartite MTOC during male gametogenesis. Based on that, it is rather unlikely that SUN1 KO directly leads to transcriptional changes within the 8 min of exflagellation. Isn't it more likely that transcriptional differences are purely a downstream effect of incomplete/failed gametogenesis? This is particularly true for the comparison at 30 min, which compares a mixture of exflagellated/emerged gametes and zygotes in WT to a mixture of aberrant, arrested gametes in the knockout, which will likely not give any meaningful insight. The by far most significant GO term is then also nuclear-transcribed mRNA catabolic process, which is likely not related at all to SUN1 function (and the authors do not even comment on this in the main text). I would therefore suggest removing the 30 min data set from this manuscript. As a minor point, I would suggest highlighting some of the top deregulated gene IDs in the volcano plots and stating their function. Also, please state how you prepared the cells for the transcriptomes and in how many replicates this was done.

      As suggested by the reviewer we have removed the 30 min post activation data from the manuscript. We have also moved the rest of the transcriptomics data to supplementary information (Fig. S5) and toned down the presentation of this aspect of the work in the results and discussion sections.

      - Live-cell imaging of SUN1-GFP does nicely visualise the NE during gametogenesis, showing a highly dynamic NE forming loops and folds, which is very exciting to see. It would be beneficial to also show a video from the live-cell imaging.

      We have now added videos to the manuscript as suggested by the reviewer. Please see the supplementary Videos S1 and S2.

      In their discussion, the authors state multiple times that NE dynamics are changed upon SUN1 KO. Yet they do not provide data supporting this claim, i.e. that the extended loops and folds found in the nuclear envelope during gametogenesis are affected in any way by the knockout of SUN1 or ALLAN. What happens to the NE in the absence of SUN1? Are there fewer loops and folds? In the absence of a reliable NE marker this may not be easy to address, but at least some SBF-SEM images of the sun1-KO gametocytes could provide insight.

      It was difficult to provide SBF-SEM images as that work is beyond the scope of this manuscript. We will consider this approach in our future work. We re-examined many of our TEM images of SUN1-KO and ALLAN-KO parasites and did find some micrographs showing aberrant nuclear membrane folding (<5%) (Please see Author response image 2). However, we also observed similar structures in some of the WT-GFP samples (<5%), so we do not think this is a strong phenotype of the SUN1 or ALLAN mutants.

      Author response image 2.

       

      - I think the exciting part of the manuscript is the cell biological role of SUN1 in male gametogenesis, which could be brought out further by more detailed phenotyping. Specifically, it would be good to quantify:

      (1) Whether DNA replication to an octoploid state still occurs in SUN1-KO and ALLAN-KO,

      DNA replication is not affected in the SUN1-KO and ALLAN-KO mutants: DNA content increases to 8N (data added in Fig. 3J and Fig. S10F).

      (2) The proportion of anucleated gametes in WT and the KO lines

      We have added these data in Fig. 3K and Fig. S10G

      (3) A quantification of the BB clustering phenotype (in which proportion of cells do the authors see this phenotype). This could be addressed by simple fixed immunofluorescence images of the respective WT/KO lines at various time points after activation (or possibly by reanalysis of the already obtained images) and would really improve the manuscript.

      We have reanalysed the BB clustering phenotype and added the quantitative data in Fig. 4E and Fig. S7.

      In particular, the claim that emerged SUN1-KO gametes lack a nucleus is currently based only on single slices of a few TEM cells and would benefit from a more thorough quantification in both SUN1- and ALLAN-KOs.

      We have examined many microgametes (100+ sections). In WT parasites a small proportion of gametes can appear to lack a nucleus if it does not extend all the way to the apical and basal ends (Hair et al. 2022). However, the proportion of microgametes that appear to lack a nucleus (no nucleus seen in any section) was much higher in the SUN1 mutant. In contrast, this difference was not as clear cut in the ALLAN mutant with a small proportion of intact (with axoneme and nucleus) microgametes being observed.

      We have done additional analysis of male gametes, looking for the presence of the nucleus by live cell imaging after DNA staining with Hoechst. These data are added in Fig. 3K (for Sun1-KO) and Fig. S10G (for Allan-KO).

      - The TEM suggests that in the SUN1-KO, kinetochores are free in the nucleus. Are all kinetochores free or do some still associate to a (minor/incorrectly formed) spindle? The authors could address this by tagging NDC80 in the KO lines.

      Our quantification showed that in WT parasites 100% of kinetochores were attached to spindle microtubules and none were unattached. The exact opposite was found in the SUN1 mutant, with 100% of kinetochores unattached and none attached. The result was not quite as clear cut in the ALLAN mutant, with 98% unattached and 2% attached. An important observation was the lack of separation of the nuclear poles and of any spindle formation: spindles were never, or only very rarely, observed in the mutants.

      - Finally, I think it is curious that, in contrast to SUN1, ALLAN seems to be less important, with some KO parasites completing the life cycle. Maybe more detailed phenotyping as above would give more hints as to where the phenotypic difference between the two proteins lies. I would assume some ALLAN-KO cells can still segregate the basal body. Can the authors speculate/discuss in more detail why these two proteins seem to have slightly different phenotypes?

      We agree with the reviewer. Overall, the ALLAN-KO has a less prominent phenotype than the SUN1-KO. The main difference is that in the ALLAN-KO mutant some basal body segregation can occur, leading to the production of some fertile microgametocytes and to ookinete and oocyst formation (Fig. 8). Approximately 5% of oocysts sporulated to release infective sporozoites that could infect mice in bite back experiments and complete the life cycle. In contrast, the SUN1-KO mutant made no healthy oocysts or infective sporozoites, and could not complete the life cycle in bite back experiments. We have analysed the phenotype in detail and provide quantitative data for gametocyte stages by EM and ExM in Figs. 4 and S8 (SUN1) and Figs. 7 and S11 (ALLAN). We have also performed a detailed analysis of oocyst and sporozoite stages and included the data in Fig. 3 (SUN1) and Fig. S10 (ALLAN).

      Based on the location, and functional and interactome data, we think that SUN1 plays a central role in coordinating nucleoplasm and cytoplasmic events as a key component of the nuclear membrane lumen, whereas ALLAN is located in the nucleoplasm. Deleting the SUN1 gene may disrupt the connection between INM and ONM whereas the deletion of ALLAN may affect only the INM.

      Some additional points where the data is not entirely sound yet or could be improved:

      - Localisation of SUN1: There seems to be a discrepancy between SUN1-GFP location as observed by live cell microscopy and by expansion microscopy (ExM), and similarly for ALLAN-GFP. By live-cell microscopy, SUN1 is much more evenly distributed around the NE, while the localisation in ExM is much more punctate and, e.g. in Figure 1E, seems to be within the nucleus. Do the authors have an explanation for this? Also, in Fig. 1D there are two GFP foci at the cell periphery (bottom left of the image), which I would think are not SUN1 foci, as they seem to be outside of the cell. Is the antibody specific? Was a negative control done for the antibody (WT cells stained with GFP antibodies after ExM)?

      High resolution SIM and expansion microscopy showed that the SUN1-GFP molecules coalesce to form puncta, in contrast to the more uniform distribution observed by live cell imaging. This apparent difference may reflect the higher resolution of these techniques, which cannot be achieved by live cell imaging. We agree with the reviewer that the two green foci are outside of the cell. As a negative control we used WT-ANKA cells (which contain no GFP) with the anti-GFP antibody, which gave no signal. This confirms the specificity of the antibody (please see the new Fig. S3).

      - The authors argue that SIM gave unexpected results due to PFA fixation leading to collapse of the NE loops. However, they also fix their ExM cells and their EM cells with PFA and do not observe a collapse, at least from what I see in the two presented images and in the 3D reconstruction. Is there something else different in the sample preparation?

      There was no difference in the fixation process for samples examined by SIM and ExM, but we used an anti-GFP antibody in ExM to visualise SUN1-GFP, while in SIM the GFP signal was imaged directly after fixation. We used both PFA and methanol as fixatives, and both methods showed a coalescing of the SUN1-GFP signal (please see the new Figs. S2 and S3).

      Can the authors trace their NE in ExM according to the NHS-Ester signal?

      We could trace the NE in the ExM by the NHS-ester signal and observed that the SUN1-GFP signal was largely coincident with the NE (Please see the new Fig. S3B).

      - Fig 2D: It would be good to not just show images of oocysts but actually quantify their size from images. Also, have the authors determined the sporozoite numbers in SUN1-KO?

      We have measured oocyst size (data added in new Fig. 3) and added the sporozoite quantification data in Fig. 3D.

      - Line 481-483: the authors state that oocyst size is reduced in ALLAN-KO but do not show the data. Please quantify oocyst size or at least show representative images. Also the drastic decrease in sporozoite numbers (Fig. 6D, E) is not mentioned in the text. Please add reference to Fig S7D when talking about the bite back data.

      We have added the oocyst size data in Fig. S10. We now mention the changes in sporozoite numbers (shown in Fig. 7D, E) and refer to the bite back data shown in current Fig. 7E.

      - Fig S1C, 6C: Both WB images are stitched, but this is not clearly indicated e.g. by leaving a small gap between the lanes. Also please show a loading control along with the western blots. Also there seems to be a (unspecific?) band in the control, running at the same height as Allan-GFP WB. What exactly is the control?

      We have provided the original blot showing the bands of ALLAN-GFP and SUN1-GFP. As a positive control, we used an RNA associated protein (RAP-GFP) that is highly expressed in Plasmodium and regularly used in our lab for this purpose.

      - Regarding the crossing experiment: The authors conclude from this cross that SUN1 is only needed in males, yet for this conclusion they would need to also show that a cross with a female line does not rescue the phenotype. The authors should repeat the cross with a male-deficient line to really test if the phenotype is an exclusively male phenotype. In addition, line 270-272 states that no oocysts/sporozoites were detected in sun1-ko and nek4-ko parasites. However, the figure 2E shows only oocysts, not sporozoites, and shows also that sun1-ko does form oocysts, albeit dead ones.

      We have now performed the experiment of crossing the Sun1-KO parasite line with a male deficient line (Hap2-KO) and added the data in Fig. 3I. We have added images showing sporozoites in oocysts.

      - In Fig S1 the authors show that they also generated a SUN1-mCherry line, yet they do not use it in any of the presented experiments (unless I missed it). Would it be beneficial to cross the SUN1-mCherry line with the Allan1-GFP line to test colocalisation (possibly also by expansion microscopy)?

      We did generate a SUN1-mCherry line, with the intention of crossing the ALLAN-GFP and SUN1-mCherry lines to observe the co-location of the two proteins. Despite multiple attempts, this cross was unsuccessful. This may have been because the two proteins lie in such close proximity that the presence of both GFP and mCherry tags interfered with a proper protein-protein interaction between them.

      - Line 498: "In a significant proportion of cells" - What was the proportion of cells, and what does significant mean in this context?

      Approximately 67% of cells showed the clumping of BBs. We have now added the numbers in Figs. 6H and S11I.

      - The authors should discuss a bit more how their work relates to the work of Sayers et al. 2024, which also identified the SUN1-ALLAN complex. The paper is cited, but only very briefly commented on.

      We have extended this discussion now in lines 590-605.

      Suggestions on how to improve the writing and data presentation.

      - General presentation of microscopy images: Considering that large parts of the manuscript are based on microscopy data, their presentation could be improved. Single-channel microscopy images would benefit from being depicted in gray scale instead of color, which would make it easier to see the structures and intensities (especially for blue channels).

      Whilst we agree with the reviewer, features can sometimes be difficult to see in the merged images. We would therefore prefer to retain the colours, which allow features to be followed easily in both the individual and merged images.

      Also, it would be good to harmonize in which panels arrows are shown (e.g. Fig 1G, where some white arrows are in the SUN1-GFP panel while others are in the merge panel, although they presumably indicate the same thing). At the same time, Fig 1H doesn't have any arrows, even though the figure legend states that it does.

      We apologise for this lack of consistency, and we have now added arrows wherever they were missing to harmonise the presentation.

      Fig 3A and S4 show the same experiment but are shown in different colours (NHS-ester in green vs grey scale).

      - Are the scale bars of all expansion microscopy images adjusted for the expansion factor?

      Yes, the scale bars are adjusted accordingly.

      - The figure legends would benefit from streamlining, as they differ greatly in style between figures (e.g. Fig. 6, which has a concise figure legend, vs the microscopy figures, whose legends are very long and describe not only the figure but also the results).

      The figure legends have been streamlined, with removal of the description of results.

      - Line 155-156: The text makes it sound like the expression only happens after activation. Is that the case? Are these images of activated or non-activated gametocytes?

      They are expressed before activation, but the signal intensifies after activation. Images from before and after activation of gametocytes have been added in Fig. S1F.

      - Line 267: Reference to the original nek4-KO paper missing

      This reference is now included.

      - Line 301: The reference to Figure 2J seems to be a bit arbitrarily placed. Also, this schematic of lipid metabolism is never discussed in relation to the transcriptomic or lipidomic data.

      We have moved these data to supplementary information and modified the text.

      - Line 347-349 states that gametes emerged, but the referenced figure shows activated gametocytes before exflagellation.

      We have corrected the text to refer to the start of exflagellation.

      - Line 588: Spelling mistake in SUN1-domain

      Corrected.

      - Line 726/731: i missing in anti-GFP

      Corrected.

      - Line 787-789: statement of scale bar and number of cells imaged is not at the right position in the figure legend.

      Moved to the right place.

      - Line 779, 783: "shades of green" should be just "green". Same goes for line 986, 989 with "shades of grey"

      Changed.

      - Line 974, 976: please correct to WT-GFP and Δsun1

      Corrected.

      - Line 1041, 1044: WT-GFP instead of WTGFP.

      Corrected to WT-GFP.

      - Fig 1B, D, E, Fig S1G, H: What are the time points of imaging?

      We have added the time points to the images in these figures.

      - Fig 1D/Line 727: the scale of the scale bar on the inset is missing.

      We have added the scale bar.

      - Fig 3 E-G and 6H-J: Please indicate total number of cells/images analysed per quantification, either in the graphs themselves or in the figure legend.

      We now indicate the number of cells analysed in the individual figures and also in Figs. S5C and S8C, respectively.

      - Fig 5B: What is NP?

      Nuclear Pole (NP), also known as the nuclear/acentriolar MTOC (Zeeshan et al 2022; PMID: 35550346).

      - Fig S1B/D: The legend states that there is an arrow indicating the band, but there is none.

      We have added the arrow.

      - Fig S2C: Is the scale bar really the same for the zygote and the ookinete?

      We have checked this and used the same scale bar for both the zygote and the ookinete.

      - Fig S3C, S7C: On which stages was qRT-PCR performed?

      Gametocytes activated for 8 min.

      - Fig. S3D, S7D: According to the figure legend, three independent experiments were performed. How many mice were used per experiment? It would be good to depict the individual data points instead of the bar graph. For S7D, 3 data points are depicted (one in WT, two in allan-KO), what do they mean?

      The bite back experiment was performed using 15-20 mosquitoes infected with WT-GFP and gene knockout lines to feed on one naïve mouse each, in three different experiments. We have now included the data points in the bar diagrams.

      - Fig S3: Panel letters E and G are missing

      We have updated the lettering in the current Fig. S5.

      - Fig 3D: Please indicate what those boxes are. I presume that these are the insets shown in b, e and j, but this is never mentioned. J is not even larger than i. Also, f is quite cropped; it would be good to see the large-scale image it comes from to see where in the nucleus these kinetochores are placed. Were there unbound kinetochores found in WT?

      We mention the boxes in the figure legends. It is rare to find unbound kinetochores in WT parasites. We provide large-scale and zoomed-in images of free kinetochores in Fig. S8.

      - Fig S4: Insets are not mentioned in the figure legend. Please add scale bar to zoom-ins

      We now describe the insets in the figure legends and have added scale bars to the zoomed-in images.

      - Fig S5A, B: Please indicate which inset belongs to which sub-panel. Where does Ac stem from?

      We have now included the full image showing the inset (new Fig. S8).

      - Fig S5C and S8C: Change "DNA" to "Nucleus".

      We have changed “DNA” to “Nucleus”. Now they are Fig. S8K and S11I.

      Reviewer #3 (Significance):

      Yet, the statement that SUN1 is also important for lipid homoeostasis and NE dynamics is currently not backed up by sufficient data. I believe that the manuscript would benefit from removing the less convincing transcriptomic and lipidomic datasets and rather focus on more deeply characterising the cell biology of the knockouts. This way, the results would be interesting not only for parasitologists, but also for more general cell biologists.

      We have moved the lipidomics and transcriptomics data to supplementary information and toned down the emphasis on these data to make the manuscript more focused on the cell biology and analysis of the genetic KO data.

    1. eLife Assessment

      In this valuable study, the authors used rats to determine the receptor for a food-related perception that has been characterized in humans. The data are solid in terms of methods and analysis: the data show that this stimulus (ornithine) has some additive effects in terms of increasing preference and taste response in rats when it is mixed with other more common taste stimuli. Therefore, the combinations of experiments generally support (but do not conclusively prove) the hypothesis that the "kokumi" taste effect elicited by this stimulus in humans may be mediated by the specific receptor examined in the study.

    2. Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of taste cells of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi" (in terms of human taste); these kokumi stimuli appear to enhance other canonical tastes, increasing what are essentially hedonic attributes of other stimuli. The mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model, and comes to a similar conclusion, albeit with some small differences between the two rodent species.

      Strengths:

      The data show effects of ornithine on taste/intake in laboratory rats: in two-bottle and briefer intake tests, adding ornithine results in higher intake of most, but not all, stimuli tested. Bilateral chorda tympani (CT) nerve cuts or the addition of GPRC6A antagonists decreased or eliminated these effects. Ornithine also evoked responses by itself in the CT nerve, but mainly at higher concentrations; at lower concentrations it potentiated the response to monosodium glutamate. Finally, immunocytochemistry of taste cell expression indicated that GPRC6A was expressed predominantly in the anterior tongue, and co-localized (to a small extent) only with IP3R3, indicative of expression in a subset of type II taste receptor cells.

      Weaknesses:

      As the authors are aware, it is difficult to assess a complex human taste with complex attributes, such as kokumi, in an animal model. In these experiments they attempt to uncover mechanistic insights about how ornithine potentiates other stimuli by using a variety of established experimental approaches in rats. They partially succeed by finding evidence that GPRC6A may mediate effects of ornithine when it is used at lower concentrations. In the revisions they have scaled back their interpretations accordingly. A supplementary experiment measuring certain aspects of the effects of ornithine added to Miso soup in human subjects is included for the express purpose of establishing that the kokumi sensation of a complex solution is enhanced by ornithine. This (supplementary) experiment was conducted with a small sample size, and though perhaps useful, these preliminary results do not align particularly well with the animal experiments. It would be helpful to further explore human taste of ornithine in a larger and better-controlled study.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors provide compelling evidence that ornithine enhances the palatability of several chemical stimuli (i.e., IMP, MSG, MPG, Intralipos, sucrose, NaCl, quinine). Ornithine also increases CT nerve responses to MSG. Additionally, the authors provide evidence that the effects of ornithine are mediated by GPRC6A, a G-protein-coupled receptor family C group 6 subtype A, and that this receptor is expressed primarily in fungiform taste buds. Taken together, these results indicate that ornithine enhances the palatability of multiple taste stimuli in rats, and that the enhancement is mediated, at least in part, within fungiform taste buds. This finding could stand on its own. The question of whether ornithine produces these effects by eliciting kokumi-like perceptions (see below) should be presented as speculation in the Discussion section.

      Weaknesses:

      I am still unconvinced that the measurements in rats reflect the "kokumi" taste percept described in humans. The authors conducted long-term preference tests, 10-min avidity tests and whole chorda tympani (CT) nerve recordings. None of these procedures specifically model features of "kokumi" perception in humans, which (according to the authors) include increasing "intensity of whole complex tastes (rich flavor with complex tastes), mouthfulness (spread of taste and flavor throughout the oral cavity), and persistence of taste (lingering flavor)." While it may be possible to develop behavioral assays in rats (or mice) that effectively model kokumi taste perception in humans, the authors have not made any effort to do so. As a result, I do not think that the rat data provide support for the main conclusion of the study--that "ornithine is a kokumi substance and GPRC6A is a novel kokumi receptor."

      Why are the authors hypothesizing that the primary impacts of ornithine are on the peripheral taste system? While the CT recordings provide support for peripheral taste enhancement, they do not rule out the possibility of additional central enhancement. Indeed, based on the definition of human kokumi described above, it is likely that the effects of kokumi stimuli in humans are mediated at least in part by the central flavor system.

      The authors include (in the supplemental data section) a pilot study that examined the impact of ornithine on a variety of subjective measures of flavor perception in humans. The presence of this pilot study within the larger rat study does not really make sense. If the human studies are as important as the authors state, then why did the authors relegate them to the supplemental data section? Usually one places background and negative findings in this section of a paper. Accordingly, I recommend that the human data be published in a separate article.

    4. Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-ornithine should be considered a useful test substance in future studies of kokumi taste, and the class C G protein-coupled receptor known as GPRC6A (C6A), along with its homolog the calcium-sensing receptor (CaSR), should be considered candidate mediators of kokumi taste. The researchers confirmed in rats their previous work on ornithine and C6A in mice (Mizuta et al., Nutrients 2021).

      Strengths:

      The overall experimental design is solid, based on two-bottle preference tests in rats. After determining the optimal concentration of L-ornithine (1 mM) in the presence of MSG, it was added to various tastants, including inosine 5'-monophosphate (IMP); monosodium glutamate (MSG); mono-potassium glutamate (MPG); Intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl; salt); citric acid (sour); and quinine hydrochloride (bitter). Robust effects of ornithine were observed for IMP, MSG, MPG and sucrose, whereas little or no effect was observed for sodium chloride, citric acid and quinine HCl. The researchers then focused on the preference for ornithine-containing MSG solutions. Inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham-operated controls) in anesthetized rats to identify a role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but was eliminated by Calindol.

      Finally, they performed immunohistochemistry on sections of rat tongue, demonstrating C6A-positive spindle-shaped cells in fungiform papillae whose distribution partially overlapped with that of the IP3 type-3 receptor, used as a marker of Type II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of Type II cells; or (ii) 5-HT (serotonin) and synaptosome-associated protein 25 kDa (SNAP-25), used as markers of Type III cells.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). These alternatives are appropriately discussed and, taken together, the experimental results favor the authors' interpretation that C6A mediates the Ornithine responses. The authors provide preliminary data in Suppl. 3 for the possibility of co-expression of C6A with the CaSR.

      In the Discussion, the authors consider the potential effects of kokumi substances on the threshold concentrations of key tastants such as glutamate, arguing that extension of taste distribution to additional areas of the mouth (previously referred to as 'mouthfulness') and persistence of taste/flavor responses (previously referred to as 'continuity') could arise from a reduction in the threshold concentrations of umami and other substances that evoke taste responses. This concept may help to design future experiments.

      Weaknesses:

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9).

      The status of one of the compounds used as an inhibitor of C6A, the gallate derivative EGCG, as a potential inhibitor of the CaSR or T1R1/T1R3 is unknown. It would have been helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response.

      It would have been helpful to include a positive control kokumi substance in the two bottle preference experiment (e.g., one of the known gamma glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of taste cells of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi" (in terms of human taste); these kokumi stimuli appear to enhance other canonical tastes, increasing what are essentially hedonic attributes of other stimuli. The mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model, and comes to a similar conclusion, albeit with some small differences between the two rodent species.

      Strengths:

      The data show effects of ornithine on taste/intake in laboratory rats: in two-bottle and briefer intake tests, adding ornithine results in higher intake of most, but not all, stimuli tested. Bilateral chorda tympani (CT) nerve cuts or the addition of GPRC6A antagonists decreased or eliminated these effects. Ornithine also evoked responses by itself in the CT nerve, but mainly at higher concentrations; at lower concentrations it potentiated the response to monosodium glutamate. Finally, immunocytochemistry of taste cell expression indicated that GPRC6A was expressed predominantly in the anterior tongue, and co-localized (to a small extent) only with IP3R3, indicative of expression in a subset of type II taste receptor cells.

      Weaknesses:

      As the authors are aware, it is difficult to assess a complex human taste with complex attributes, such as kokumi, in an animal model. In these experiments they attempt to uncover mechanistic insights about how ornithine potentiates other stimuli by using a variety of established experimental approaches in rats. They partially succeed by finding evidence that GPRC6A may mediate effects of ornithine when it is used at lower concentrations. In the revision they have scaled back their interpretations accordingly. A supplementary experiment measuring certain aspects of the effects of ornithine added to Miso soup in human subjects is included for the express purpose of establishing that the kokumi sensation of a complex solution is enhanced by ornithine; however, they do not use any such complex solutions in the rat studies. Moreover, the sample size of the human experiment is (still) small - it really doesn't belong in the same manuscript with the rat studies.

      Despite the reviewer’s suggestion, we would like to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, which is then followed by basic animal experiments to investigate the underlying mechanisms of kokumi in humans.

      We did not present the additive effects of ornithine on miso soup in the present rat study because our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26) already confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred to plain miso soup by mice.

      Furthermore, we believe that our sample size (n = 22) is comparable to those employed in other studies. For example, the representative kokumi studies by Ohsu et al. (Ref. #9), Ueda et al. (Ref. #10), Shibata et al. (Ref. #20), Dunkel et al. (Ref. #37), and Yang et al. (Ref. #44) used sample sizes of 20, 19, 17, 9, and 15, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors provide compelling evidence that ornithine enhances the palatability of several chemical stimuli (i.e., IMP, MSG, MPG, Intralipos, sucrose, NaCl, quinine). Ornithine also increases CT nerve responses to MSG. Additionally, the authors provide evidence that the effects of ornithine are mediated by GPRC6A, a G-protein-coupled receptor family C group 6 subtype A, and that this receptor is expressed primarily in fungiform taste buds. Taken together, these results indicate that ornithine enhances the palatability of multiple taste stimuli in rats and that the enhancement is mediated, at least in part, within fungiform taste buds. This is an important finding that could stand on its own. The question of whether ornithine produces these effects by eliciting kokumi-like perceptions (see below) should be presented as speculation in the Discussion section.

      Weaknesses:

      I am still unconvinced that the measurements in rats reflect the "kokumi" taste percept described in humans. The authors conducted long-term preference tests, 10-min avidity tests and whole chorda tympani (CT) nerve recordings. None of these procedures specifically model features of "kokumi" perception in humans, which (according to the authors) include increasing "intensity of whole complex tastes (rich flavor with complex tastes), mouthfulness (spread of taste and flavor throughout the oral cavity), and persistence of taste (lingering flavor)." While it may be possible to develop behavioral assays in rats (or mice) that effectively model kokumi taste perception in humans, the authors have not made any effort to do so. As a result, I do not think that the rat data provide support for the main conclusion of the study--that "ornithine is a kokumi substance and GPRC6A is a novel kokumi receptor."

      Kokumi can be assessed in humans, as demonstrated by the enhanced kokumi perception observed when miso soup is supplemented with ornithine (Fig. S1). Currently, we do not have a method to measure the same kokumi perception in animals. However, in the two-bottle preference test, our previous companion paper (Fig. 1B in Mizuta et al. 2021, Ref. #26) confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred over plain miso soup by mice.

      Of the three attributes of kokumi perception in humans, the “intensity of whole complex tastes (rich flavor with complex tastes)” was partly demonstrated in the present rat study. In contrast, “mouthfulness (the spread of taste and flavor throughout the oral cavity)” could not be directly detected in animals and had to be inferred in the Discussion. “Persistence of taste (lingering flavor)” was evident at least in the chorda tympani responses; however, because the tongue was rinsed 30 seconds after the onset of stimulation, the duration of the response was not fully recorded.

      It is well accepted in sensory physiology that the stronger the stimulus, the larger the tonic response—and consequently, the longer it takes for the response to return to baseline. For example, Kawasaki et al. (2016, Ref. #45) clearly showed that the duration of sensation increased proportionally with the concentration of MSG, lactic acid, and NaCl in human sensory tests. The essence of this explanation has been incorporated into the Discussion (p. 12).

      Why are the authors hypothesizing that the primary impacts of ornithine are on the peripheral taste system? While the CT recordings provide support for peripheral taste enhancement, they do not rule out the possibility of additional central enhancement. Indeed, based on the definition of human kokumi described above, it is likely that the effects of kokumi stimuli in humans are mediated at least in part by the central flavor system.

      We agree with the reviewer’s comment. Our CT recordings indicate that the effects of kokumi stimuli on taste enhancement occur primarily at the peripheral taste organs. The resulting sensory signals are then transmitted to the brain, where they are processed by the central gustatory and flavor systems, ultimately giving rise to kokumi attributes. This central involvement in kokumi perception is discussed on page 12. Although kokumi substances exert their effects at low concentrations—levels at which the substance itself (e.g., ornithine) does not become more favorable or (in the case of γ-Glu-Val-Gly) exhibits no distinct taste—we cannot rule out the possibility that even faint taste signals from these substances are transmitted to the brain and interact with other taste modalities.

      The authors include (in the supplemental data section) a pilot study that examined the impact of ornithine on a variety of subjective measures of flavor perception in humans. The presence of this pilot study within the larger rat study does not really make sense. While I agree with the authors that there is value in conducting parallel tests in both humans and rodents, I think that this can only be done effectively when the measurements in both species are the same. For this reason, I recommend that the human data be published in a separate article.

      Despite the reviewer’s suggestion, we intend to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, and then follow up with basic animal experiments to investigate the potential underlying mechanisms of kokumi in humans.

      In our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26), we confirmed with statistical significance (P < 0.001) that mice preferred miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) over plain miso soup. However, as explained in our response to Reviewer #2’s first concern (in the Public review), it is difficult to measure two of the three kokumi attributes—aside from the “intensity of whole complex tastes (rich flavor with complex tastes)”—in animal models.

      The authors indicated on several occasions (e.g., see Abstract) that ornithine produced "synergistic" effects on the CT nerve response to chemical stimuli. "Synergy" is used to describe a situation where two stimuli produce an effect that is greater than the sum of the response to each stimulus alone (i.e., 2 + 2 = 5). As far as I can tell, the CT recordings in Fig. 3 do not reflect a synergism.

      We appreciate your comments regarding the definition of synergy. In Fig. 5 (not Fig. 3), please note the difference in the scaling of the ordinate between Fig. 5D (ornithine responses) and Fig. 5E (MSG responses). When both responses are presented on the same scale, it becomes evident that the response to 1 mM ornithine is negligibly small compared to the MSG response, which clearly indicates that the response to the mixture of MSG and 1 mM ornithine exceeds the sum of the individual responses to MSG and 1 mM ornithine. Therefore, we have described the effect as "synergistic" rather than "additive." The same observation applies to the mouse experiments in our previous companion paper (Fig. 8 in Mizuta et al. 2021, Ref. #26), where synergistic effects are similarly demonstrated by graphical representation. We have also added the following sentence to the legend of Fig. 5:

      “Note the different scaling of the ordinate in (D) and (E).”

      Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-ornithine should be considered a useful test substance in future studies of kokumi taste, and the class C G protein-coupled receptor known as GPRC6A (C6A), along with its homolog the calcium-sensing receptor (CaSR), should be considered candidate mediators of kokumi taste. The researchers confirmed in rats their previous work on ornithine and C6A in mice (Mizuta et al., Nutrients 2021).

      Strengths:

      The overall experimental design is solid, based on two-bottle preference tests in rats. After determining the optimal concentration of L-ornithine (1 mM) in the presence of MSG, it was added to various tastants, including inosine 5'-monophosphate (IMP); monosodium glutamate (MSG); mono-potassium glutamate (MPG); Intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl; salt); citric acid (sour); and quinine hydrochloride (bitter). Robust effects of ornithine were observed for IMP, MSG, MPG and sucrose, whereas little or no effect was observed for sodium chloride, citric acid and quinine HCl. The researchers then focused on the preference for ornithine-containing MSG solutions. Inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham-operated controls) in anesthetized rats to identify a role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but was eliminated by Calindol.

      Finally, they performed immunohistochemistry on sections of rat tongue, demonstrating C6A-positive spindle-shaped cells in fungiform papillae whose distribution partially overlapped with that of the IP3 type-3 receptor, used as a marker of Type II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of Type II cells; or (ii) 5-HT (serotonin) and synaptosome-associated protein 25 kDa (SNAP-25), used as markers of Type III cells.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). These alternatives are appropriately discussed and, taken together, the experimental results favor the authors' interpretation that C6A mediates the Ornithine responses. The authors provide preliminary data in Suppl. 3 for the possibility of co-expression of C6A with the CaSR.

      Weaknesses:

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9).

      Ornithine and umami substances interact to produce synergistic effects in both directions—ornithine enhances responses to umami substances, and vice versa. These effects may depend on the concentrations used, as described in the Discussion (pp. 9–10). Further studies are required to clarify the precise nature of this interaction.

      One issue that is not addressed, and could be usefully addressed in the Discussion, relates to the potential effects of kokumi substances on the threshold concentrations of key tastants such as glutamate. Thus, an extension of taste distribution to additional areas of the mouth (previously referred to as 'mouthfulness') and persistence of taste/flavor responses (previously referred to as 'continuity') could arise from a reduction in the threshold concentrations of umami and other substances that evoke taste responses.

      Thank you for this important suggestion. If ornithine reduces the threshold concentrations of tastants—including glutamate—and enhances their suprathreshold responses, then adding ornithine may activate additional taste cells. This effect could explain kokumi attributes such as an “extension of taste distribution” and possibly the “persistence of responses.” As shown in Fig. 2, the lowest concentrations used for each taste stimulus are near or below the thresholds, which indicates that threshold concentrations are reduced—especially for MSG and MPG. We have incorporated this possibility into the Discussion as follows (p.12):

      “Kokumi substances may reduce the threshold concentrations of tastants as well as increase their suprathreshold responses. Once the threshold concentrations are lowered, additional taste cells in the oral cavity become activated, and this information is transmitted to the brain. As a result, the brain perceives this input as coming from a wider area of the mouth.”

      The status of one of the compounds used as an inhibitor of C6A, the gallate derivative EGCG, as a potential inhibitor of the CaSR or T1R1/T1R3 is unknown. It would have been helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response.

      Thank you for this important comment. We attempted to identify a specific inhibitor of CaSR. Although we considered using NPS-2143—a commonly used CaSR inhibitor—it is known to also inhibit GPRC6A. We agree that using a specific CaSR inhibitor would be beneficial and plan to pursue this in future studies.

      It would have been helpful to include a positive control kokumi substance in the two-bottle preference experiment (e.g., one of the known γ-glutamyl peptides such as γ-Glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      We agree with this comment. In retrospect, it may have been advantageous to directly compare the potencies of CaSR and GPRC6A agonists in enhancing taste preferences—and to evaluate the sensitivity of these preferences to CaSR and GPRC6A antagonists. However, we did not include γ-Glu-Val-Gly in the present study because we have already reported its supplementation effects on the ingestion of basic taste solutions in rats using the same methodology in a separate paper (Yamamoto and Mizuta, 2022, Ref. #25). The results from both studies are compared in the Discussion (p. 11).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      I am not convinced by the authors' arguments for including the human data. I appreciate their efforts in adding a few (5) subjects and improving the description, but it still feels shoehorned into this paper and would be better published as a separate manuscript.

      This human study is short, but it is complete rather than preliminary. Our rationale for including the human data as supplementary information is given in our response to the reviewer’s public review.

      Minor concerns:

      Page 3 paragraph 1: Suggest "contributing to palatability".

      Thank you for this suggestion. We have rewritten the text as follows:

      “…, the brain further processes these sensations to evoke emotional responses, contributing to palatability or unpleasantness”.

      Page 4 paragraph 2: The text still assumes that "kokumi" is a meaningful descriptor for what rodents experience. Re-wording the following sentence like this could help:

      "Neuroscientific studies in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi taste as experienced by humans. However, to our..."

      Or something similar.

      Thank you for this suggestion. We have rewritten the sentence according to your suggestion as follows:

      "Neuroscientific studies (23,25,30) in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi as experienced by humans”.

      Page 7 paragraph 1 - put the concentrations of Calindol and EGCG used (in the physiology exps) in the text.

      We have added the concentrations: “300 µM calindol and 100 µM EGCG”.

      Reviewer #2 (Recommendations for the authors):

      I have included all of my recommendations in the public review section.

      Reviewer #3 (Recommendations for the authors):

      Although the definitions of 'thickness', 'mouthfulness' and 'continuity' have been revised very helpfully in the Introduction, 'mouthfulness' reappears at other points in the MS e.g., Page 4, Results, Line 3; Page 9, Line 3. It is best replaced by the new definition in these other locations too.

      We wish to clarify that our revised text stated, “…to clarify that kokumi attributes are inherently gustatory, in the present study we use the terms ‘intensity of whole complex tastes (rich flavor with complex tastes)’ instead of ‘thickness,’ ‘mouthfulness (spread of taste and flavor throughout the oral cavity)’ instead of ‘continuity,’ and ‘persistence of taste (lingering flavor)’ instead of ‘continuity.’” The term “mouthfulness” was retained in our text, though we provided a more specific explanation. In the re-revised version, we have added “(spread of taste in the oral cavity)” immediately after “mouthfulness.”

      I doubt that many scientific readers will be familiar with the term 'intragemmal nerve fibres' (Page 8, Line 4). It is used appropriately but it would be helpful to briefly define/explain it.

      We have added an explanation as follows:

      “… intragemmal nerve fibers, which are nerve processes that extend directly into the structure of the taste bud to transmit taste signals from taste cells to the brain.”

      I previously pointed out the overlap between the CaSR's amino acid (AA) and gamma-glutamyl-peptide binding site. I was surprised by the authors' response which appeared to miss the point being made. It was based on the impacts of selected mutations in the receptor's Venus FlyTrap domain (Broadhead JBC 2011) on the responses to AAs and glutathione analogs. The significantly more active analog, S-methylglutathione is of additional interest because, like glutathione itself, it is present in mammalian body fluids. My apologies to the authors for not more carefully explaining this point.

      Thank you for this comment. Both CaSR and GPRC6A are recognized as broad-spectrum amino acid sensors; however, their agonist profiles differ. Aromatic amino acids preferentially activate CaSR, whereas basic amino acids tend to activate GPRC6A. For instance, among basic amino acids, ornithine is a potent and specific activator of GPRC6A, whereas CaSR is activated with high potency by γ-Glu-Val-Gly as well as by amino acids. It remains unclear how effectively ornithine activates CaSR and whether γ-glutamyl peptides also activate GPRC6A. These questions should be addressed in future studies.

    1. eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumour microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is that the study is descriptive and non-mechanistic, and requires experimental validation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumor microenvironment. Given notable weaknesses, such as the lack of a validation cohort or of validation using other platforms (beyond the 11 samples with spatial transcriptomics, ST), the data are considered highly descriptive and preliminary.

      The study reveals significant findings by identifying a transcriptional component (TC121) associated with synaptic signaling, which is linked to shorter survival in patients with high-grade serous ovarian cancer, highlighting the potential role of neurons in the tumor microenvironment. However, the evidence could be strengthened by experimental validation to confirm the functional roles of key genes within TC121 and by further exploration of its spatial aspects, including deeper analysis of synaptic and other neuronal gene expression.

      Strengths:

      Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumor microenvironment in cancer progression.

      Weaknesses:

      Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      Innovative Methodology:

      More validation using different platforms (e.g., IHC) is required to confirm the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Given the lack of some QA/QC procedures, it will probably be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

    3. Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data which are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns which are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC would lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (for example, Knapen et al. (2024) Commun. Med).

      Strengths:

      Overall, this study provides a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      The resulting annotated transcriptional components have been made available in a searchable online format.

      For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers is compelling and supports the need for further mechanistic follow-up.

      Further comments:

      This revised version includes a suite of comparisons between the c-ICA-derived components and existing published transcriptomic/genomic-based classifications of ovarian cancers. Newly described components will require experimental validation, as acknowledged by the authors.

      Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics arising from the mixture of tumour cells and cells of the tumour microenvironment.

      In this revised version, the authors additionally investigate their TC scores in single cells from a published HGSOC single-cell RNAseq dataset, highlighting examples of TC scores within and between cell types.

      c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods which explicitly use a prior cell signature matrix.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      “This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumour microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is in being descriptive, non-mechanistic, and requiring experimental validation.”

      We sincerely thank the editors for their valuable and constructive feedback. We are grateful for the recognition of our findings and the importance of identifying transcriptional components in high-grade serous ovarian cancers.

      We acknowledge the editors’ observation regarding the descriptive nature of our study and its limited mechanistic depth. We agree that additional experimental validation would further strengthen our conclusions, and we are planning and executing experiments for a future study to provide mechanistic insights into the associations found here. In addition, recent reviews of the emerging field of cancer neuroscience emphasize that the field is at an early stage, particularly with respect to a mechanistic understanding of how tumor-infiltrating nerves contribute to tumor initiation and progression (Amit et al., 2024; Hwang et al., 2024). Nonetheless, we wish to emphasize that emerging mechanistic preclinical studies have demonstrated the influence of tumour-infiltrating nerves on disease progression (Allen et al., 2018; Balood et al., 2022; Darragh et al., 2024; Globig et al., 2023; Jin et al., 2022; Restaino et al., 2023; Zahalka et al., 2017). Several of these studies include contributions from our co-authors and feature in vitro and in vivo research on head and neck squamous cell carcinoma as well as high-grade serous ovarian carcinoma samples. This study complements that preclinical work by showing, in patient data, the potential relevance of neuronal signaling to disease outcome.

      For instance, Restaino et al. (2023) demonstrated that substance P, released from tumour-infiltrating nociceptors, potentiates MAP kinase signaling in cancer cells, thereby driving disease progression. Crucially, this effect was shown to be reversible in vivo by blocking the substance P receptor (Restaino et al., 2023). These findings offer compelling evidence of the role of tumour innervation in cancer biology.

      Our current study in tumor samples of patients with high-grade serous ovarian cancer identifies a transcriptional component that is enriched for genes encoding proteins located in the synapse. We believe that the previously published mechanistic insights support our findings and suggest that this transcriptional component could serve as a valuable screening tool to identify innervated tumours based on bulk transcriptomes. Clinically, this information is highly relevant, as patients with innervated tumours may benefit from alternative therapeutic strategies targeting these innervations.

      Reviewer #1 (Public review)

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumour microenvironment. Given notable weaknesses, such as the lack of a validation cohort or of validation using another platform (beyond the 11 samples with spatial transcriptomics, ST), the data are considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      We thank the reviewer for recognizing the strengths and novelty of our study. We appreciate the positive feedback on using consensus-independent component analysis (c-ICA) to decompose bulk transcriptomes, which allowed us to detect subtle transcriptional signals often overlooked in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      We thank the reviewer for recognizing the robustness of our study through comprehensive data integration. We appreciate the acknowledgment of our efforts to leverage a large, multi-source dataset, as well as the additional insights gained from spatially resolved transcriptomes. We believe this integrative approach enhances the depth of our analysis and contributes to a more nuanced understanding of the tumour microenvironment.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumour microenvironment in cancer progression.

      We appreciate the recognition of the clinical implications of our findings. The identification of a synaptic signaling-related transcriptional component associated with poor prognosis underscores the potential for novel therapeutic targets within the tumour microenvironment. We agree that this insight could open new avenues for intervention and further highlights the role of neuronal interactions in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      We acknowledge the point regarding the limited mechanistic insights provided in our study. We agree that further experimental validation would significantly enhance our understanding of how the biological processes captured by these transcriptional components influence cancer progression. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Our analyses were performed on publicly available bulk and spatially resolved expression profiles. To gain mechanistic insights in future studies, we plan to integrate spatial transcriptomic data with immunohistochemical analysis of the same tumour samples to validate our findings. Additionally, we have initiated efforts to set up in vitro co-cultures of neurons and ovarian cancer cells. These co-cultures will enable us to investigate how synaptic signaling impacts ovarian cancer cell behavior.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      To respond to this remark, we used survival data from Bolton et al. (2022) and TCGA to investigate associations between TC activity scores and overall survival in patients with ovarian clear cell carcinoma (the second most common subtype of epithelial ovarian cancer) and in patients with other cancer types, respectively. However, we acknowledge the limitations of TCGA survival data, as highlighted in the referenced article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8726696/). Additionally, as shown in Figure 5, we provided evidence of TC121 activity across various cancer types, suggesting broader relevance. For the results of these analyses, please refer to our response to remark 1.3 of the recommendations section (page 4).

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      We acknowledge the value of validating our results with alternative platforms such as IHC. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Regarding data quality control, we implemented the following measures to ensure the reliability of our analysis:

      Bulk Transcriptional Profiles: To assess data quality, we conducted principal component analysis (PCA) on the sample Pearson product-moment correlation matrix. The first principal component (PCqc), which explains approximately 80-90% of the variance, was used to distinguish technical variability from biological signals (Bhattacharya et al., 2020). Samples with a correlation coefficient below 0.8 relative to PCqc were identified as outliers and excluded. Additionally, MD5 hash values were generated for each CEL file to identify and remove duplicate samples. Expression values were standardized to a mean of zero and a variance of one for each gene to minimize probeset- or gene-specific variability across datasets (GEO, CCLE, GDSC, and TCGA).
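      The bulk quality-control steps described above can be sketched as follows. This is a minimal, hypothetical pure-Python illustration, not the authors' pipeline: the function names are invented for this example, each profile is a plain list of expression values in a fixed gene order, and "correlation relative to PCqc" is interpreted (an assumption) as each sample's loading on the first principal component of the sample correlation matrix. For PCA on a correlation matrix, that loading equals the correlation between a sample and the component, sqrt(lambda_1) * v_i. The 0.8 threshold and MD5-based de-duplication follow the text; the leading eigenvector is found by power iteration only to keep the sketch dependency-free.

```python
import hashlib
import math


def md5_dedupe(raw_files):
    """Keep the first sample for each distinct MD5 digest of its raw file bytes.

    raw_files: dict mapping sample id -> bytes (e.g. contents of a CEL file).
    Returns the surviving sample ids in input order.
    """
    seen, kept = set(), []
    for sample_id, data in raw_files.items():
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(sample_id)
    return kept


def _standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if sd == 0:
        raise ValueError("constant profile cannot be standardized")
    return [(x - mean) / sd for x in values]


def pcqc_loadings(profiles, iterations=1000, tol=1e-12):
    """Return (loadings on PC1 of the sample correlation matrix, variance fraction)."""
    z = [_standardize(p) for p in profiles]
    m, n = len(z[0]), len(z)
    # Pearson correlation between every pair of samples (n x n).
    r = [[sum(a * b for a, b in zip(z[i], z[j])) / m for j in range(n)] for i in range(n)]
    # Power iteration: R is positive semi-definite, so this converges to PC1.
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v, lam = w, norm  # norm of R v approaches the leading eigenvalue
        if converged:
            break
    # The eigenvector sign is arbitrary; orient it so most samples load positively.
    if sum(v) < 0:
        v = [-x for x in v]
    loadings = [math.sqrt(lam) * x for x in v]
    return loadings, lam / n  # lam / n = fraction of total variance on PC1


def flag_outliers(sample_ids, profiles, threshold=0.8):
    """Samples whose correlation with PCqc falls below the threshold."""
    loadings, explained = pcqc_loadings(profiles)
    outliers = [s for s, load in zip(sample_ids, loadings) if load < threshold]
    return outliers, explained


def standardize_genes(matrix):
    """Scale each gene (one row per gene, one column per sample) to mean 0, variance 1."""
    return [_standardize(row) for row in matrix]
```

In practice a linear-algebra library would replace the nested lists; the pure-Python form only makes the order of operations explicit (de-duplicate, correlate, take PC1, threshold, then standardize per gene).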

      Spatial Transcriptional Profiles: PCA was also applied to spatial transcriptomic data for quality control. Only samples with consistent loading factor signs for the first principal component across all individual spot profiles were retained. Samples failing this criterion were excluded from further analyses.
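      The spatial criterion can be illustrated in the same spirit. In this hypothetical sketch (function names and data layout are assumptions), a spatial sample passes if every spot's entry in the leading eigenvector of the spot-by-spot correlation matrix has the same sign. This is one plausible reading of "consistent loading factor signs for the first principal component", not a confirmed description of the authors' code.

```python
import math


def _standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if sd == 0:
        raise ValueError("constant profile cannot be standardized")
    return [(x - mean) / sd for x in values]


def leading_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


def has_consistent_pc1_signs(spot_profiles):
    """True if all spots of one spatial sample load on PC1 with the same sign.

    spot_profiles: one expression vector per spot, same gene order throughout.
    """
    z = [_standardize(p) for p in spot_profiles]
    m, n = len(z[0]), len(z)
    # Spot-by-spot Pearson correlation matrix.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / m for j in range(n)] for i in range(n)]
    v = leading_eigenvector(corr)
    return all(x > 0 for x in v) or all(x < 0 for x in v)
```

A sample in which one group of spots is anti-correlated with the rest yields mixed signs on PC1 and would be excluded under this reading.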

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Given the lack of some QA/QC procedures, it will probably be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

      Regarding clinical applications, we acknowledge the importance of further exploring strategies targeting synaptic signaling and neurotransmitter release in the tumour microenvironment (TME). As partially discussed in the first version of the manuscript, drugs such as ifenprodil and lamotrigine, commonly used to treat neuronal disorders, can block glutamate release, thereby inhibiting subsequent synaptic signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine blocks the formation of synaptic vesicles (Reid et al., 2013; Williams et al., 2001). Previous in vitro studies with HGSOC cell lines demonstrated that ifenprodil significantly reduced cancer cell proliferation, while reserpine triggered apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). These findings highlight the potential of such approaches to disrupt synaptic neurotransmission in the TME.

      To address potential translation of our findings into clinical practice more comprehensively, we have included additional details in the manuscript:

      Discussion section, page 16, lines 338-341:

      “This interaction can be targeted with pan-TRK inhibitors such as entrectinib and larotrectinib. Both drugs have shown promising results in multiple phase II trials that included patients with ovarian or breast cancer. Furthermore, a TRKB-specific inhibitor (ANA-12) has been developed but has not yet been evaluated in any clinical trials in cancer (Ardini et al., 2016; Burris et al., 2015; Drilon et al., 2018, 2017).”

      On page 17, lines 361-374:

      “Strategies to disrupt neuronal signaling and neurotransmitter release in neurons target key elements of excitatory neurotransmission, such as calcium flux and vesicle formation. Drugs like ifenprodil and lamotrigine, commonly used to treat neuronal disorders, block glutamate release and subsequent neuronal signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine prevents synaptic vesicle formation (Reid et al., 2013; Williams, 2001). In vitro studies with HGSOC cell lines have demonstrated that ifenprodil significantly inhibits tumour proliferation, while reserpine induces apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). These approaches hold promise for inhibiting neuronal signaling and interactions in the TME.”

      Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      (1) Overall, this study provides a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      We thank the reviewer for acknowledging the strength of our data-driven approach and the use of consensus-independent component analysis (c-ICA) to identify transcriptional components within HGSOC microarray data. We aimed to provide comprehensive methodological detail and supplementary documentation to support the reproducibility and robustness of our findings. We believe this approach allows for the identification of subtle transcriptional signals that might have been overlooked by traditional analysis methods.

      (2) The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      We appreciate the positive feedback on the biological interpretation of our transcriptional components. We are pleased that our approach, which includes data-driven permutation testing and analyses of associations with copy-number alterations, gene sets, and prognostic outcomes, was found to be convincing. These analyses were integral to enhancing our findings’ robustness and biological relevance.

      (3) The resulting annotated transcriptional components have been made available in a searchable online format.

      Thank you for this important positive remark.

      (4) For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Thank you for acknowledging the accessibility of our annotated transcriptional components. We prioritized making these data available in a searchable online format to facilitate further research and enable the community to explore and validate our findings.

      Weaknesses:

      (1) This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g., Smith et al., Nat Comms 2023; TCGA, Nature 2011; Engqvist et al., Sci Rep 2020), which would enable a further assessment of the additional contribution of c-ICA: whether the c-ICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      We acknowledge the reviewer’s insightful suggestion to compare our c-ICA-derived transcriptional components with previously reported ovarian cancer classifications, such as those from Smith et al. (2023), TCGA (2011), and Engqvist et al. (2020). To address this, we incorporated analyses comparing the activity scores of our transcriptional components with these published landscapes and classifications, particularly focusing on any associations with overall survival. Additionally, we evaluated correlations between gene signatures from a subset of these studies and our identified TCs, enhancing our understanding of the unique contributions of the c-ICA approach. Please refer to our response to remark 10 for the results of these analyses.

      (2) Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics arising from the mixture of tumour cells and cells of the tumour microenvironment.

      However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

      We acknowledge that c-ICA, unlike traditional deconvolution methods, is not specifically designed for cell-type deconvolution and does not rely on a predefined cell signature matrix. While we explored the transcriptional components in the context of tumour and microenvironmental interactions, we agree that these components may not correspond directly to distinct cell types but rather reflect complex patterns of dysregulation, potentially within individual cell populations.

      Our goal with c-ICA was to uncover hidden transcriptional patterns possibly influenced by cellular heterogeneity. However, we recognize that these patterns may also arise from regulatory processes within a single cell type. To investigate this further, we used single-cell transcriptional data (~60,000 cell-type-annotated profiles from GSE158722) and projected our transcriptional components onto these profiles to obtain activity scores, allowing us to assess each TC's behavior across diverse cellular contexts after removing the first principal component to minimize background effects. Please refer to our response to remark 2.2 in the recommendations to the authors (page 14) for the results of this analysis.
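      The projection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline and rests on several assumptions: a cells × genes expression matrix, each transcriptional component represented as a vector of gene weights, the first principal component removed from the expression matrix before projection (the response does not state at which stage it is removed), and activity scores estimated by ordinary least squares. The function names `remove_first_pc` and `activity_scores` are hypothetical, and the toy data are random.

```python
import numpy as np


def remove_first_pc(expr):
    """Center each gene, then subtract the rank-1 reconstruction of the first PC.

    expr: cells x genes array (assumed already normalized).
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered - s[0] * np.outer(u[:, 0], vt[0])


def activity_scores(expr, components):
    """Least-squares activity of each component in each cell.

    components: genes x k matrix of component gene weights (hypothetical input).
    Solves components @ a = x for every cell profile x and returns cells x k.
    """
    coef, *_ = np.linalg.lstsq(components, expr.T, rcond=None)
    return coef.T


# Toy example with random numbers, not real single-cell profiles.
rng = np.random.default_rng(0)
components = rng.normal(size=(50, 3))       # 50 genes, 3 components
true_activity = rng.normal(size=(10, 3))    # 10 cells
expr = true_activity @ components.T         # cells x genes, built from the components
scores = activity_scores(expr, components)  # should recover true_activity
residual = remove_first_pc(expr)            # expression with the first PC removed
```

      Least squares is used here because it handles correlated component weights; a plain dot product would only coincide with it if the components were orthonormal.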

      References

      Allen JK, Armaiz-Pena GN, Nagaraja AS, Sadaoui NC, Ortiz T, Dood R, Ozcan M, Herder DM, Haemerrle M, Gharpure KM, Rupaimoole R, Previs R, Wu SY, Pradeep S, Xu X, Han HD, Zand B, Dalton HJ, Taylor M, Hu W, Bottsford-Miller J, Moreno-Smith M, Kang Y, Mangala LS, Rodriguez-Aguayo C, Sehgal V, Spaeth EL, Ram PT, Wong ST, Marini FC, Lopez-Berestein G, Cole SW, Lutgendorf SK, diBiasi M, Sood AK. 2018. Sustained adrenergic signaling promotes intratumoral innervation through BDNF induction. Cancer Res 78 (12):3233-3242.

      Ardini E, Menichincheri M, Banfi P, Bosotti R, Ponti CD, Pulci R, Ballinari D, Ciomei M, Texido G, Degrassi A, Avanzi N, Amboldi N, Saccardo MB, Casero D, Orsini P, Bandiera T, Mologni L, Anderson D, Wei G, Harris J, Vernier J-M, Li G, Felder E, Donati D, Isacchi A, Pesenti E, Magnaghi P, Galvani A. 2016. Entrectinib, a Pan–TRK, ROS1, and ALK Inhibitor with activity in multiple molecularly defined cancer Indications. Mol Cancer Ther 15:628–639.

      Balood M, Ahmadi M, Eichwald T, Ahmadi A, Majdoubi A, Roversi Karine, Roversi Katiane, Lucido CT, Restaino AC, Huang S, Ji L, Huang K-C, Semerena E, Thomas SC, Trevino AE, Merrison H, Parrin A, Doyle B, Vermeer DW, Spanos WC, Williamson CS, Seehus CR, Foster SL, Dai H, Shu CJ, Rangachari M, Thibodeau J, Rincon SVD, Drapkin R, Rafei M, Ghasemlou N, Vermeer PD, Woolf CJ, Talbot S. 2022. Nociceptor neurons affect cancer immunosurveillance. Nature 611:405–412.

      Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, Vries EGE de, Vugt MATM van, Fehrmann RSN. 2020. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun 11:715.

      Burris HA, Shaw AT, Bauer TM, Farago AF, Doebele RC, Smith S, Nanda N, Cruickshank S, Low JA, Brose MS. 2015. Abstract 4529: Pharmacokinetics (PK) of LOXO-101 during the first-in-human Phase I study in patients with advanced solid tumors: Interim update. Cancer Res 75:4529–4529.

    1. eLife Assessment

      Xenacoelomorpha is an enigmatic phylum, displaying various presumably simple or ancestral bilaterian features. This valuable study characterises the reproductive life history of Hofstenia miamia, a member of class Acoela in this phylum. The authors describe the morphology and development of the reproductive system, its changes upon degrowth and regeneration, and the animals' egg-laying behaviour. The evidence is convincing, with fluorescence microscopy and quantitative measurements representing a considerable improvement over historical reports based mostly on histology and qualitative observations.

    2. Reviewer #1 (Public review):

      The aim of this study was a better understanding of the reproductive life history of acoels. The acoel Hofstenia miamia, an emerging model organism, is investigated; the authors nevertheless acknowledge and address the high variability in reproductive morphology and strategies within Acoela.

      The morphology of male and female reproductive organs in these hermaphroditic worms is characterised through stereo microscopy, immunohistochemistry, histology, and fluorescent in situ hybridization. The findings confirm and better detail historical descriptions. A novelty in the field is the in situ hybridization experiments, which link already published single-cell sequencing data to the worms' morphology. An interesting finding, though not further discussed by the authors, is that the known germline markers cgnl1-2 and Piwi-1 are only localized in the ovaries and not in the testes.

      The work also clarifies the timing and order of appearance of reproductive organs during development and regeneration, as well as their changes upon degrowth. It shows an association between reproductive organ growth and whole-body size, which will surely be taken into account and further explored in future acoel studies. This is also the first non-anecdotal report of degrowth upon starvation in H. miamia (and, to my knowledge, in acoels, except for weight recorded upon starvation in Convolutriloba retrogemma [1]).

      Egg laying through the mouth is described in H. miamia for the first time, as is the worms' egg-laying behavior, i.e., choosing the tank walls rather than the floor, laying eggs in clutches, and delaying egg laying during food deprivation. Self-fertilization is also reported for the first time.

      The main strength of this study is that it expands previous knowledge on the reproductive life history traits of H. miamia and lays the foundation for future studies on how these traits are affected by various factors, as well as for comparative studies within acoels. As highlighted above, many phenomena are addressed in a rigorous and/or quantitative way for the first time. This can be considered the start of a novel approach to reproductive studies in acoels, as the authors suggest in the conclusion. It can also be interpreted as a testament to how an established model system can benefit the study of an understudied animal group.

      The main weakness of the work is the lack of convincing explanations for the dynamics of self-fertilization, sperm storage, and movement of oocytes from the ovaries to the central cavity and subsequently to the pharynx. These questions are also raised by the authors themselves in the Discussion. Another weakness (or rather, a missing potential strength) is the limited focus on genes. Given the single-cell sequencing atlas and established methods for in situ hybridization and even transgenesis in H. miamia, this model provides a unique opportunity to investigate germline genes in acoels and their roles in development, regeneration, and degrowth. It should also be noted that Transmission Electron Microscopy would have enabled a more detailed comparison with other acoels, since ultrastructural studies of reproductive organs have been published for other species (cf., e.g., [2], [3], [4]). This is especially true for a better understanding of the relation between sperm axoneme and flagellum (mentioned in the Results section), as well as of sexual conflict (mentioned in the Discussion).

      (1) Shannon, Thomas. 2007. 'Photosmoregulation: Evidence of Host Behavioral Photoregulation of an Algal Endosymbiont by the Acoel Convolutriloba Retrogemma as a Means of Non-Metabolic Osmoregulation'. Athens, Georgia: University of Georgia [Dissertation].

      (2) Zabotin, Ya. I., and A. I. Golubev. 2014. 'Ultrastructure of Oocytes and Female Copulatory Organs of Acoela'. Biology Bulletin 41 (9): 722-35.

      (3) Achatz, Johannes Georg, Matthew Hooge, Andreas Wallberg, Ulf Jondelius, and Seth Tyler. 2010. 'Systematic Revision of Acoels with 9+0 Sperm Ultrastructure (Convolutida) and the Influence of Sexual Conflict on Morphology'.

      (4) Petrov, Anatoly, Matthew Hooge, and Seth Tyler. 2006. 'Comparative Morphology of the Bursal Nozzles in Acoels (Acoela, Acoelomorpha)'. Journal of Morphology 267 (5): 634-48.

    3. Reviewer #2 (Public review):

      Summary:

      While the phylogenetic position of Acoela (and Xenacoelomorpha) remains debated, investigations of various representative species are critical to understanding their overall biology.

      Hofstenia is an acoel genus that can be maintained in laboratory conditions and for which several critical techniques are available. The current manuscript provides a comprehensive and largely descriptive investigation of the reproductive system of Hofstenia miamia.

      Strengths:

      (1) Xenacoelomorpha is a wide group of animals comprising three major clades and several hundred species, yet they are widely understudied. A comprehensive, state-of-the-art analysis of the reproductive system of Hofstenia as a representative is thus highly relevant.

      (2) The investigations are overall very thorough, well documented, and nicely visualised in an array of figures. I particularly enjoyed seeing data displayed in a visually appealing quantitative or semi-quantitative fashion.

      (3) The data provided is diverse and rich. For instance, the behavioral investigations open up new avenues for further in-depth projects.

      Weaknesses:

      While the analyses are extensive, they appear somewhat one-dimensional. For instance, the two markers used were characterized in a recent scRNA-seq dataset from the Srivastava lab. One might have expected slightly deeper molecular analyses. Along the same lines, the modes of spermatogenesis and oogenesis in particular have not been further analysed, nor has the proposed mode of sperm storage.

    4. Author response:

      We thank the reviewers for their evaluation, for helpful suggestions to improve clarity and accuracy, and for their positive reception of the manuscript. We will incorporate their suggestions in a revised manuscript. Here, we respond to their major comments. 

      The reviewers suggest that a molecular study of Hofstenia’s reproductive systems would be beneficial, as would mechanistic explanations for its unusual reproductive behavior. We agree with the reviewers that both of these would be interesting avenues, although we think this is outside the scope of this current manuscript. This manuscript studies growth and reproductive dynamics in acoels, and establishes a foundation to study its underlying molecular, developmental, and physiological machinery. 

      Our previous molecular work, using scRNA-seq and FISH, identified several germline markers. Here, we show that two of them are specific markers of testes and ovaries, respectively. This, together with our new anatomical data, allows us to identify the expression domains of most of the other markers more clearly. Some markers may be expressed in a presumptive common germline that eventually splits into an anterior male germline and a posterior female germline. We agree with the reviewers that understanding the dynamics of germline differentiation and its molecular genetic underpinnings would be very interesting, and we hope to address this in future work.

      As the reviewers note, we do not understand how sperm is stored, how the worm’s own sperm can travel to its ovaries to enable selfing, or how eggs in the ovaries travel within the body. We agree with the reviewers that understanding these processes would be very interesting. Our histological and molecular work so far has been unable to find tube-like structures or other cavities for storage and transport. Potentially, cells could move within the parenchyma. Explaining these events will require substantial effort (including mechanistic studies of cell behavior and ultrastructural studies that the reviewers suggest), and we hope to do this in future work. 

      We agree with Reviewer 1 that it is interesting that Piwi-1 expression is only observed in the ovaries and not in the testes, which is unusual given its broad germline expression in many taxa. Although there are several possible explanations for this finding (e.g., Piwi-1 could be expressed at low levels in the male germline, other Piwi proteins could be expressed in the male germline, or Piwi could play roles in male germline progenitors that are not co-located with maturing sperm), we do not currently know why this is so, and we will discuss these possibilities in our revised manuscript.

    1. eLife Assessment

      The study presents valuable findings on the role of Aff3ir, a gene implicated in flow-induced atherosclerosis that regulates the inflammation-associated transcription factor IRF5. The in vivo data are solid in providing evidence on the role of Aff3ir in shear stress and the formation of atheromatous plaques. The work will be of interest to clinical researchers and biologists focusing on inflammation and atherosclerosis in cardiovascular disease, as well as to the broad eLife readership.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report the role of a novel gene, Aff3ir-ORF2, in flow-induced atherosclerosis. They show that the gene is anti-inflammatory in nature. It inhibits IRF5-mediated athero-progression by inhibiting the causal factor (IRF5). Furthermore, the authors show a significant connection between shear stress and Aff3ir-ORF2, and its connection to IRF5-mediated athero-progression, in different established mouse models, which further validates the ex vivo findings.

      Strengths:

      (1) An adequate number of replicates was used for this study.

      (2) Both in vitro and in vivo validation was done.

      (3) Figures are well presented.

      (4) In vivo causality is checked with cleverly designed experiments.

      Weaknesses:

      (1) Inflammatory proteins must be measured with standard methods, e.g., ELISA, as mRNA and protein levels do not always correlate.

      (2) RNA-seq analysis has to be done very carefully. How does the Euclidean distance correlate with the differential expression of genes? Do they represent neighborhoods? If they do, how does this correlation affect the conclusion of the paper?

      (3) The volcano plot does not indicate the q value of the shown genes. It is advisable to calculate the q value for each gene, which represents the FDR of the identified genes.

      (4) Was GO enrichment done against the global gene set or a local gene set? The authors should provide more detailed information about the analysis.

      (5) If the analysis was performed against the global gene set, how does that connect with this specific atherosclerotic microenvironment?

      (6) What was the basal expression of genes, and how do the DGE (differential gene expression) values differ?

      (7) How was IRF5 picked from the GO analysis? Was it within the 20 most significant genes?

      (8) Microscopic studies should be done more carefully. There seems to be global expression of Aff3ir-ORF2 across the vascular wall, and the expression seems similar to that of AFF3 in Fig. 1.

      Comments on Revision:

      The authors have adequately addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors recently uncovered a novel nested gene, Aff3ir, and this work sets out to study its function in endothelial cells further. Based on differences in expression correlating with areas of altered shear stress, they investigate a role for the isoform Aff3ir-ORF2 in endothelial activation and development of atherosclerosis downstream of disturbed shear stress. Using a knockout mouse model and in vivo overexpression experiments, they demonstrate a strong potential for Aff3ir-ORF2 to alleviate atherosclerosis. They find that Aff3ir-ORF2 interacts with the pro-inflammatory transcription factor IRF5 and retains it in the cytoplasm, hence preventing upregulation of inflammation-associated genes. The data expand our knowledge of IRF5 regulation, which could be relevant to researchers studying various inflammatory diseases, and add to our understanding of atherosclerosis development.

      Strengths:

      The in vivo data are convincing, using immunofluorescence staining to assess AFF3ir-ORF2 expression, a knockout mouse model, overexpression and knockdown studies, and rescue experiments in combination with two atherosclerotic models to demonstrate that Aff3ir-ORF2 can lessen atherosclerotic plaque formation in ApoE-/- mice.

      Weaknesses:

      The effect on atherosclerosis is clear and there is sufficient evidence to conclude that this is the result of reduced endothelial cell activation. However, other cell types such as smooth muscle cells or macrophages could be contributing to the effects observed. The mouse model is a global knockout and the shRNA knockdowns (Fig. 5) and overexpression data in Figure 2 are not cell type-specific. Only the overexpression construct in Figure 6 uses an ICAM-2 promoter construct, which drives expression in endothelial cells, though leaky expression of this promoter has been reported in the literature.

      The in vitro experiments are solidly executed, but most experiments are performed in mouse embryonic fibroblasts (MEFs) and results extrapolated to endothelial cell responses. However, several key experiments are repeated in HUVEC, thereby making a solid case that Aff3ir-ORF2 can regulate IRF5 in both MEFs and HUVEC. It is important to note that the sequence of AFF3ir-ORF2 is not conserved in humans and lacks an initiation codon, hence the regulatory pathway is not conserved. However, the overexpression studies in HUVEC suggest that mouse AFF3ir-ORF2 can also regulate human IRF5 and hence the mechanism retains relevance for possible human health interventions.

      Overall, the paper succeeds in demonstrating a link between Aff3ir-ORF2 and atherosclerosis. The study shows a functional interaction between Aff3ir-ORF2 and IRF5 in embryonic fibroblasts, but makes a solid case that this mechanism is relevant for atherosclerosis development via endothelial cell activation.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the role of a novel gene, Aff3ir-ORF2, in flow-induced atherosclerosis. They show that the gene is anti-inflammatory in nature. It inhibits IRF5-mediated athero-progression by inhibiting the causal factor (IRF5). Furthermore, the authors show a significant connection between shear stress and Aff3ir-ORF2, and its connection to IRF5-mediated athero-progression, in different established mouse models, which further validates the ex vivo findings.

      Strengths:

      (1) An adequate number of replicates were used for this study.

      (2) Both in vitro and in vivo validation was done.

      (3) The figures are well presented.

      (4) In vivo causality is checked with cleverly designed experiments.

      We thank you for your positive remarks.

      Weaknesses:

      (1) Inflammatory proteins must be measured with standard methods, e.g., ELISA, as mRNA and protein levels do not always correlate.

      Thanks. We have followed your advice and performed ELISA experiments to measure the concentrations of inflammatory cytokines, including IL-6 and IL-1β. The newly acquired results have been included in Figure 2E (Lines 160–163) of the revised manuscript.

      (2) RNA-seq analysis has to be done very carefully. How does the Euclidean distance correlate with the differential expression of genes? Do they represent the neighborhood? If they do, how does this correlation affect the conclusion of the paper?

      We thank the reviewer for this thoughtful comment and apologize for the confusion. The Euclidean-distance heatmap was generated from the expression levels of all differentially expressed genes (identified with DESeq2). Since its interpretation overlaps with the volcano plot presented in Figure 4B, we have moved the heatmap to Figure S5A in the revised manuscript and provided a detailed description in the figure legend (Lines 106–108 in the Supporting Information). Additionally, to better illustrate the variation among all samples, we have performed principal component analysis (PCA) and included the new results in Figure 4A of the revised manuscript.
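      A sample-level PCA of the kind mentioned above can be sketched with a plain singular value decomposition. This is an illustration under assumptions, not the authors' code: it assumes a samples × genes matrix of log-scale (for example, variance-stabilized) expression values, and the function name `pca_scores` and the toy data are hypothetical. DESeq2 offers its own PCA plotting helper; this sketch does not reproduce it.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """PCA of samples via SVD.

    expr: samples x genes matrix (assumed log-scale expression).
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)  # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]     # sample coordinates on the PCs
    ratio = s**2 / np.sum(s**2)                          # fraction of variance per PC
    return scores, ratio[:n_components]


# Toy data: two groups of three samples that differ in the first 10 of 100 genes.
rng = np.random.default_rng(1)
expr = rng.normal(scale=0.1, size=(6, 100))
expr[3:, :10] += 3.0
scores, ratio = pca_scores(expr)
```

      Unlike a distance heatmap restricted to differentially expressed genes, a PCA on all genes summarizes how far samples are from one another overall, which is why the two displays answer different questions.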

      (3) The volcano plot does not indicate the q value of the shown genes. It is advisable to calculate the q value for each of the genes which represents the FDR probability of the identified genes.

      Thank you for your careful review. We apologize for the incorrect labeling: the values shown were adjusted P values (P.adj). The label for Figure 4B has been corrected in the revised manuscript.
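      For context, DESeq2's `padj` column is by default a Benjamini-Hochberg adjusted P value, which is often loosely called a q value (Storey's q value is a distinct estimator). The minimal sketch below, with a hypothetical function name and toy P values, shows how BH adjustment works. It omits DESeq2's independent filtering, which can set some `padj` values to NA.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # enforces monotonicity and caps values at 1
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```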

      (4) Was GO enrichment done against the global gene set or a local gene set? The authors should provide more detailed information about the analysis.

      Thank you. We performed GO enrichment analysis against the global gene set. The description of the results has been updated in the revised manuscript (Lines 222–224).
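      The choice of background matters because a common way to score a GO term is a one-sided hypergeometric test, in which the background size and the number of background genes annotated to the term set the expected overlap. Which enrichment tool the authors used is not stated here; the sketch below, with a hypothetical function name and toy numbers, only illustrates that the same overlap looks more significant against a larger (global) background than against a smaller (local) one.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for the overlap X between a DEG list and a GO term.

    k: DEGs annotated to the term, n: DEGs in total,
    K: background genes annotated to the term, N: background size.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Same overlap (10 of 100 DEGs in a 200-gene term) against two backgrounds.
p_global = hypergeom_upper_tail(10, 100, 200, 20000)
p_local = hypergeom_upper_tail(10, 100, 200, 2000)
```

      With 20,000 background genes, about one overlapping gene would be expected by chance, so 10 is striking; with 2,000, about 10 are expected, so the same overlap is unremarkable.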

      (5) If the analysis was performed against a global gene set, how does that connect with this specific atherosclerotic microenvironment?

      Thank you for your insightful comments. We have followed your advice and investigated the functional characteristics of these differentially expressed genes in the context of the atherosclerotic microenvironment. The RNA-seq differential gene list was further mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), resulting in 363 overlapping genes. These 363 genes were subjected to enrichment analysis using the Gene Ontology (GO) database. GO analysis revealed enrichment in processes related to cell-cell adhesion and leukocyte activation involved in immune response (Figure S5B), which is highly consistent with the observed effects of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B, and the description of the results is included in the revised manuscript (Lines 227–233).

      (6) What was the basal expression of genes and how did the DGE (differential gene expression) values differ?

      Thanks for the comments. The RNA-sequencing data have been submitted to the GEO database (GSE286206), making the basal gene expression data available to readers.

      The differential expression analysis was performed using DESeq2 (v1.4.5) (PMID: 25516281) with criteria of a 1.5-fold change and P<0.05. We have included this description in the revised manuscript (Lines 220–222 and 575–576).
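      The stated cut-offs can be expressed as a simple filter on a results table. This is a minimal sketch, not the authors' script: it assumes a symmetric threshold on the absolute log2 fold change, uses raw P values as written (whether adjusted values were used is not stated), and the function name, input format, and gene labels are hypothetical.

```python
import math


def select_degs(results, fold_change=1.5, p_cutoff=0.05):
    """Return genes passing |log2FC| >= log2(fold_change) and P < p_cutoff.

    results: mapping gene -> (log2 fold change, P value), e.g. values taken
    from a DESeq2 results table (hypothetical input format).
    """
    lfc_cutoff = math.log2(fold_change)  # a 1.5-fold change is about 0.585 on the log2 scale
    return sorted(
        gene
        for gene, (lfc, p) in results.items()
        if abs(lfc) >= lfc_cutoff and p < p_cutoff
    )


toy = {"geneA": (1.2, 0.001), "geneB": (0.5, 0.01), "geneC": (-0.7, 0.03), "geneD": (0.01, 0.9)}
print(select_degs(toy))  # ['geneA', 'geneC']
```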

      (7) How was IRF5 picked from the GO analysis? Was it within the 20 most significant genes?

      Sorry for the confusion. IRF5 was not identified through GO analysis. To determine the upstream transcriptional regulators, we used the ChEA3 database to predict potential upstream transcription factors based on all differentially expressed genes. The top 20 transcription factors were selected based on their scores. To further explore their relationship with atherosclerosis, these top 20 transcription factors were mapped to the atherosclerosis-related gene list in the DisGeNET database. IRF5 and IRF8 were the only two overlapping genes. To clarify this process, we have included a more detailed description of the IRF prediction approach in the revised manuscript (Lines 234–239).
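      The shortlisting logic described above (take the top 20 predicted regulators, then keep those that also appear in a disease gene list) amounts to a ranked intersection. The sketch below is illustrative only: it assumes that lower scores rank higher, as with a mean-rank score (flip the sort for a score where higher is better), and the function name, transcription factor labels, and gene list are hypothetical placeholders rather than ChEA3 or DisGeNET output.

```python
def shortlist_tfs(tf_scores, disease_genes, top_n=20):
    """Keep top-ranked TFs that also appear in a disease gene list.

    tf_scores: mapping TF -> score, assumed lower = better (as with a mean rank).
    disease_genes: iterable of gene symbols (e.g. exported from a disease database).
    """
    ranked = sorted(tf_scores, key=tf_scores.get)[:top_n]  # best-ranked TFs first
    disease = set(disease_genes)
    return [tf for tf in ranked if tf in disease]  # preserves rank order


scores = {"TF1": 1.0, "TF2": 2.5, "TF3": 3.0, "TF4": 4.0}
print(shortlist_tfs(scores, {"TF1", "TF2", "GENE9"}, top_n=3))  # ['TF1', 'TF2']
```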

      (8) Microscopic studies should be done more carefully. There seems to be global expression of Aff3ir-ORF2 across the vascular wall, and the expression seems similar to that of AFF3 in Figure 1.

      We thank the reviewer for the valuable suggestion. We have followed your advice and provided more representative images in Figure 1F.

      Reviewer #2 (Public review):

      Summary:

      The authors recently uncovered a novel nested gene, Aff3ir, and this work sets out to study its function in endothelial cells further. Based on differences in expression correlating with areas of altered shear stress, they investigate a role for the isoform Aff3ir-ORF2 in endothelial activation and development of atherosclerosis downstream of disturbed shear stress. Using a knockout mouse model and in vivo overexpression experiments, they demonstrate a strong potential for Aff3ir-ORF2 to alleviate atherosclerosis. They find that Aff3ir-ORF2 interacts with the pro-inflammatory transcription factor IRF5 and retains it in the cytoplasm, hence preventing upregulation of inflammation-associated genes. The data expands our knowledge of IRF5 regulation which could be relevant to researchers studying various inflammatory diseases as well as adding to our understanding of atherosclerosis development.

      Strengths:

      The in vivo data are solid, using immunofluorescence staining to assess AFF3ir-ORF2 expression, a knockout mouse model, overexpression and knockdown studies, and rescue experiments in combination with two atherosclerotic models to demonstrate that Aff3ir-ORF2 can lessen atherosclerotic plaque formation in ApoE-/- mice.

      We thank you for your positive remarks.

      Weaknesses:

      While the in vivo data is generally convincing, a few data panels have issues and will need addressing. Also, the knockout mouse model will need to be described, since the paper referred to in the manuscript does not actually report any knockout mouse model. Hence it is unclear how Aff3ir-ORF2 is targeted, but Figure S2B shows that targeting is partial, since about 30% expression remains at the RNA level in MEFs isolated from the knockout mice.

      We thank you for the valuable comments. 

      First, we have followed your advice and included detailed information on the construction of the knockout mouse model in the revised manuscript (Lines 405–415). Additionally, the genotyping results have been included in new Figure S3A.

      Second, we acknowledge your concern about the knockout efficiency of ORF2 in mice. While the PCR assay indicated approximately 30% residual expression, our Western blot analysis of aorta samples demonstrated that ORF2 protein was barely detectable in knockout mice, as shown in new Figure S3B-C. In addition, our experiments using MEFs from WT and AFF3ir-ORF2-/- mice (Figure 4I) further confirmed successful knockout.

      Third, we have included a discussion addressing the discrepancy between the PCR and Western blot results. In addition to technical differences between the two methods, the nature of AFF3ir-ORF2 may also contribute to this inconsistency. The parent gene AFF3 is located in a genetically variable region and can be excised via intron 5 to form a replicable transposon, which translocates to other chromosomes and has been linked to leukemia (PMID: 34995897, 12203795, 12743608, and 17968322). AFF3ir is located in intron 6 and is thus contained in the transposon, which may complicate the measurement of its expression. Replicable transposons can exist as extrachromosomal elements, allowing them to be inherited across generations. We have included this discussion in the revised manuscript (Lines 188–196).

      While the effect on atherosclerosis is clear, the conclusion that this is the result of reduced endothelial cell activation is not supported by the data. The mouse model is described as a global knockout and the shRNA knockdowns (Figure 5) and overexpression data in Figure 2 are not cell type-specific. Only the overexpression construct in Figure 6 uses an ICAM-2 promoter construct, which drives expression in endothelial cells, though leaky expression of this promoter has been reported in the literature. Therefore, other cell types such as smooth muscle cells or macrophages could be responsible for the effects observed.

      Thank you for your critical comment. To address your concern, we have made the following three revisions:

      First, we have analyzed the expression of AFF3ir-ORF2 in the vascular wall with or without intima in WT and AFF3ir-ORF2 knockout mice. As shown in Figure 1B and Figure S1A, while the expression of AFF3ir-ORF2 was notably downregulated in the aortic intima of athero-prone regions compared to the protective region, it remained largely unchanged in the aortic wall without intima across different regions of the aorta. This suggested that AFF3ir-ORF2 might play a predominant role in endothelial cells rather than other cell types in the context of shear stress.

      Second, we have used human endothelial cells (HUVECs) to further confirm our findings. As shown in Figure 2C and Figure S2B, we found that AFF3ir-ORF2 overexpression could attenuate disturbed shear stress-induced IRF5 nuclear translocation and the expression of inflammatory genes in HUVECs, suggesting the potential anti-inflammatory effects of AFF3ir-ORF2 in endothelial cells.

      Third, we agree with the reviewer that we cannot completely exclude the potential involvement of other cell types. Hence, we have included a limitation statement in the Discussion (Lines 341–344).

      The weakest part of the manuscript is the in vitro experiment using some nonidentifiable expression differences. The data is used to hypothesise on a role for IRF5 in the effects observed with Aff3ir-ORF2 knockout.

      Thank you for the comments. To address your concerns, we have made the following two changes:

      First, we have further investigated the functional features of the differentially expressed genes from the RNA-seq in the context of the atherosclerotic microenvironment. The differential gene list was mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), and a total of 363 genes overlapped. These 363 genes were subjected to enrichment analysis using the Gene Ontology (GO) database. GO analysis showed that these genes were mainly enriched in cell-cell adhesion and leukocyte activation involved in immune response, which aligns with the effect of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B, and the description of the results has been updated in the revised manuscript (Lines 227–233).

      Second, we have further verified the RNA-seq results in vitro. We analyzed several classical inflammatory factors, including ICAM-1, CCL5, and CXCL10, whose mRNA levels were significantly downregulated in the RNA-seq data and which were also identified as target genes of IRF5. We found that AFF3ir-ORF2 deficiency aggravated, while AFF3ir-ORF2 overexpression attenuated, the expression of ICAM-1, CCL5, and CXCL10 induced by disturbed shear stress (new Figure S5D). In addition, the regulation of ICAM-1 by AFF3ir-ORF2 was confirmed at both the protein and mRNA levels in HUVECs (Figure 2C-D and Figure S2B).

      Overall, the paper succeeds in demonstrating a link between Aff3ir-ORF2 and atherosclerosis, but the cell types involved and mechanisms remain unclear. The study also shows a functional interaction between Aff3ir-ORF2 and IRF5 in embryonic fibroblasts, but any relevance of this mechanism for atherosclerosis or any cell types involved in the development of this disease remains largely speculative.

      Thank you for all the valuable comments. The specific responses have been provided above. Briefly, we have followed your advice and further confirmed the regulation of AFF3ir-ORF2 on IRF5 in endothelial cells. Besides, the RNA-seq results have been further analyzed, and partial results have been verified in endothelial cells to support the anti-inflammatory role of AFF3ir-ORF2. We greatly appreciate the reviewer’s insightful comments, which guided our revisions and contributed to significantly improving the paper.

      Reviewer #3 (Public review):

      This study aims to demonstrate the role of Aff3ir-ORF2 in atheroprone flow-induced EC dysfunction and the ensuing atherosclerosis in mouse models. Overall, the data quality and comprehensiveness are convincing. In silico, in vitro, and in vivo experiments and several atherosclerosis models were well executed. To strengthen the study further, the authors could address its relevance to human ECs.

      We thank you for your positive remarks and insightful comments.

      Major comments:

      (1) The tissue source in Figures 1A and 1B should be clarified, the whole aortic segments or intima? If aortic segment was used, the authors should repeat the experiments using intima, due to the focus of the current study on the endothelium.

      We thank you for the suggestion. The tissue used in Figures 1A and 1B was from aortic intima. The description has been updated for clarity in the revised manuscript on Lines 114-125. 

      (2) Why were MEFs used exclusively in the in vitro experiments? Can the authors repeat some of the critical experiments in mouse or human ECs?

      Thank you for this insightful comment. Isolating and culturing primary mouse aortic ECs is notoriously difficult, and shear stress experiments require a large number of cells. Because MEFs have been shown to exhibit responses consistent with those of ECs (PMID: 23754392), we used MEFs in our in vitro experiments.

      However, following your valuable advice, we have now used human ECs (HUVECs) to confirm our findings. Consistent with our results in MEFs, we found that AFF3ir-ORF2 overexpression reduced the expression of inflammatory genes induced by disturbed shear stress at both the protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). Notably, despite the significant anti-inflammatory effects of AFF3ir-ORF2, the sequence of this gene is not conserved in Homo sapiens and lacks an initiation codon, which is why we did not proceed further with loss-of-function experiments.

      (3) The authors should explain why AFF3ir-ORF2 overexpression did not affect the basal level expression of ICAM-1, VCAM-1, IL-1b, and IL-6 under ST conditions (Figure 2A-C).

      We thank you for raising this critical question. Indeed, we found that AFF3ir-ORF2 overexpression did not affect the basal levels of inflammatory genes under ST conditions, whereas it exerted anti-inflammatory effects under OSS conditions. One possible reason is the relatively low expression of inflammatory genes under ST compared with OSS conditions. Additionally, our findings suggest that AFF3ir-ORF2 exerts its anti-inflammatory role by binding to IRF5 and inhibiting its nuclear translocation. However, as shown in Figure 4I, IRF5 might be predominantly localized in the cytoplasm rather than the nucleus under ST conditions.

      We have included the description in the revised manuscript on Lines 157-163.

      (4) Please include data from sham controls, i.e., the right carotid artery, in Figure 2E.

      Thank you for the suggestion. We have followed your advice and included sham controls (staining of the right carotid arteries) in Figure S2E.

      (5) Given that the merit of the study lies in the effects of different flow patterns, the lesion areas in the AA and TA (Figure 3B, 3C) should be compared separately.

      We have followed your valuable suggestion and included the additional statistical results in Figure 3C in the revised manuscript.

      (6) For confirmatory purposes for the variations of IRF5 and IRF8, can the authors mine available RNA-seq or even scRNA-seq data on human or mouse atherosclerosis? This approach is important and could complement the current results that are lacking EC data.

      Thank you for your valuable suggestion. In the present study, we found that disturbed flow did not alter the protein level of IRF5 but promoted its nuclear translocation. Following your advice, we analyzed the expression of IRF5 in human ECs (GSE276195) and atherosclerotic mouse arteries (GSE222583) using public databases. Consistently, IRF5 did not show significant changes in mRNA levels under these conditions (Figure S5E-F), suggesting that the regulation of IRF5 in the context of disturbed flow or atherosclerosis is primarily post-translational.

      (7) Given the efficacy of AAV-ICAM2-AFF3ir-ORF2 in reducing atherosclerosis (Figure 6), the authors are encouraged to use lung ECs isolated from AFF3ir-ORF2<sup>-/-</sup> mice to recapitulate its regulation of IRF5.

      We greatly appreciate your valuable suggestion to use lung ECs from mice. We observed that AFF3ir-ORF2 deficiency enhanced the OSS-induced nuclear translocation of IRF5. Notably, the transcript levels of IRF5 were minimally affected by AFF3ir-ORF2 deficiency. Hence, recapitulating the regulation of IRF5 in lung ECs isolated from AFF3ir-ORF2<sup>-/-</sup> mice would require treating the cells with OSS followed by subcellular fractionation. However, both in vitro shear stress treatment and subcellular fractionation require a large number of cells, and mouse lung ECs are difficult to culture and expand over several passages. We therefore did not perform these experiments and hope the reviewer will understand. As an alternative, we confirmed the changes in IRF5 transcriptional activity caused by AFF3ir-ORF2 manipulation by analyzing the expression of its target genes, identified from the RNA-seq results, in both the intima of the mouse aorta (Figure S5C-D) and HUVECs (Figure 2C-D and Figure S2B). Our findings show that AFF3ir-ORF2 deficiency increases, while its overexpression decreases, the expression of IRF5 target genes in endothelial cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2H: As I understand it, this is an MFI measurement of VCAM. Please change the label accordingly.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      My major concern is the use of MEFs for all in vitro experiments. All experiments should be done in endothelial cells if the aim is to show a mechanism relevant to endothelial activation and atherosclerosis. Lines 314-316 of the conclusion are absolutely not supported by the data.

      Thank you for the insightful comment. Following your advice, we have employed human ECs (HUVECs) to confirm our findings. Consistent with the findings in MEFs, we found that AFF3ir-ORF2 decreased the expression of inflammatory genes induced by disturbed shear stress, both at protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). 

      Since the in vivo experiments are not cell type-specific, it would be important to test and compare the expression of Aff3ir-ORF2 in endothelial cells as well as smooth muscle and macrophages to support any claim of cell type involvement in the effects observed.

      We thank you for the valuable suggestion. In the revised manuscript, we have followed your suggestion and analyzed the expression pattern of AFF3ir-ORF2 in different regions of the aorta with or without endothelium. We observed a marked reduction in AFF3ir-ORF2 expression in the intima of the aortic arch compared with that in the intima of the thoracic aorta (Figure 1B-C). In contrast, the expression of AFF3ir-ORF2 in the media and adventitia was comparable between the aortic arch and thoracic aorta (Figure S1A-B). These findings provide further evidence supporting the predominant role of endothelial cells. The description has been modified accordingly in the revised manuscript on Lines 121-134.

      The results of the RNA-seq experiment should be disclosed. The experiment should be deposited on GEO or similar and a table of differentially expressed genes added to the manuscript.

      Thank you for the suggestion. Following your advice, we have deposited the RNA-seq data in GEO (GSE286206). In addition, a table of differentially expressed genes has been included in the revised manuscript as Table S3.

      Minor comments:

      (1) Figure 1A. Missing the labels of the target.

      Thanks. Corrected. 

      (2) Figure 1D. The cell alignment in the AA compared with the TA suggests that the image is of the outer curvature, but Figure 1F shows that the outer curvature expresses more ORF2 than the inner curvature. Why was the outer curvature chosen for this panel, and is it correct to conclude from this that ORF2 expression ranks TA > outer > inner curvature?

      We thank you for the insightful suggestion. Following your advice, we performed en-face immunofluorescence staining and quantification of AFF3ir-ORF2 expression in the AA inner curvature, AA outer curvature, and TA regions. As shown in new Figure 1D-E, the results indeed indicate that AFF3ir-ORF2 expression ranks TA > AA outer > AA inner.

      (3) Figure 2H. Target mislabelled as ICAM-1 instead of VCAM-1.

      Thanks. Corrected. 

      (4) Figure S1A. VE-cad staining and cell shape differ between control and overexpression. Is this a phenotype or are different areas of the vasculature shown, which would make it hard to interpret since Aff3ir-ORF2 levels differ in different vessel areas?

      We thank the reviewer for raising this important question. For Figure S1A, only common carotid arteries were used for the staining. The potential differences in cell shape observed might be due to variations in the procedure during immunofluorescence staining. To avoid any misinterpretation, more representative images have been provided in the revised Figure S2C.

      (5) Figure 3D-G. Images are not representative of the quantification results.

      Thank you. The images in Figure 3D and Figure 3F have been replaced with more representative ones in the revised manuscript.

      (6) Line 220. Data for IRF8 are not shown in the figure to support this claim.

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C.

      (7) Figure 6F. AAV-AFF3ir-ORF2 panel order inverted.

      Thanks. Corrected. 

      (8) Line 401. Typo: "hat" instead of "h at".

      Sorry for the typo. Corrected.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) The rationale for the following sentence (lines 126-128) is lacking: "Moreover, we observed the expression of AFF3ir-ORF2 in longitudinal sections of the mouse aorta (B. Li et al., 2019)".

      Thanks. The rationale for these experiments has been included in the revised manuscript on Lines 127-129.

      (2) The source of antibodies against AFF3ir-ORF1 and AFF3ir-ORF2 used in western blot and immunostaining experiments were not mentioned in the manuscript.

      Thanks. The antibody information has been included in the Methods section on Lines 456-457 and 510-511.

      (3) The rationale and data interpretation are not clear for the following sentence (lines 220-221): "In addition, neither IRF5 nor IRF8 expression was regulated by AFF3ir-ORF2 (Figure 4F)".

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C. The sentence has been modified accordingly on Lines 253-254.

      (4) The quality of AFF3ir-ORF2 blot in Figure 4I needs improvement.

      Thanks. More representative images have been included in Figure 4I.

      (5) It appears that AFF3ir-ORF2 was present in both the cytoplasm and the nucleus. Does AFF3ir-ORF2 have a nuclear entry peptide? Also, the nuclear entry of AFF3ir-ORF2 could be confirmed by an immunofluorescence staining experiment.

      Thank you for your insightful comments. Indeed, although we did not observe any significant changes in the subcellular localization of AFF3ir-ORF2 under shear stress conditions, our immunostaining results revealed that AFF3ir-ORF2 is localized in both the cytoplasm and the nucleus. To explore whether AFF3ir-ORF2 contains a nuclear localization signal, we analyzed its sequence with the NLStradamus tool (http://www.moseslab.csb.utoronto.ca/NLStradamus/). The prediction indicated that AFF3ir-ORF2 lacks a nuclear localization signal.

    1. eLife Assessment

      This useful study presents a hierarchical computational model that integrates locomotion, navigation, and learning in Drosophila larvae. The evidence supporting the model is solid, as it qualitatively replicates empirical behavioral data, but the experimental data is incomplete. While some simplifications in neuromechanical representation and sensory-motor integration are limiting factors, the study could be of use to researchers interested in computational modeling of biological movement and adaptive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a three-layered hierarchical model for simulating Drosophila larva locomotion, navigation, and learning. The model consists of a basic locomotory layer that generates crawling and turning using a coupled oscillator framework, incorporating intermittency in movement through alternating runs and pauses. The intermediate layer enables navigation by allowing larvae to actively sense and respond to odor gradients, facilitating chemotaxis. The adaptive learning layer integrates a spiking neural network model of the Mushroom Body, simulating associative learning where larvae modify their behavior based on past experiences. The model is validated through simulations of free exploration, chemotaxis, and odor preference learning, demonstrating close agreement with empirical behavioral data. This modular framework provides a valuable advance for modeling larva behavior.

      Strengths:

      Every modeling paper requires certain assumptions and abstractions. The main strength of this paper lies in its modular and hierarchical approach to modeling behavior, making connections to influential theories of motor control in the brain. The authors also provide a convincing discussion of the experimental evidence supporting their layered behavioral architecture. This abstraction is valuable, offering researchers a useful conceptual framework and marking a significant step forward in the field. Connections to empirical larval movement are another major strength.

      Weaknesses:

      While the model represents a conceptual advance in the field, some of its assumptions and choices fall behind state-of-the-art approaches. One limitation is the paper's simplified representation of larval neuromechanics, in which the body is reduced to a two-segment structure with basic neural control. Another limitation is the absence of an explicit neuromuscular control system, which would better capture the role of segmental central pattern generators (CPGs) and neuronal circuits in regulating peristalsis and turning in Drosophila larvae. Many detailed neuromechanical models, as cited by the authors, have already been published. These abstractions overlook valuable experimental studies that detail segmental dynamics during crawling and the larval connectome.

      The strength of the model could also be its weakness. The model follows a subsumption architecture, where low-level behaviors operate autonomously while higher layers modulate them. However, this approach may underestimate the complexity of real neural circuits, which likely exhibit more intricate feedback mechanisms between sensory input and motor execution.

    3. Reviewer #2 (Public review):

      Summary:

      Sakagiannis et al. propose a hierarchically layered architecture for larval locomotion and foraging. They progress from exploration to chemotaxis and to an odour preference test after associative learning.

      Strengths:

      A new locomotion model based on two oscillators that also incorporates peristaltic strides.

      Weaknesses:

      • The model is not always clearly or sufficiently explained (chemotaxis and odour test).

      • Data analysis of the model movement is not very thorough.

      • Comparisons with the locomotion of behaving animals are missing for chemotaxis and for the odour preference test after associative learning.

      • Overall it is hard to judge the descriptive and predictive value of the model.

    4. Reviewer #3 (Public review):

      Summary:

      This paper presents a framework for a multilevel agent-based model of the Drosophila larva, using a simplified larval body and locomotor equations coupled to oscillators and sensory input. The model itself is built upon significant existing literature, particularly Wystrach, Lagogiannis, and Webb 2016 and Jürgensen et al. 2024. The aim is to generate an easily configurable, well-documented platform for organism-scale behavioral simulation in specific experiments. The authors demonstrate qualitative similarity between in vivo behavioral experiments and calibrated models.

      Strengths:

      The goal is excellent - a system to rapidly run computational experiments that align naturally with behavioral experiments would be well-suited to develop intuitions and cut through hypotheses. The authors provide quantitative descriptions that show that the best-fit parameters in their models produce results that agree with several properties of larval locomotion.

      The description of model calibration in the appendix is clear and explains several aspects of the model better than the main text.

      In addition, the code is well-organized using contemporary Python tooling and the documentation is nicely in progress (although it remains incomplete). However, see notes for difficulties with installation.

      Weaknesses:

      (1) As presented here, the modeling itself is described unclearly and without a particular scientific question. The majority of the effort appears to be calibrating modest extensions of existing models and applying them to very simple experiments. This could be an effective first part of a paper on the software tool, but the paper needs to point to a scientific question or, if it is a tool paper, to a gap in the current state of modeling tools that must be filled to address scientific goals. While the manuscript has a good overview of larval behavioral papers, the discussion of modeling is more of an afterthought. However, the paper is a modeling paper and its contribution is to modeling; particularly given this work's minor adaptations of existing models, it is unclear what the principal contribution is intended to be.

      (2) While the models presented do qualitatively agree with experimental data in specific situations, there is no effort to challenge the model assumptions or compare them to alternative models. That the data are consistent with a small number of simple experiments does not mean that the models are correct. Moreover, given the highly empirical nature of the modeling, I wonder which results largely reflect the model putting out what was put in, particularly with regard to kinematic results like frequency and body length, or the effect of learning simply changing the sensory gain constant. At this level of empirical modeling, it is difficult to imagine how one would integrate the type of cell-type-specific perturbation or functional observation that is common in larval experiments.

      (3) The central framing of a "layered control architecture" does not have a significant impact on the work presented here, and the paper would do better with less emphasis on it. Given the limited empirical models, there are only so many parameters through which different components can influence one another, and as far as I can tell from the paper, only chemotaxis and the modulation of a chemotactic gain constant are incorporated so far. However, since these are empirical functions, they say little about how the layers are actually controlled by the nervous system; indeed, the larval nervous system appears to have many levels of local and long-range modulation of circuits at both the sensory and motor layers. It is not clear how this aspect would contribute beyond the well-appreciated concept of a relatively finite set of behavioral primitives in an insect brain, particularly for the fly larva. What would be a contradictory model, and how would the authors differentiate between that and the one they currently propose? If focusing only on olfactory learning and chemotaxis, how does the current framing add to existing understanding?

      (4) The paper uses experimental data to calibrate the models, however, the experiments are not described at all in the text.

    1. eLife Assessment

      Shihabeddin et al. utilized single-cell RNA-Seq analysis of adult P23H zebrafish to identify transcription factors (e2fs, Prdm1a, Sp1) expressed selectively in neural progenitors and immature rods, and validated their necessity for regeneration using morphant analysis. The finding is useful, and the evidence is convincing. Deeper mechanistic analysis could further strengthen the current work by (1) distinguishing developmental from regenerative transcription factors, (2) adding matched scATAC-Seq data, and (3) integrating single-cell multiome data from developing retina.

    2. Reviewer #1 (Public review):

      Summary:

      Shihabeddin et al. used bioinformatic and molecular biology tools to study the unique regeneration of rod photoreceptors in a zebrafish model. The authors identified a few transcription factors that seem to play an important role in this process.

      Strengths:

      This manuscript is well prepared. The topic of this study is an interesting and important one. Bioinformatics clues are interesting.

      Weaknesses:

      Considering the importance of the mechanism, the knockdown experiments require further validation. The authors over-emphasized this study's relevance to RP disease (i.e. patients and mammals are not capable of regeneration like zebrafish). They under-explained this regeneration's relevance or difference to normal developmental process, which is pretty much conserved in evolution.

    3. Reviewer #2 (Public review):

      This is interesting and important work from Shihabeddin et al. to identify master regulators of rod photoreceptor regeneration in a zebrafish model of Retinitis Pigmentosa. Building on their scRNA-seq data, Shihabeddin et al. dissected the progenitor cell types and performed trajectory analyses to predict transcription factors that apparently drive progenitor proliferation and differentiation into rod photoreceptors. Their analyses predicted e2f1, e2f2, and e2f3 as critical drivers of progenitor proliferation, Prdm1a as a driver of rod photoreceptor differentiation, and SP1 as a driver of rod photoreceptor maturation. Genetic experiments provide clear support for the roles of e2fs in progenitor proliferation. It is also apparent from Figure 8 that prdm1 knockdown appears to cause a decrease in rhodopsin expression. By colocalizing BrdU and Retp1, the authors inferred that the apparent "new rods" (which exhibit mixed BrdU and Retp1 signal) are decreased with prdm1 knockdown, providing further support. Overall, I found the work to be interesting, rigorous, and informative for the community.

      I have a few suggestions for the authors to consider:

      (1) Perhaps the authors can consider explaining why the Prdm1a knock-down cells would have a higher Retp1 signal per cell in Fig 9B. Is this a representative picture? This appears to contradict Figure 8's conclusion, although I could tell that the number of Retp1+ cells in the ONL appears to be lower.

      (2) The authors noted "Surprisingly, the knockdown of prdm1a resulted in a significantly higher number of rhodopsin-positive cells in the INL (p=0.0293)", while it appears in Figure 9B, 9C that the difference is 2 cells vs 0 in a rightly broader field. It seems to be too strong of a statement for this effect.

      (3) It appears to this reviewer that the proteomic data didn't reveal much in line with the overall hypothesis or the mechanism, and it's unclear why the authors went for proteomics rather than bulk RNA-seq or ChIP-seq for a transcription factor knock-down experiment. Overall this is a minor point.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses single-cell RNA-Seq to globally profile changes in gene expression in adult P23H transgenic zebrafish, which show progressive rod photoreceptor degeneration, and in age-matched controls. As expected, mitotically active retinal progenitors are identified in both conditions, and increased numbers of both progenitors and immature rods are observed. DrivAER-mediated gene regulatory network analysis in retinal progenitors, photoreceptor precursors, and mature rod photoreceptors identified e2f1-3, prdm1a, and sp1, respectively, as top predicted transcriptional regulators of gene expression specific to these cell types. Finally, morpholino-mediated knockdown of these transcription factors led to the expected defects in proliferation and rod differentiation.

      Strengths:

      Overall, this is a rigorous study that is convincingly executed and well-written. The data presented here will be a useful addition to existing single-cell RNA-Seq datasets obtained from regenerating zebrafish retina.

      Weaknesses:

      Multiple similar studies have been published, and this study is something of a missed opportunity in terms of identifying novel mechanisms of rod photoreceptor regeneration. Several other recent studies have used both single-cell RNA- and ATAC-Seq to analyze gene regulatory networks that regulate neurogenesis in zebrafish retina following acute photoreceptor damage (Hoang, et al. 2020; Celloto, et al. 2023; Lyu, et al. 2023; Veen, et al 2023) or in other genetic models of progressive photoreceptor dystrophy such as cep290 mutants (Fogerty, et al. 2022).

      The gene regulatory network analysis here would also benefit from the addition of matched scATAC-Seq data, which would allow the use of more powerful tools such as Scenic+ (Bravo and de Winter, et al. 2023). It would also benefit from integration with single-cell multiome data from developing retinas (Lyu, et al. 2023). The genes selected for functional analysis here are all robustly expressed either in retinal progenitor cells (e2f1-3 and aurka) or in developing rods (prdm1a), so it is not really surprising that defects are observed. Identification of factors that selectively regulate rod photoreceptor regeneration, rather than those that regulate both development and regeneration, would provide additional novelty. This would also potentially allow the use of animal mutants for candidate genes, rather than exclusive reliance on morphant analysis, which may have off-target effects.

      The description of the time points analyzed is vague, stating only that "fish from 6 to 12 months of age were analyzed". Since photoreceptor degeneration is progressive, it is unclear how progenitor behavior changes over time, or how the gene expression profile of other cell types such as microglia, cones, or surviving rods is altered by disease progression. Most similar studies address this by analyzing multiple time points from specific ages or times post-injury.

    5. Author response:

      Reviewer 1: “The authors over-emphasized this study's relevance to RP disease (i.e. patients and mammals are not capable of regeneration like zebrafish).”

      It is true that humans and other mammals are not capable of such regeneration. This is why we and many other groups study zebrafish to identify mechanisms of regeneration that successfully form new rods. That said, our previous paper on the molecular basis of retinal remodeling in this zebrafish model system (Santhanam et al., 2023; Cell Mol Life Sci. 2023;80(12):362) revealed remarkable similarities in the stress and physiological responses of rods, cones, RPE and inner retinal neurons to those in mammalian RP models. Thus, we believe this zebrafish is an adequate model of RP and an excellent model to study rod regeneration.

      Reviewer 1: “They under-explained this regeneration's relevance or difference to normal developmental process, which is pretty much conserved in evolution.”  and:

      Reviewer 3: “It would also benefit from integration with single-cell multiome data from developing retinas (Lyu, et al. 2023).”

      It is an excellent suggestion to compare the regenerative response we have studied in a chronic degeneration/regeneration model with the trajectory of developmental rod formation. Lyu et al. 2023 found that while retinal regeneration has similarities to retinal development, it does not precisely recapitulate the same transcription factors and processes. Any differences between this trajectory and that revealed in developmental studies would be enlightening. We intend to perform such analyses and add them to a revised manuscript in the future.

      Reviewer 2: “Perhaps the authors can consider explaining why the Prdm1a knock-down cells would have a higher Retp1 signal per cell in Fig 9B. Is this a representative picture? This appears to contradict Figure 8's conclusion, although I could tell that the number of Retp1+ cells in the ONL appears to be lower.”

      These are different experimental paradigms.  Figure 8 shows knockdown 48 hours after injection, at which time prdm1a knockdown is affecting rhodopsin expression directly.  That experiment investigated whether prdm1a knockdown affected progenitor proliferation.  Figure 9 shows a time point 6 days after injection, at which time we were asking if prdm1a knockdown affected differentiation of progenitors into rods. 

      Reviewer 2: “The authors noted "Surprisingly, the knockdown of prdm1a resulted in a significantly higher number of rhodopsin-positive cells in the INL (p=0.0293)", while it appears in Figure 9B, 9C that the difference is 2 cells vs 0 in a rightly broader field. It seems to be too strong of a statement for this effect.”

      This was a very unexpected finding. We included statistics (Figure 9D) to support the finding, so we don’t think it is too strong a statement to make. Speculation as to what might cause this is fascinating. Are Müller cells producing progenitors that fail to migrate to the ONL before differentiating into rods? The lack of BrdU labeling does not support this idea. Do neurogenic progenitor cells in the INL differentiate towards rods via a pathway that does not require prdm1a? Perhaps. There may also be other explanations.

      Reviewer 2: “It appears to this reviewer that the proteomic data didn't reveal much in line with the overall hypothesis or the mechanism, and it's unclear why the authors went for proteomics rather than bulk RNA-seq or ChIP-seq for a transcription factor knock-down experiment. Overall this is a minor point.”

      We agree that bulk RNA-seq would provide a similar answer, possibly with greater sensitivity. We chose proteomics for two reasons: 1) We wanted an independent assessment of the knockdown effects that could evaluate whether the knockdowns worked and which pathways were affected. Since our pathway comparison is to single-cell RNA-seq data, bulk RNA-seq did not seem to be fully independent. 2) Because we used translation-blocking antisense oligos for most knockdown experiments, we did not expect the transcript abundance of the targeted gene to be affected, although these oligos can lead to target transcript degradation. Thus, we were unlikely to be able to validate that our knockdown worked with this technique.

      Reviewer 3: “The gene regulatory network analysis here would also benefit from the addition of matched scATAC-Seq data, …”

      This is certainly true, and the reviewer points to several studies that have made excellent use of this strategy.  Given the 1-2 year timeline to obtain and analyze such data, it is unlikely that we will be able to incorporate such data in our revised manuscript, but we hope to do so for follow-up studies.

      Reviewer 3: “The description of the time points analyzed is vague, stating only that "fish from 6 to 12 months of age were analyzed". Since photoreceptor degeneration is progressive, it is unclear how progenitor behavior changes over time, or how the gene expression profile of other cell types such as microglia, cones, or surviving rods is altered by disease progression.”

      We have shown in a previous study (Santhanam et al. Cells. 2020;9(10)) that rod degeneration and regeneration are in a steady state from at least 4 to 8 months of age, and in other experiments in the lab at least to 12 months of age.  In this age range, regeneration keeps up with the pace of degeneration, both of which are very fast.  This encompasses the cell types that we specifically study in this manuscript.  The reviewer is right that other cell types could undergo changes.  This is a separate topic of study in the lab.

    1. eLife Assessment

      The authors provide valuable insights into the candidate upstream transcriptional regulatory factors that control the spatiotemporal expression of selector genes and their targets for GABAergic vs glutamatergic neuron fate in the anterior brainstem. The computational analysis of single-cell RNA-seq and single-cell ATAC-seq datasets to predict TF binding, combined with CUT&Tag to detect TF binding, represents a solid approach to support the findings in the study, although the display and discussion of the datasets could be strengthened. This study will be of interest to neurobiologists who study transcriptional mechanisms of neuronal differentiation.

    2. Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      We thank Reviewer #1 for the thoughtful assessment and recognition of the extensive datasets and computational approaches employed in our study. We appreciate the acknowledgment that our efforts in compiling data-rich resources for identifying putative regulators of key selector transcription factors (TFs)—Tal1, Gata2, and Gata3—are valuable for future hypothesis-driven research.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

      We acknowledge the concerns raised regarding the clarity and interpretability of certain figures, particularly those related to expression analyses of candidate upstream regulators such as Insm1, E2f1, Ebf1, and Tead2 in relation to Tal1. We agree that clearer visualization and improved annotation of fluorescence signals are crucial to accurately support our conclusions. In our revised manuscript, we will enhance image clarity and clearly indicate sites of co-expression for Tal1 and its putative regulators, ensuring the results are more readily interpretable. Additionally, we will expand explanatory narratives within the figure legends to better align the figures with the results section.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      We also appreciate the thoughtful comments from Reviewer #2, highlighting the strengths of our approach in elucidating gene regulatory interactions that govern neuronal fate decisions in the embryonic mouse brainstem. We are pleased that our focus on a critical cell-fate decision point and the integration of diverse data modalities, combined with computational analyses, has been recognized as a key strength.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

      Reviewer #2 correctly points out that while our study provides extensive data on predicted and validated transcription factor binding sites, clearly illustrating how these factors collectively interact to regulate terminal neuronal differentiation programs remains challenging. We acknowledge the inherently descriptive nature of the current interpretation of our combined datasets.

      In our revision, we will clarify how the different data types support and corroborate one another, highlighting what we consider the most reliable observations of TF activity. Additionally, we will revise the discussion to address the challenges associated with interpreting the highly complex networks of interactions within the gene regulatory landscape.

      We sincerely thank both reviewers for their constructive feedback, which we believe will significantly enhance the quality and accessibility of our manuscript.

    1. eLife Assessment

      The study presents a valuable finding on the role of cholesterol-binding sites on GLP-1 receptors, although the clinical ramifications are unclear and not imminent at this point. Based on the detailed and persuasive responses provided by the authors to the concerns raised by reviewers, the revised manuscript is substantially improved and convincing in its scientific merit. The study is a good addition to the scientific community working on receptor biology and drug development for GLP-1R.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of high cholesterol occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations, screen key residues selected from these sites, and perform detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general, the study is convincing, the manuscript well written and the data well presented. Some of the changes are small and insignificant, which makes one wonder how important the observations are. For instance, Figure 2E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant effect of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor. The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      Comments on revisions: The authors have responded well to my criticism.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provide a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling, with the intent that similar methods can be useful to optimize small molecules that, as ligands or allosteric modulators of GLP-1R, could improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (i.e., lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, presumably by decreasing the cholesterol available to interact with wt GLP-1R. The effects of this mutation are not due to differences in Ex-4:receptor affinity. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper, and the authors largely do a good job explaining the goals of using each method in the results section. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      Weaknesses:

      There are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells.

      Certainly many follow-up experiments are possible from these initial findings, and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/T2D is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose-stimulated insulin levels lower and physiologically less effective. Future work by the authors may determine the effects of the GLP-1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. Furthermore, the authors may additionally investigate whether V229A would have the same impact in a different cell type, especially in neurons, with implications for the regulation of satiation, gut motility, and especially nausea, which are of high translational interest.

      The discussion draws a comparison between this mutation and ex4-phe1, both of which show biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors' previously co-authored paper on ex4-phe1 (PMID 29686402), and drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is a novel direction.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of high cholesterol occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations, screen key residues selected from these sites, and perform detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general, the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant, which makes one wonder how important the observations are. For instance, Figure 2E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant effect of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Figure 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 appears to no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced insulin secretion fold increase under these circumstances, as shown in Figure 1F. We interpret these results as a defect in the capacity of GLP-1R to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. With the increasing effect of cholesterol-loaded MβCD on basal secretion at 11 mM glucose, exendin-4 stimulation would then appear to work less well.

      We have performed a simple experiment to investigate this possibility: insulin secretion following stimulation with a secretagogue cocktail (20 mM glucose, 30 mM KCl, 10 µM FSK and 100 µM IBMX) in islets +/- MβCD/cholesterol loading to determine if maximal stimulation had been reached or not in our original experiment. This experiment, now included in Supplementary Figure 1C, demonstrates that insulin secretion can increase up to ~4% (from ~2%) in our islets, supporting our initial conclusion. We have also included absolute insulin concentrations as well as percentages of secretion for all the experiments included in the study in the new Supplementary File 1 to improve the completeness of the report.

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Figure 1H, I) are relatively variable. We have therefore performed 2 extra biological repeats of this experiment (for a total n of 7). Results now show a significant increase in exendin-4-stimulated secretion with no change in basal secretion in islets pre-incubated with LPDS/simvastatin.  

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1R agonist responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We have highlighted this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling, with the intent that similar methods can be useful to optimize small molecules that, as ligands or allosteric modulators of GLP-1R, could improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (i.e., lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although presumably by decreasing the cholesterol available to interact with wt GLP-1R. It would be interesting to see whether lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder whether the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and whether a broader relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We however agree that it is of interest to investigate if cholesterol loading affects GLP-1R diffusion. To this end, we have performed further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol (new Supplementary Figures 1D and 1E). Interestingly, results show significantly increased plasma membrane diffusion of exendin-4-stimulated receptors, with no change in basal diffusion, following MβCD/cholesterol loading. This behaviour differs from that of the V229A mutant receptor which shows reduced diffusion under basal conditions, a pattern that mimics that of the WT receptor under low cholesterol conditions (by pre-treatment with LPDS/simvastatin).

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, could be better explained in the discussion.

      We have expanded on the rationale behind the use of Laurdan to assess behaviours of lipid-packed membrane nanodomains in the methods, results and discussion of the revised manuscript.

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F.

      Figures 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We have included a schematic of this assay in the new Supplementary Figure 3 to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings, and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/T2D is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose-stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here, but I wonder whether this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP-1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, we have included ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Figure 1E in the new Supplementary Figure 4.

      I would have liked to see the actual islet cholesterol content after 5 weeks of high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin secretion. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We have included these data in Supplementary Figure 1A.

      Another area to further investigate is whether this mutation alters ex4 interaction/affinity/time of binding to GLP-1R, or whether all of the described findings are due to changes in behavior and function of the receptor.

      To answer this question, we have performed binding affinity experiments, which show no differences, in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells (new Supplementary Figure 2D).

      Lastly, I wonder whether V229A would have the same impact in a different cell type, especially in neurons. How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea is of high translational interest. The discussion draws a comparison between this mutation and ex4-phe1, both of which show biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors' previously co-authored paper on ex4-phe1 (PMID 29686402), and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we have added this worthy point to the discussion and hypothesise on possible effects of GLP-1R mutants with modified cholesterol interactions on central GLP-1R actions in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      There are no line numbers

      These have now been added.

      Abstract: "Cholesterol is a plasma membrane enriched lipid" - sorry for being finicky, but shouldn't this read; "a lipid often enriched in plasma membranes"

      We have modified the abstract to state that: “Cholesterol is a lipid enriched at the plasma membrane”.

      p. 4 "Moreover, islets extracted from high cholesterol-fed mice". How do you "extract islets"?

      We have replaced the term “extracted” with “isolated”. Islet isolation is described in the paper methods section.

      p. 4 The sentence "These effects were accompanied by decreased GLP-1R plasma membrane diffusion under vehicle conditions, measured by Raster Image Correlation Spectroscopy (RICS) in rat insulinoma INS-1 832/3 cells with endogenous GLP-1R deleted [INS-1 832/3 GLP-1R KO cells (27)] stably expressing SNAP/FLAG-tagged human GLP-1R (SNAP/FLAG-hGLP-1R), an effect that is normally triggered by agonist binding (28), as also observed here (Supplementary Figure 1C, D)" is a masterpiece of complexity. Perhaps breaking up would facilitate reading?

      This paragraph has now been modified in the revised manuscript.

      p. 5. I cannot evaluate the "coarse grain molecular dynamics" studies.

      Reviewer #2 (Recommendations for the authors):

      I view this as an excellent manuscript with very comprehensive work and clear translational relevance. I don't think any further experiments are needed for the scope outlined in this manuscript. The discussion is already long but a short postulation on how this may translate to GLP-1R-cholesterol interactions in other cell types, specifically neurons with the intent on manipulating satiation and nausea, could be worthwhile.

      This has now been added.

      The only thing for readability I would suggest is a sentence in the results mentioning why you're doing the Laurdan analysis, and what is the output for assessing 'receptor activity' in the membrane and endosomes.

      Both points have now been added.

    1. eLife Assessment

      This important study dissects the mathematical and biological assumptions underlying the commonly used Activity-by-Contact model of enhancer action in transcriptional regulation. The authors provide a convincing mathematical analysis that links this (mostly phenomenological) model to concrete molecular mechanisms of enhancer function. This work provides a strong foundation from which to analyze a broad swath of genome-wide data such as that generated by CRISPRi screens.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to formalize the mathematical underpinnings of a proposed general model and discuss the relationship of this model to the ABC Score, a widely adopted heuristic for enhancer-gene predictions. While the ABC model serves as a useful binary classifier, it struggles to predict quantitative enhancer effects on gene expression. Using a graph-theoretic linear framework, the authors derive a mathematical model (the "default model") that explains how the algebraic form of the ABC Score arises under specific assumptions. They further demonstrate that the default model's predictions of enhancer additivity are inconsistent with observed non-additive enhancer effects and propose alternative assumptions to account for these discrepancies.

      Strengths:

      The graph-theoretic approach enables systematic exploration of enhancer interactions beyond simple additivity and enables hypothesis generation when such expectations fail. This work makes clear where assumptions are made and the consequences of those assumptions.

      Weaknesses:

      While the theoretical framework is elegant, I think there is always more space to demonstrate the practicality of this approach. Further guidance for how to experimentally connect this framework with typical measurements could help bolster the immediate benefits. To be clear, I do not think this is something the authors "must" do, but rather something that might help drive home the usefulness in a more accessible way.

    3. Reviewer #2 (Public review):

      Summary:

      The Activity-by-Contact (ABC) model is a relatively widespread model of enhancer-gene regulation. This model leverages CRISPRi data to predict whether a gene is regulated by a given enhancer. To make this possible, this model accounts for the activity of an enhancer and its contact frequency with a target promoter in order to produce an "ABC score". However, while quantitative in its ability to predict enhancer-promoter regulation, this model is mostly phenomenological and does not commit to specific molecular mechanisms.

      In this manuscript, the authors formalize the molecular and mathematical assumptions made by the ABC model. Specifically, they demonstrate a basic set of assumptions that can be made to arrive at the ABC model's mathematical structure. The resulting default model (basically, a null model) places particular emphasis on the requirement that gene activation and enhancer-gene communication must be independent and at a steady state. The authors leverage and extend a graph-based formalism they have previously spearheaded to show the generality of their conclusions with respect to different molecular realizations of the process by which enhancers interact with their promoters.

      Previously published works have found that specific models of how multiple enhancers communicate with the same gene can result in additive mRNA production rates. Here, the authors demonstrate that steady-state mRNA levels are additive regardless of the specific Markovian model for how any individual enhancer communicates with the gene, as long as the model follows the basic assumptions of their default model.

      By coarse-graining both gene activation and enhancer-gene communication to simple two-state models, the authors then clearly demonstrate that the mathematical structure of the ABC model emerges. This mathematical structure implies that the ABC score summed over all the enhancers regulating a given gene must equal 1. However, experimental measurements show values ranging from 0 to 3. The authors show that, in order to explain these experimental deviations with respect to the theory, at least one of the assumptions of the default model must be broken. They demonstrate that either invoking enhancer cooperativity in mRNA production rates or breaking the assumption that individual enhancers communicate with the gene independently can explain existing experimental data.
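      The normalization behind the "sums to 1" constraint can be illustrated with a short Python sketch. It assumes the commonly cited form of the ABC score, in which each element's activity × contact product is divided by the sum of those products over all candidate elements for the gene; the function name `abc_scores` and all numbers are hypothetical, and the sketch is not the authors' graph-theoretic default model. It only shows why, under this normalization, the scores for a gene sum to exactly 1, so that experimental totals departing from 1 cannot be accommodated without changing an assumption.

```python
def abc_scores(activity, contact):
    """Return an ABC-style score for each candidate element of one gene.

    Assumed form: score_e = A_e * C_e / sum_k (A_k * C_k), where A is element
    activity and C is contact frequency with the gene's promoter.
    """
    if len(activity) != len(contact):
        raise ValueError("activity and contact must have the same length")
    products = [a * c for a, c in zip(activity, contact)]
    total = sum(products)
    if total <= 0:
        raise ValueError("sum of activity x contact must be positive")
    return [p / total for p in products]


# Hypothetical activities and contact frequencies for three candidate enhancers.
activity = [4.0, 1.0, 2.5]
contact = [0.10, 0.50, 0.20]
scores = abc_scores(activity, contact)
print([round(s, 3) for s in scores])
# Every score shares the same denominator, so they sum to 1 by construction.
print(round(sum(scores), 6))
```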

      Strengths:

      By demonstrating that the mathematical structure of the ABC model emerges from a set of basic assumptions including the independence of gene activation and enhancer-gene communication, the authors succeeded in their aim to put the ABC model on a formal and molecular footing. Since some experimental results do not agree with the ABC model, the authors importantly demonstrated which assumptions of the model can be broken to explain such data. The theoretical work in this manuscript is written in a reasonably accessible manner that features how a graph theory-based approach to modeling biochemical networks can result in general statements about biological phenomena.

      Weaknesses:

      While the authors discuss a number of experimental techniques that can be used to test the validity of their model, a more specific discussion of proposed experiments could have strengthened the impact of the paper by providing explicit opportunities for dialogue with experimentalists.

    4. Author response:

      We thank both reviewers for their time and effort in considering our manuscript. We are pleased that the reviewers recognised the strength of our theoretical analysis and found it "elegant" and "reasonably accessible". We also acknowledge the suggestions made by both reviewers that the manuscript could be improved by more discussion of potential experiments. We were concerned not to make the original manuscript too long but, in the light of the reviewers' comments, we will submit a revised version with more details of the kinds of experiments that would build on the results that we have presented.

    1. eLife Assessment

      The authors examined the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They showed that viral fitness declines as the virus mutates to escape the immune response and can rebound later in infection as HCV accumulates additional mutations. The study contributes to an important aspect of viral evolution. The combination of approaches contributes to a convincing study.

    2. Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply approaches well established in studies of acute-to-chronic HIV infection to the study of HCV infection. This lends rigor to the authors' work.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that intrinsic viral fitness declines as the virus mutates to escape T cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes, but also the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      After revision, this paper has no outstanding weaknesses. Points where further investigation is needed have been clearly identified.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply approaches well established in studies of acute-to-chronic HIV infection to the study of HCV infection. This lends rigor to the authors' work.

      Weaknesses:

      (1) The Discussion could be strengthened by a direct discussion of the parallels/differences in results between HIV and HCV infections in terms of T cell selection, entropy, and fitness.

      We have added a direct discussion of the parallels/differences between HIV and HCV throughout the discussion, including at lines 308-310 and 315-327.

      Lines 308-310: “In fact, many parallels can be drawn between HIV infections and HCV infections in the context of emerging viral species that escape T cell immune responses.”

      Lines 315-327: “One major difference between HCV and HIV infection is the event where patients infected with HCV have an approximately 25% chance to naturally clear the infection as opposed to just achieving viral control in HIV infections. Here, we probed the underlying mechanism, and questioned how the host immune response and HCV mutational landscape can allow the virus to escape the immune system. To understand this process, taking inspiration from HIV studies (24), a quantitative analysis of viral fitness relative to viral haplotypes was conducted using longitudinal samples to investigate whether a similar phenomenon was identified in HCV infections for our cohort for patients who progress to chronic infection. We observed a decrease in population average relative fitness in the period of <90DPI with respect to the T/F virus in chronic subjects infected with HCV. The decrease in fitness correlated positively with IFN-γ ELISPOT responses and negatively with SE indicating that CD8+ T-cell responses drove the rapid emergence of immune escape variants, which initially reduced viral fitness. This is similarly reflected in HIV infected patients where strong CD8+ T-cell responses drove quicker emergence of immune escape variants, often accompanied by compensatory mutations (24).”

      (2) In the Results, please describe the Barton model functionality and why the fitness landscape model was most applicable for studies of HCV viral diversity.

      This has been added to the introduction section rather than Results as we feel that it is more appropriate to show why it is most applicable to HCV viral diversity in the background section of the manuscript. We write at lines 77-90:

      “Barton et al.’s [23] approach to understand HIV mutational landscape resulting in immune escape had two fundamental points: 1) replicative fitness depends on the virus sequence and the requirement to consider the effect of co-occurring mutations, and 2) evolutionary dynamics (e.g. host immune pressure). Together they pave the way to predict the mutational space in which viral strains can change given the unique immune pressure exerted by individuals infected with HIV. This model fits well with the pathology of HCV infection. For instance, HIV and HCV are both RNA viruses with rapid rate of mutation. Additionally, like HIV, chronic infection is an outcome for HCV infected individuals, however, unlike HIV, there is a 25% probability that individuals infected with HCV will naturally clear the virus. Previously published studies [9] have shown that HIV also goes through a genetic bottleneck which results in the T/F virus losing dominance and replaced by a chronic subtype, identified by the immune escape mutations. The concepts in Barton’s model and its functionality to assess the fitness based on the complex interaction between viral sequence composition and host immune response is also applicable to early HCV infection.”

      (3) Recognize the caveats of the HCV mapping data presented.

      We have now recognized the caveats of the HCV mapping data at lines 354-356: “While our findings here are promising, it should be recognized that although the bioinformatics tool (iedb_tool.py) proved useful for identifying potential epitopes, there could be epitopes that are not predicted or false-positive from the output which could lead to missing real epitopes.”

      (4) The authors should provide more data or cite publications to support the authors' statement that HCV-specific CD8 T cell responses decline following infection.

      We have now clarified at lines 352-353 that the decline was toward “selected epitopes that showed evidence of escape”.

      Furthermore, we have cited two publications at line 352 that support our statement.

      (5) Similarly, as the authors' measurements of HCV T and humoral responses were not exhaustive, the text describing the decline of T cells with the onset of humoral immunity needs caveats or more rigorous discussion with citations (Discussion lines 319-321).

      We have now added a caveat in the discussion at lines 357-360 which reads

      “In conclusion, this study provides initial insights into the evolutionary dynamics of HCV, showing that an early, robust CD8+ T-cell response without nAbs strongly selects against the T/F virus, enabling it to escape and establish chronic infection. However, these findings are preliminary and not exhaustive, warranting further investigation to fully understand these dynamics. “

      (6) What role does antigen drive play in these data, for both T cell and antibody induction?

      It is possible that HLA-adapted mutations could limit CD8 T cell induction if the HLAs were matched between transmission pairs, as has been shown previously for HIV (https://doi.org/10.1371/journal.ppat.1008177) with some data for HCV (https://journals.asm.org/doi/10.1128/jvi.00912-06). However, we apologise as we are not entirely sure that this is what the reviewer is asking for in this instance.

      (7) Figure 3 - are the X and Y axes wrongly labelled? The divergent ranges of population fitness do not make sense.

      Our apologies; there was an error with the plot in Figure 3, and the X and Y axes were wrongly labelled. This has now been resolved.

      (8) Figure S3 - is the green line the average virus fitness?

      This has now been clarified in Figure S3.

      (9) Use the term antibody epitopes, not B cell epitopes.

      We now use the term antibody epitopes throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) Introduction:

      Line 52: 'carry mutations B/T cell epitopes'. Two points

      i) These are antibody epitopes (and antibody selection) not B cell epitopes

      We have corrected this sentence at line 55 which now reads: “carry mutations within epitopes targeted by B cells and CD8+ T cells”.

      ii) To avoid confusion, add text that mutations were generated following selection in the donor.

      For HCV, it is unclear if mutations are generated following selection or have been occurring in low frequencies outside detection range. Only when selection by host immune pressure arises do the potentially low-frequency variants become dominant. However, we do acknowledge it is potentially misleading to only mention new variants replacing the transmitted/founder population. We have modified the sentence at line 52 to read:

      “At this stage either an existing variant that was occurring in low-frequency outside detection range or an existing variant with novel mutations generated following immune selection is observed in those who progress to chronic infection”

      - Lines 51-56: Human studies of escape and progression are associative, not causative as implied.

      Correct; the current evidence suggests that escape and progression are associative. We have now corrected these lines to no longer suggest causation.

      - Line 65: Suggest you clarify your meaning of 'easier'?

      This sentence, now at line 72, has been modified to: “subtype 1b viruses have a higher probability to evade immune responses”

      (2) Results:

      - Line 147: Barton model (ref'd in Intro) is directly referred to here but not referenced.

      The reference has been added.

      - The authors should cite previous HIV literature describing associations between the rate of escape and Shannon Entropy e.g. the interaction between immunodominance, entropy, and rate of escape in acute HIV infection was described in Liu et al JCI 2013 but is not cited.

      We have now cited previous HIV research at lines 147-151, adding Liu et al.:

      “Additionally, the interaction between immunodominance, entropy, and escape rate in acute HIV infection has been described, where immunodominance during acute infection was the most significant factor influencing CD8+ T cell pressure, with higher immunodominance linked to faster escape (27). In contrast, lower epitope entropy slowed escape, and together, immunodominance and entropy explained half of the variability in escape timing (27).”

      - Line 319: The authors suggest that HCV-specific CD8 T cell response declines following early infection. On what are they basing this statement? The authors show their measured T cell responses decline but their approach uses selected epitopes and they are therefore unable to assess total HCV T cell response in participants (Where there is no escape, are T cell magnitudes maintained or do they still decline?). Can the authors cite other studies to support their statement?

      We have now clarified that the decline was toward “selected epitopes that showed evidence of escape”. Furthermore, we also cite two studies to support our findings.

      - Throughout the authors talk in terms of CD8 T cells but the ELISpot detects both CD4 and CD8 T cell responses. I suggest the authors be more explicit that their peptide design (9-10mers) is strongly biased to only the detection of CD8 T cells.

      To make this clearer and more explicit, we have now added the following to the methods section at lines 433-435:

      “While the ELISpot assay detects responses from both CD4 and CD8 T cells, our peptide design (9-10mers) is strongly biased toward CD8 T-cell detection. We have therefore interpreted ELISpot responses primarily in terms of CD8 T-cell activity.”

      - The points made in lines 307-321 could be more succinct

      We have now edited the discussion (lines 307 – 321) to make the points more succinct (now lines 307-323).

      Minor corrections to text, figures:

      - Figure 2: suggest making the Key bigger and more obvious.

      We have now made the key bigger and more obvious.

      - Figure 3 A & D....is there an error on the X-axis...are you really reporting ELISpot data of < 1 spot/10^6? Perhaps the X and Y axes are wrongly labelled?

      Our apologies; there was an error with the plot in Figure 3, and the X and Y axes were wrongly labelled. This has now been resolved.

      - Figure 5: As this is PBMC, remove CD8 from the description of ELISpot. 

      We have now removed CD8 from the description of ELISpot in both Figure 5 and Figure S3.

      Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that viral fitness declines as the virus mutates to escape T-cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes but also on the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      One notable weakness of the present version of the manuscript is a lack of clarity in the description of the method of fitness estimation. In the previous studies of HIV and HCV cited by the authors, fitness models were derived by fitting the model (equation between lines 435 and 436) to viral sequence data collected from many different individuals. In the section "Estimating survival fitness of viral variants," it is not entirely clear if Walker and collaborators have used the same approach (i.e., fitting the model to viral sequences from many individuals), or whether they have used the sequence data from each individual to produce models that are specific to each subject. If it is the former, then the authors should describe where these sequences were obtained and the statistics of the data.

      If the fitness models were inferred based on the data from each subject, then more explanation is needed. In prior work, the use of these models to estimate fitness was justified by arguing that sequence variants common to many individuals are likely to be well-tolerated by the virus, while ones that are rare are likely to have high fitness costs. This justification is less clear for sequence variation within a single individual, where the viral population has had much less time to "explore" the sequence landscape. Nonetheless, there is precedent for this kind of analysis (see, e.g., Asti et al., PLoS Comput Biol 2016). If the authors took this approach, then this point should be discussed clearly and contrasted with the prior HIV and HCV studies.
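      The reviewer's distinction depends on what the fitness model's parameters are fitted to. As a hedged illustration only, the Python sketch below scores sequences with a Potts-type (maximum-entropy) energy of the general form used in prevalence-based HIV and HCV fitness landscapes, E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), where lower energy corresponds to higher inferred fitness. The fields, couplings, and three-site sequences are hypothetical toy values rather than a fitted model, and `potts_energy` is an illustrative helper, not the function used by the authors or by Hart et al. In the cross-sectional approach the reviewer describes, h and J would be inferred from sequences pooled across many individuals; the coupling term is what allows a second mutation to partially compensate for the cost of a first.

```python
from itertools import combinations


def potts_energy(seq, fields, couplings):
    """Potts-type energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields[i] maps a residue at site i to its field h_i; couplings maps a site
    pair (i, j) with i < j to a dict from residue pairs to J_ij. Missing
    entries are treated as 0. Lower energy ~ higher inferred fitness.
    """
    energy = 0.0
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    for i, j in combinations(range(len(seq)), 2):
        energy -= couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical three-site landscape: "ABC" is the consensus-like sequence.
fields = [{"A": 1.0, "V": -0.5}, {"B": 0.8, "W": -1.0}, {"C": 0.5}]
# A favourable coupling between V (site 0) and W (site 1) lets the second
# mutation offset part of the cost of the first.
couplings = {(0, 1): {("V", "W"): 2.0}}

for s in ["ABC", "VBC", "VWC"]:
    print(s, round(potts_energy(s, fields, couplings), 2))
```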

      We thank the reviewer for pointing out the weakness in our explanation and description of the fitness model. The model was generated using publicly released viral sequences, as described in a previous publication by Hart et al. 2015. The T/F virus from each subject chronically infected with HCV in our cohort was scored with the model of Hart et al. to estimate the initial viral fitness of the T/F variant. The subvariants of the viral population at each subsequent time point for each subject were scored with the same model (using the model for the corresponding subtype). For each subject, these subvariant fitness values were divided by the fitness value of the initial T/F virus (hence the relative fitness at the earliest time points, with no mutations in the epitope regions, is 1.000). All other fitness values are therefore fitness relative to the T/F variant.
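      The normalization described in this response can be sketched in a few lines of Python. Dividing each variant's fitness by the T/F fitness follows the response directly; weighting by haplotype frequency to obtain a population-average relative fitness is an assumption made for illustration, and the function names, variant labels, fitness values, and frequencies are all hypothetical.

```python
def relative_fitness(tf_fitness, variant_fitness):
    """Divide each variant's model fitness by that of the T/F virus (T/F -> 1.0)."""
    if tf_fitness == 0:
        raise ValueError("T/F fitness must be non-zero")
    return {name: f / tf_fitness for name, f in variant_fitness.items()}


def population_average(rel_fitness, frequencies):
    """Frequency-weighted mean relative fitness at one time point (assumed weighting)."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    return sum(rel_fitness[name] * freq for name, freq in frequencies.items()) / total


# Hypothetical model outputs for one subject.
tf = 0.82
variants = {"T/F": 0.82, "escape": 0.61, "escape+compensatory": 0.77}
rel = relative_fitness(tf, variants)

# Hypothetical haplotype frequencies at a later time point.
freqs = {"T/F": 0.1, "escape": 0.6, "escape+compensatory": 0.3}
print({k: round(v, 3) for k, v in rel.items()})
print(round(population_average(rel, freqs), 3))
```

Keeping the division separate from the averaging makes it easy to check that every subject's T/F variant sits at exactly 1.000 before any population summary is taken.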

      We have further clarified this point in the methods section “Estimating survival fitness of viral variants” to better describe how the data for the model were sourced (lines 465-499).

      To add to the reviewer’s point, we agree that sequence variants common to many individuals are likely to be well tolerated by the virus, and this was observed in our findings: our data suggested that immune escape variants tended to revert to variants closer to the global consensus strain. Our previous publications have indicated that T/F viruses were variants that were “fit” for transmission between hosts, and that a single T/F virus is often observed, especially where the donor was a chronic progressor. Progression to immune escape and adaptation to chronic infection in the new host involves an intermediate process of genetic expansion via replication, followed by a bottleneck event under immune pressure in which overall fitness (overall survivability, including replication and exploration of immune escape pathways) can change. Under this assumption, we asked whether the observation reported in HIV studies (i.e. mutational landscapes that allow HIV to adapt to the host) also holds in HCV infection. Furthermore, the cohort used in this study is rare: patients were tracked from uninfected, to HCV RNA+, to seroconversion, and finally to either clearance of the virus or progression to chronic infection. Thus, it is important to understand the difference between clearance and chronic progression.

      Another important point for clarification is the definition of fitness. In the abstract, the authors note that multiple studies have shown that viral escape variants can have reduced fitness, "diminishing the survival of the viral strain within the host, and the capacity of the variant to survive future transmission events." It would be helpful to distinguish between this notion of fitness, which has sometimes been referred to as "intrinsic fitness," and a definition of fitness that describes the success of different viral strains within a particular individual, including the potential benefits of immune escape. In many cases, escape variants displace variants without escape mutations, showing that their ability to survive and replicate within a specific host is actually improved relative to variants without escape mutations. However, escape mutations may harm the virus's ability to replicate in other contexts. Given the major role that fitness plays in this paper, it would be helpful for readers to clearly discuss how fitness is defined and to distinguish between fitness within and between hosts (potentially also mentioning relevant concepts such as "transmission fitness," i.e., the relative ability of a particular variant to establish new infections).

      Thank you for pointing out the weakness of our definition of fitness. We have now clarified this at multiple sections of the paper: In the abstract at lines 18-21 and in the introduction at lines 64-69.

      These read:

      Lines 18-21: “However, this generic definition can be further divided into two categories where intrinsic fitness describes the viral fitness without the influence of any immune pressure and effective fitness considers both intrinsic fitness with the influence of host immune pressure.”

      Lines 64-69: “This generic definition of fitness can be further divided into intrinsic fitness (also referred to as replicative fitness), where the fitness of sequence composition of the variant is estimated without the influence of host immune pressure. On the other hand, effective fitness (from here on referred to as viral fitness) considers fundamental intrinsic fitness with host immune pressure acting as a selective force to direct mutational landscape (19)[REF], which subsequently influences future transmission events as it dictates which subvariants remain in the quasispecies.”

      One concern about the analysis is in the test of Shannon entropy as a way to quantify the rate of escape. The authors describe computing the entropy at multiple time points preceding the time when escape mutations were observed to fix in a particular epitope. Which entropy values were used to compare with the escape rate? If just the time point directly preceding the fixation of escape mutations, could escape mutations have already been present in the population at that time, increasing the entropy and thus drawing an association with the rate of escape? It would also be helpful for readers to include a definition of entropy in the methods, in addition to a reference to prior work. For example, it is not clear what is being averaged when "average SE" is described.

      We thank the reviewer for pointing out the ambiguity in describing average SE. This has been rectified by adding more information in the methods section (lines 397-400):

      “Briefly, SE was calculated using the frequency of occurrence of SNPs based on per codon position, this was further normalized by the length of the number of codons in the sequence which made up respective protein. An average SE value was calculated for each time point in each protein region for all subjects until the fixation event.”

      To answer the reviewer’s question, we computed entropy at multiple time points preceding the observation of the escape mutation. The escape rate was calculated for the epitopes targeted by the immune response. We compared the average SE of the protein region containing the epitope (based on the variation at each codon position and normalised by protein length) with the time it took the escape mutation to reach fixation. We observed that if the protein region had a higher rate of variation (i.e. higher average SE), then an immune escape epitope also emerged more quickly. Since we took SE from the very first time point and all subsequent time points until fixation, we do not think that escape mutations already present in the population would alter the association with the rate of escape, particularly because these escape mutations were rarely observed at early time points. It is likely that the escape variant became observable because of host immune pressure; the SE therefore reflects the freedom of the virus to explore the mutational landscape. If the region were highly restrictive, such that any mutation would result in a failed variant, then we would observe relatively low average SE. In other words, the more variability the region tolerates, the greater the probability that the virus will find a route to immune escape.
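      The SE description quoted above leaves some details open, so the following Python sketch shows one plausible reading rather than the authors' exact pipeline: for aligned, in-frame nucleotide sequences from a single time point, it computes the Shannon entropy (in bits, an assumed choice of log base) of the codons observed at each codon position, then divides the sum by the number of codons in the region. Treating whole codons rather than individual SNPs as the unit, and the names `codon_entropy` and `average_se`, are assumptions; the sequences are hypothetical.

```python
import math
from collections import Counter


def codon_entropy(column):
    """Shannon entropy (bits) of the codons observed at one codon position."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def average_se(sequences):
    """Per-codon Shannon entropy summed over a region, divided by its codon count."""
    length = len(sequences[0])
    if length % 3 != 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, equal length, and in frame")
    n_codons = length // 3
    total = 0.0
    for k in range(n_codons):
        column = [s[3 * k:3 * k + 3] for s in sequences]
        total += codon_entropy(column)
    return total / n_codons


# Hypothetical reads from one time point: codon 1 is invariant, codon 2 is split 50/50.
seqs = ["ATGGCT", "ATGGCC", "ATGGCT", "ATGGCC"]
print(average_se(seqs))  # 0.5
```

Repeating this per time point up to fixation would give the series of average SE values the response describes.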

      Reviewer #2 (Recommendations for the authors):

      In addition to the main points above, there are a few minor comments and suggestions about the presentation of the data.

      (1) It's not clear how, precisely, the model-based fitness has been calculated and normalized. It would be helpful for the authors to describe this explicitly. Especially in Figure 3, the plotted fitness values lie in dramatically different ranges, which should be explained (maybe this is just an error with the plot?).

      We have now clarified how the model-based fitness has been calculated and normalized in the method section “Estimating survival fitness of viral variants” at line 465-472.

      “The model used for estimating viral fitness has been previously described by Hart et al. (19). Briefly, the original approach used HCV subtype 1a sequences to generate the model for the NS5B protein region. To update the model for other regions (NS3 and NS2) as well as other HCV subtypes in this study, subtype 1b and subtype 3a sequences were extracted from the Los Alamos National Laboratory HCV database. An intrinsic fitness model was first generated for each subtype for NS5B, NS3 and NS2 region of the HCV polyprotein. Then, using longitudinally sequenced data from patients chronically infected with HCV as well as clinically documented immune escape to describe high viral fitness variants, we generated estimates of the viral fitness for subjects chronically infected with HCV in our cohort.”

      Our apologies, there was an error with the plot in Figure 3. This has now been resolved.

      (2) In different plots, the authors show every pairwise comparison of ELISPOT values, population fitness, average SE, and rate of escape. It may be helpful to make one large matrix of plots that shows all of these pairwise comparisons at the same time. This could make it clear how all the variables are associated with one another. To be clear, this is a suggestion that the authors can consider at their discretion.

      Thank you for the suggestion to create a matrix of plots for pairwise comparisons. While this approach could indeed clarify variable associations, implementing it is outside the scope of this project. We appreciate the idea and may consider it in future studies as we continue to expand on this work.

    1. eLife Assessment

      Zhang et al. present important findings that reveal a new role for TET2 in controlling glucose production in the liver, showing that both fasting and a high-fat diet increase TET2 levels, while its absence reduces glucose production. TET2 works with HNF4α to activate the FBP1 gene upon glucagon stimulation, while metformin disrupts TET2-HNF4α interaction, lowering FBP1 levels and improving glucose homeostasis. The results are convincing and expand our understanding of gluconeogenesis regulation.

    2. Reviewer #2 (Public review):

      The manuscript "HNF4α-TET2-FBP1 axis contributes to gluconeogenesis and type 2 diabetes" from Zhang et al. presents significant and convincing findings that enhance our understanding of TET2's role in liver glucose metabolism. It highlights the epigenetic regulation of FBP1, a gluconeogenic gene, by TET2, linking this pathway to HNF4alpha, which recruits TET2. The in vitro and in vivo experiments are now well described and provide convincing evidence of TET2's impact on gluconeogenesis, particularly in fasted and HFD mice.

      Comments on revisions:

      The authors have thoroughly addressed all the concerns raised, and their responses adequately clarify the issues previously identified.

      Minor changes:

      (1) Could the authors provide some comments on why glucagon was not able to stimulate PEPCK and G6Pase mRNA levels in HepG2 cells (Fig. 3D)? Although it is not the focus of the research, it is well known that glucagon has this effect and could serve as a positive control for the quality of the preparation.

      (2) Please include the sequences of the qPCR primers used for PEPCK and G6Pase in the Methods section (page 17).

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. describe a delicate relationship between Tet2 and FBP1 in the regulation of hepatic gluconeogenesis.

      Strengths:

      The studies are very mechanistic, indicating that this interaction occurs via demethylation of HNF4a. Phosphorylation of HNF4a at ser 313 induced by metformin also controls the interaction between Tet2 and FBP1.

      We are grateful for the reviewer's praise of the manuscript.

      Weaknesses:

      The results are briefly described, and oftentimes, the necessary information is not provided to interpret the data. Similarly, the methods section is not well developed to inform the reader about how these experiments were performed. While the findings are interesting, the results section needs to be better developed to increase confidence in the interpretation of the results.

      Thank you very much for pointing out the shortcomings of the manuscript. We apologize for not providing a detailed description of some experimental methods and results. Following the reviewer’s suggestion, we added these details to the Methods section, including the generation of whole-body Tet2 KO mice and liver-specific Tet2 knockdown mice (AAV8-shTet2); the missing information on reagents, antibodies, primer sequences and mutant generation; and the methods for chromatin immunoprecipitation (ChIP) and immunofluorescence. The interpretation of the results was also further developed in response to the reviewer’s comments.

      Reviewer #2 (Public review):

      Summary:

      This study reveals a novel role of TET2 in regulating gluconeogenesis. It shows that fasting and a high-fat diet increase TET2 expression in mice, and TET2 knockout reduces glucose production. The findings highlight that TET2 positively regulates FBP1, a key enzyme in gluconeogenesis, by interacting with HNF4α to demethylate the FBP1 promoter in response to glucagon. Additionally, metformin reduces FBP1 expression by preventing TET2-HNF4α interaction. This identifies an HNF4α-TET2-FBP1 axis as a potential target for T2D treatment.

      Strengths:

      The authors use several methods in vivo (PTT, GTT, and ITT in fasted and HFD mice; and KO mice) and in vitro (in HepG2 and primary hepatocytes) to support the existence of the HNF4α-TET2-FBP1 axis in the control of gluconeogenesis. These findings uncovered a previously unknown function of TET2 in gluconeogenesis.

      We are grateful for the reviewer's praise of the manuscript.

      Weaknesses:

      Although the authors provide evidence of an HNF4α-TET2-FBP1 axis in the control of gluconeogenesis, which contributes to the therapeutic effect of metformin on T2D, its role in the pathogenesis of T2D is less clear. The mechanisms by which TET2 is up-regulated by glucagon should be explored further.

      Thank you very much for pointing out the shortcomings of the manuscript. We agree with the reviewer that the manuscript focuses on the function of the HNF4α-TET2-FBP1 axis in the control of gluconeogenesis rather than on its role in the pathogenesis of T2D. Following the reviewer’s suggestion, we changed the title of the manuscript to “HNF4α-TET2-FBP1 axis contributes to gluconeogenesis and type 2 diabetes”. To investigate the mechanisms by which TET2 is up-regulated by glucagon, we examined TET2 mRNA levels at different time points after a single dose of glucagon in HepG2 cells. Interestingly, TET2 mRNA levels increased significantly, by 6-fold, at 30 min, and the effect of glucagon on Tet2 mRNA levels persisted for more than 48 hours (refer to Fig. 3E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors indicate that they have overexpressed TET2 in HepG2 cells and primary mouse hepatocytes. The degree of overexpression should be shown. Is this similar to an increase in TET2 with fasting or HFD treatment?

      Thank you for the helpful comment. Following the reviewer’s suggestion, we examined the protein levels of overexpressed TET2 in HepG2 cells and primary mouse hepatocytes. The results revealed that the degree of TET2 overexpression (refer to Fig. 3J) is similar to the increase in TET2 under fasting or HFD treatment (Fig. 1C, D).

      In Figures 2E-2G, the authors report results in Tet2-KO mice. Information on how these mice were generated is lacking. There is limited information about how Tet2-KO cells were generated, but again, I could not find anything about these mice in the methods section or figure legend. Is this whole-body or liver-specific Tet2-KO? How old were the mice at the time of PTT, GTT, or ITT?

      Were these mice on chow or HFD? Are there any differences in body weight between WT and Tet2-KO mice?

      Thank you for the helpful comment. Following the reviewer’s suggestion, we provided detailed information about the Tet2-KO mice, including their generation, in the Methods section. Moreover, the Tet2-KO mice used in each figure are now clearly described in the figure legends. In this study, two mouse models were employed: whole-body Tet2-KO mice and liver-specific Tet2 knockdown mice (AAV8-shTet2). The mice used for PTT, GTT and ITT were 8 weeks old and on HFD. To address the reviewer’s concern, we compared the body weight of WT and Tet2-KO mice; the results revealed no significant differences in body weight between WT and Tet2-KO mice at 8 and 10 weeks of age on a normal chow diet, as depicted in Figure 2I.

      Figures 3A-C show that 48 hours after glucagon treatment, Tet2 and FBP1 mRNA increased. It's surprising that a single dose of glucagon would have effects that last that long. The peak rise in glucose following glucagon treatment occurs in 30 minutes. How do the authors explain such a long-lasting effect of glucagon on Tet2 mRNA and protein?

      Thank you for the constructive comment. To address the reviewer’s concern, we examined the mRNA levels of TET2 and FBP1 at different time points following a single dose of glucagon in HepG2 cells. Interestingly, TET2 mRNA levels increased significantly, by 6-fold, at 30 min, and the effect of glucagon on Tet2 mRNA levels persisted for more than 48 hours (refer to Fig. 3E). The detailed mechanism underlying the long-lasting effect of glucagon on Tet2 mRNA and protein requires further exploration.

      It's interesting that in Figure 3F, Fbp1 and Tet2 mRNA expression correlated positively in both ad libitum and fasting conditions. I would expect that during fed conditions, gluconeogenesis would not be activated and thus would expect no correlation.

      Thank you for the constructive comment. According to the results in the new Fig. 3H, the mRNA levels of Fbp1 and Tet2 indeed correlated positively under both ad libitum and fasting conditions, although the r value was higher and the p value lower under fasting than under ad libitum feeding. Notably, the expression levels of both Fbp1 and Tet2 increased under fasting, which is consistent with Fig. 1C and Fig. 4K.

      The authors state that "Our results demonstrated that HNF4α recruits TET2 to the FBP1 promoter and activates FBP1 expression through demethylation" What data points out that this is mediated through demethylation?

      Thank you for the constructive comment. Following the reviewer’s suggestion, we conducted new ChIP experiments. These data demonstrated that HNF4α recruits TET2 to the FBP1 promoter and activates FBP1 expression through demethylation, as shown in Fig. 4F-H.

      In Figures 5B, 4D, and 3L-N, the y-axes are labeled as fold enrichment. The authors should clearly indicate what was measured on the y-axes.

      Thank you for the helpful comment. Following the reviewer’s suggestion, we clearly labeled all the y-axes in each figure.

      The authors indicate that metformin increases phosphorylation of Hnf4a at Ser313 (Figure 5C). How do we know that Ser313 is involved? Only one antibody is listed for Hnf4a (SAB, 32591).

      Thank you very much for pointing this out. We determined the phosphorylation levels of HNF4α at S313 using an anti-HNF4α (phospho S313) antibody (ab78356); we apologize for not labeling this clearly. We have now made this clear in Fig. 5C and added the detailed antibody information to the “Western Blot and Immunoprecipitation” section of the Methods.

      How did the authors make phosphomimetic mutation (S313D) and phosphoresistant mutation (S313A) of HNF4α? This is not described.

      Thank you very much for pointing this out. Following the reviewer’s suggestion, the detailed method for generating the phosphomimetic (S313D) and phosphoresistant (S313A) mutants of HNF4α was added to the “Gene Knockout Cells and Mutagenesis” section of the Methods.

      Reviewer #2 (Recommendations for the authors):

      Major points:

      (1) Other key gluconeogenesis genes (e.g. PEPCK and G6Pase) should be investigated to determine whether regulation by TET2 is specific to FBP1.

      Thank you for the helpful comment. Following the reviewer’s suggestion, we performed qPCR for other key gluconeogenesis genes, including PEPCK and G6Pase, and the results showed that glucagon treatment had no effect on PEPCK and G6Pase expression (Fig. 3D), suggesting that regulation by TET2 is specific to FBP1.

      (2) The methods are not well defined and more details should be given, for example, to explain how the Tet2 KO mice were generated. Since these animals are not KO liver-specific and TET2 is expressed in a variety of tissues and organs and is predominantly found in hematopoietic cells, including bone marrow and blood cells, the phenotype of these mice should be better characterized.

      Thank you for the helpful comment. The Tet2 knockout (Tet2 KO) mice were originally purchased from the Jackson Laboratory (strain No. 023359), and we added this information to the “Animal” section of the Methods. Previously reported phenotypes of Tet2 KO mice mainly involve the bone marrow, spleen, islets and heart. Specifically, Tet2 KO increased total cell numbers in the bone marrow and spleen (PMID: 21873190) and elevated the white blood cell (WBC) count (PMID: 37541212). Additionally, Tet2 KO mice exhibited splenomegaly (PMID: 37541212, PMID: 21723200, PMID: 38773071). In contrast, islet morphology (PMID: 34417463) and anatomical chamber volumes or ventricular function (PMID: 38357791) were indistinguishable between Tet2 KO and wild-type (WT) mice.

      (3) An experiment showing the co-localization of TET2 and HNF4α in the mouse liver in fasted mice and/or in HFD-mice would strengthen the data shown in Figure 3.

      Thank you for the helpful comment. Following the reviewer’s suggestion, we performed experiments showing the co-localization of TET2 and HNF4α in the livers of fasted mice and HFD mice, as shown in the new Fig. 4B and C.

      Minor points:

      (1) Given that the manuscript does not focus on the role of TET2 in the pathogenesis of T2D, its title should be changed.

      Thank you for the helpful comment. Following the reviewer’s suggestion, we changed the title of the manuscript to “HNF4α-TET2-FBP1 axis contributes to gluconeogenesis and type 2 diabetes”.

      (2) Please indicate the molecular weight of bands in all figures.

      Thank you for the helpful comment. Following the reviewer’s suggestion, the molecular weights of the bands are now indicated in all figures.

      (3) Why are the control values on the y-axes of Figure 1A and B so different? Please maintain the same scale in both figures.

      Thank you for the helpful comment. Following the reviewer’s suggestion, we recalculated and normalized the control value in Fig. 1A to maintain the same scale in both figures.

      (4) In Figure 2F, are plasma insulin levels altered in response to GTT in Tet2-KO mice? If so, please show the data and discuss.

      Thank you for the helpful comment. Following the reviewer’s suggestion, we examined plasma insulin levels during the GTT, and the results revealed that Tet2-KO mice had lower insulin levels after glucose administration, which reflects higher insulin sensitivity, as shown in the new Fig. 2H.

      (5) Does the fasting-induced increase in hepatic TET2 protein levels also occur in other tissues and hematopoietic cells?

      Thank you for the helpful comment. Following the reviewer’s suggestion, we examined Tet2 protein levels under fasting conditions in other tissues and hematopoietic cells, and found that fasting also increased Tet2 protein levels in the kidney, brain and hematopoietic cells, but not in the heart.

      Author response image 1.

      (6) Please indicate the glucagon concentration and metformin dose in all figures in which they are mentioned.

      Thank you for the helpful comment. Following the reviewer’s suggestion, the glucagon concentration (20 nM) and metformin doses (10 mM for HepG2 cell treatment and 300 mg/kg per day for mouse treatment) were added to the figure legends.

    1. eLife Assessment

      This valuable paper describes the crystal structure of a complex of the Sld3 Cdc45-binding domain (CBD) with Cdc45, which is essential for the assembly of an active Cdc45-MCM-GINS (CMG) double hexamer at the replication origin. Although the results shown in the paper are of interest to researchers in DNA replication and genome stability, the biochemical analysis of protein-protein interaction and DNA binding is incomplete, and the paper needs additional data.

    2. Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a significant contribution that enhances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides crucial insights into the intermediate steps of CMG formation, and the particle analysis and model predictions compellingly describe the mechanism of Cdc45 loading. Building upon previously known Sld3 and Cdc45 structures, this study offers new perspectives on how Cdc45 is recruited to MCM DH through the Sld3-Sld7 complex. The most notable finding is the structural rearrangement of Sld3CBD upon Cdc45 binding, particularly the α8-helix conformation, which is essential for Cdc45 interaction and may also be relevant to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a potential mechanism for its binding to Mcm2NTD. Furthermore, Sld3's ssDNA-binding experiments provide evidence of its novel functions in the DNA replication process in yeast, expanding our understanding of its role beyond Cdc45 recruitment.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research. This research also opens up several new opportunities to utilize structural biology to unravel the molecular details of the model presented in the paper.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of detailed structural validation for the proposed Sld3-Sld7-Cdc45 model and its CMG-bound models, which could be done in the future using advanced structural biology techniques such as single-particle cryo-electron microscopy. It would also be interesting to explore how Sld7 interacts with the MCM helicase; this would help to build a detailed model of the long, flexible Sld3-Sld7-Cdc45 assembly bound to MCM DH and to show where Sld7 lies on the structure, which would in turn clarify how Sld7 functions in the complex. Future experiments will also be needed to understand the molecular details of how the release of Sld3 and Sld7 from CMG is associated with ssARS1 binding.

    3. Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. Although the single-stranded DNA binding data from Sld3 of different species is a minor weakness, the experiments support a model in which the release of Sld3 from the complex may be promoted by its binding to origin single-stranded DNA exposed by the helicase.

      Strengths

      • The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.
      • The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.
      • The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.
      • The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.
      • The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.
      • The proposed model of Sld3 release from the complex through binding to single-stranded DNA at the origin is intriguing.

      Weaknesses

      • The section on the binding of Sld3 complexes to origin single-stranded DNA is somewhat weakened by the use of Sld3 proteins from different species. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.
      • Although the study reveals that Sld3 binds to different residues of Cdc45 than those previously shown to bind Mcm or GINS, the data in the paper do not shed any additional light on how GINS and Sld3 binding to Cdc45 or Mcm would affect each other. Other previous research has suggested that the binding of GINS and Sld3 to Mcm or Cdc45 may be mutually exclusive. The authors acknowledge that a structural investigation of Sld3, Sld7, Cdc45, and MCM during the stage of GINS recruitment will be a significant goal for future research.

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of Sld3-Cdc45-binding domain (CBD) with Cdc45 and a model of the dimer of an Sld3-binding protein, Sld7, with two Sld3-CBD-Cdc45 for the tethering. In addition, the authors showed the genetic analysis of the amino acid substitution of residues of Sld3 in the interface with Cdc45 and biochemical analysis of the protein interaction between Sld3 and Cdc45 as well as DNA binding activity of Sld3 to the single-strand DNAs of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamer at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be carefully evaluated with more quantitative ways to strengthen the authors' conclusion even in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a novel contribution that significantly advances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides insights into the intermediate steps of CMG formation. The study builds upon previously known structures of Sld3 and Cdc45 and offers new perspectives into how Cdc45 is loaded onto MCM DH through Sld3-Sld7. The most notable finding is the structural difference in Sld3CBD when bound to Cdc45, particularly the arrangement of the α8-helix, which is essential for Cdc45 binding and may also pertain to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a possible mechanism for its binding to MCM2NTD.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of experimental validation for the proposed Sld3-Sld7-Cdc45 model. Specifically, the claim that Sld3 binding to Cdc45-MCM does not inhibit GINS binding, a finding that contradicts previous research, is not sufficiently substantiated with experimental evidence. To strengthen their model, the authors must provide additional experimental data to support this mechanism. Also, the authors have not compared the recently published Cryo-EM structures of the metazoan CMG helicases with their predicted models to see if Sld3/Treslin does not cause any clash with the GINS when bound to the CMG. Still, the work holds great potential in its current form but requires further experiments to confirm the authors' conclusions.

      We appreciate the reviewer’s careful reading and comments.

      Our structural analysis of Sld3CBD-Cdc45 revealed a detailed map of the interaction between Sld3CBD and Cdc45 at 2.6 Å resolution. The Sld3-, MCM- and GINS-binding sites of Cdc45 are completely distinct, suggesting that Sld3CBD, Cdc45 and GINS could bind to MCM together. The SCMG-DNA model confirmed such a binding mode, although our study does not show how this binding mode affects GINS loading by other initiation factors (Dpb11, Sld2, etc.). The competition between Sld3 and GINS for binding to Cdc45 or Cdc45-MCM reported in previous studies (Bruck et al.) may be caused by the conformational change of the Cdc45 DHHA1 domain between Sld3CBD-Cdc45 and CMG. We modified our manuscript to discuss this point (P7/L168-173 and P10/L282-286). Following the comment, we compared the recently published cryo-EM structure of the metazoan CMG helicase (PDB ID: 8Q6O) with our predicted models (P7/L198-P8/L202) and added Cdc45 mutation experiments to confirm our conclusion ([Recommendations for the authors] Q18).

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. However, the work remains incomplete as several main claims are only partially supported by experimental data, particularly the proposed model for Sld3 interaction with GINS on the CMG. Additionally, the single-stranded DNA binding data from different species do not convincingly advance the manuscript's central arguments.

      Strengths

      (1) The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      (2) The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      (3) The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      (4) The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      (5) The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      Weaknesses

      (1) The proposed model for Sld3 interacting with GINS on the CMG needs more experimental validation and conflicts with published findings. These discrepancies need more detailed discussion and exploration.

      Our structural analysis of Sld3CBD-Cdc45 revealed detailed information on the interaction between Sld3CBD and Cdc45 at 2.6 Å resolution. The Sld3CBD-binding site of Cdc45 is completely distinct from its GINS- and MCM-binding sites, suggesting that Sld3CBD, Cdc45, and GINS could bind to MCM together. The SCMG-DNA model confirmed such a binding mode. Following the comment, we added an analysis of a Cdc45 mutant that disrupts binding to MCM and GINS but does not affect Sld3CBD binding (Supplementary Figure 9). Our model is consistent with the GINS-loading requirement (the phosphorylation of Sld3 on Cdc45-MCM) and does not conflict with the stepwise loading mode (please see the responses to [Recommendations for the authors] Reviewer #1, Q14-15). Regarding the competition between Sld3 and GINS for binding to Cdc45 or Cdc45-MCM reported in previous in vitro binding experiments (Bruck et al.), please see the responses to [Recommendations for the authors] Q6.

      (2) The section on the binding of Sld3 complexes to origin single-stranded DNA needs significant improvement. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      As suggested, we tried to improve the ssDNA-binding section (please see the responses to [Recommendations for the authors]: Q4 and Q5). We used Sld7-Sld3CBD-Cdc45 from different sources because of limitations in protein expression. The two source species belong to the same family, and their Sld7, Sld3 and Cdc45 proteins are conserved in sequence and have similar structures as predicted by AlphaFold3 (RMSD = 0.356, 1.392, and 0.891 Å for the Cα atoms of Sld7CTD, Sld7NTD-Sld3NTD, and Sld3CBD-Cdc45, respectively). This similarity at the source and protein levels allows us to make the comparison.

      (3) The authors' model proposing the release of Sld3 from CMG based on its binding to single-stranded DNA is unclear and needs more elaboration.

      Considering that ssDNA (ssARS1) is produced by CMG, the ssDNA binding of Sld3 should occur after an active CMG has formed. Therefore, the results of the ssDNA-binding experiments imply that Sld3 release could be coupled with its binding to the ssDNA produced by CMG. We have elaborated on this further in the revised version (please see the responses to [Recommendations for the authors] Q4, Q5).

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of Sld3-Cdc45-binding domain (CBD) with Cdc45 and a model of the dimer of an Sld3-binding protein, Sld7, with two Sld3-CBD-Cdc45 for the tethering. In addition, the authors showed the genetic analysis of the amino acid substitution of residues of Sld3 in the interface with Cdc45 and biochemical analysis of the protein interaction between Sld3 and Cdc45 as well as DNA binding activity of Sld3 to the single-strand DNAs of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamer at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be carefully evaluated with more quantitative ways to strengthen the authors' conclusion.

      We thank the reviewer for the positive assessment. We provided more quantitative information and tried to quantify the experiments as suggested (please see the responses to [Recommendations for the authors]).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have several concerns that I will outline below, accompanied by my suggestions.

      (1) The title of the paper, “Structural and functional insights into Cdc45 recruitment by Sld7-Sld3 for CMG complex Formation,” appears misleading because it suggests that the authors present a structure of Sld3-Sld7 in complex with Cdc45, which is not the case here. If the authors can provide additional structures demonstrating the function of this complex, the title would be justified. Otherwise, I recommend a title that reflects the presented work in its current form.

      Following the comment, we changed the title to “Sld3CBD-Cdc45 structural insights into Cdc45 recruitment for CMG complex formation”.

      (2) In lines 70-72, where the authors mention the known structures of different proteins, intermediates, and complexes, I recommend including PDB IDs of the described structures and reference citations. This will help the readers to analyze what is missing in the pathway and why this structure is essential.

      Following the comment, we added PDB IDs and references (P3/L72-74).

      (3) The representation of Figure 1A is unclear and looks clumsy. If the structure were rotated to another orientation, with α8 and α9 displayed at the front, it would be easier to understand the complex-forming regions by looking at the structure. Also, I recommend highlighting α8 and α9 in a contrasting color so that they are easily visible and attract readers' attention. Similarly, it would be helpful if DHHA1 were shown in a different color.

      Following the comment, we modified Figure 1 in the revised version to show α8 and α9 of Sld3CBD and the DHHA1 domain of Cdc45 clearly.

      (4) Can the authors add a supplementary figure showing the predicted disorder probability of the α8 helix region in Sld3? Also, please highlight which region became ordered in the structure.

      Yes, we have shown the disordered α8 helix region and highlighted the ordered α8 of Sld3 in Figure S4A.

      (5) Can you compare the long distorted helix of Cdc45 (Supplementary Figure 3B) in the Sld3-Cdc45 complex with Xenopus and Drosophila Cdc45 from their CMG structures? Also, can the authors explain why this helix is destabilized in their structure but is relatively stable in other Cdc45 structures (in CMG and HuCdc45)?

      We checked Cdc45 in all published cryo-EM CMG structures, including Xenopus CMG-DONSON (8Q6O) and Drosophila CMG (6RAW); in all of them, the long helix is ordered in the CMG complex, whereas it is disordered in the crystal structures of Sld3CBD-Cdc45 and Entamoeba histolytica Cdc45. Inspection of the crystal packing around the long helix showed that, in huCdc45, it appears to be stabilized only by crystal packing; we therefore suggest that this long helix is unstable in crystals unless it is stabilized by packing contacts.

      (6) I recommend adding the following parameters to Supplementary Table 2: 1. Rmerge values, 2. Wilson B factor, 3. Average B factor, and 4. Total number of molecules in ASU.

      We apologize for the error in the Rmerge values in Supplementary Table 2, which we have corrected. We have also added the Wilson B factor, the average B factor, and the total number of Sld3CBD-Cdc45 complexes in the ASU.

      (7) Can authors provide the B factor values of the α8 helix of Sld3?

      We checked the B factor values of helix α8CTP of Sld3 in Sld3CBD-Cdc45. Because this helix binds stably to Cdc45, its average main-chain B factor is 45 Å² less than that of the whole structure. We added the average B factor of helix α8CTP to the Supplementary Figure 4A legend.

      (8) Can authors explain why higher Ramachandran outliers exist in their structure? Can it be reduced below 1% during refinement?

      There are 13 outliers (1.67%) at different positions: four are close to disordered regions (poor electron density), four are in a loop with poor density, and the remainder are in turns or a loop. For the residues with poor electron density, we could not move them into allowed Ramachandran regions without increasing Rfree, so we could not reduce the outliers below 1% during refinement while keeping the current Rfree value.

      (9) In Supplementary Figure 8, please show the CD spectra of the Sld3WT. Why is the Sld3-3S peak relatively flat? Was the sample precipitating while doing the measurements, or does it have less concentration than others?

      To check the folding of the mutants, we performed CD experiments and estimated the secondary structure content. Because WT Sld3CBD was prepared in complex with Cdc45, whereas the Sld3CBD mutants were prepared alone, we calculated the secondary structure content of WT Sld3CBD from the crystal structure of Sld3CBD-Cdc45. Sample concentrations were adjusted to the same level for the CD measurements. The relatively flat Sld3-3S spectrum may be due to precipitation during the measurement.

      (10) Can the authors generate AlphaFold3 models of Sld3CBD-Cdc45-MCM-dsDNA and SCMG-dsDNA and compare them with the models they have generated?

      We tried to predict the Sld3CBD-Cdc45-MCM-dsDNA and SCMG-dsDNA structures using AlphaFold3. Although the results were similar to our models, many parts were disordered, so we did not use the predicted structures.

      (11) The authors say that the overall molecular mass of Sld7-Sld3ΔC-Cdc45 was >400 kDa on the SEC column. However, the column used for purifying this complex and the standards run on it for molecular weight calculations are not stated anywhere. If a Superdex 200 column was used, a sample of more than 400 kDa should not elute at the position shown in Supplementary Figure 2B. I recommend showing the standard MW plot and where the elution volume of Sld7-Sld3ΔC-Cdc45 lies on the standard curve. Also, please describe how the molecular weight was calculated and report the calculated molecular mass.

      As suggested, we added a calibration of the Superdex 200 16/60 SEC column with a molecular weight standard kit to Supplementary Figure 2, showing that the molecular weight at the peak position is estimated to be > 400 kDa.

      (12) I also recommend using at least one of the techniques, either SEC-MALS or AUC, to calculate the actual molecular mass of the Sld7-Sld3ΔC-Cdc45 complex and to find its oligomeric state. If the authors want to prove their hypothesis that a dimer of this complex binds to MCMDH, it is essential to show that it exists as a dimer. Based on the current SEC profile, it appears as a monomer peak if the S200 SEC column is being used.

      As described in our response to comment (11), we added the standard MW plot (Superdex 200 16/60 column calibrated with a standard kit). The molecular weight at the peak elution position of Sld7-Sld3ΔC-Cdc45 was estimated to be 429 kDa. Because the Sld7-Sld3ΔC-Cdc45 dimer should be a flexible, elongated molecule, it may elute at a position corresponding to a larger molecular weight than its actual mass (2 × 158 kDa = 316 kDa). We also tried to confirm the particle size using SEC-SAXS, as described in our response to comment (13).

      (13) Dynamic light scattering is not the most accurate method for calculating intermolecular distance. I recommend using another technique that calculates the accurate molecular distances between two Cdc45 if Sld7-Sld3ΔC-Cdc45 is forming a dimer. Techniques such as FRET could be used. Otherwise, some complementary methods, such as SAXS, could also be used to generate a low-resolution envelope and fit the speculated dimer model inside, or authors could try negative staining the purified Sld7-Sld3ΔC-Cdc45 and generate 2D class averages and low-resolution ab initio models to see how the structure of this complex appears and whether it satisfies the speculated model of the dimeric complex.

      We tried both negative-stain TEM and SEC-SAXS. We could not obtain negative-stain TEM images of sufficient quality to generate 2D class averages and low-resolution ab initio models. SEC-SAXS gave a molecular weight of 370-420 kDa and an Rg > 85 Å, which are consistent with our conclusions from SEC and DLS, but with large errors because, owing to equipment limitations, the measurements had to be performed at 10-15 °C. Under these conditions, the SEC-SAXS peak was not as sharp as during purification at 4 °C, and the SAXS data were not good enough to build a molecular model, so we did not add them to the manuscript.

      (14) The authors mention in the introduction (lines 72-73) that, based on single-molecule experiments, Cdc45 is recruited to MCMDH in a stepwise manner. If this is true, and Sld7-Sld3ΔC-Cdc45 also forms a dimer, then stepwise recruitment would require the dimer to break into monomers, which would be an energetically expensive process for the cell. Would such a process occur physiologically? Can the authors explain how this would happen inside the cell?

      Sld7-Sld3-Cdc45 consists of domains connected by long loops, so the Cdc45-Sld3-[Sld7]2-Sld3-Cdc45 dimer is a flexible, elongated molecule. Such a flexible dimer does not require the two Cdc45 molecules to bind MCM DH simultaneously; they may bind MCM DH in a stepwise manner. Dimer formation of Sld7-Sld3-Cdc45 is advantageous for efficient recruitment and for saving energy. Moreover, the Cdc45-Sld3-[Sld7]2-Sld3-Cdc45 assembly on MCM DH that we propose could represent one stage of CMG formation in the cell. We have added this discussion (P7/L194 and P10/L276-279).

      (15) Can authors show experimentally that a dimer of Sld7-Sld3ΔC-Cdc45 is binding to MCMDH and not a monomer in a stepwise fashion?

      In our study, we provided particle-size measurements showing that Sld7-Sld3-Cdc45 forms a dimer in the absence of MCM DH, and an SCMG model indicating that the Sld7-Sld3ΔC-Cdc45 dimer can bind MCM DH. This question should be addressed in future work by cryo-EM of Sld7-Sld3-Cdc45-MCM DH or Sld7-Sld3-CMG. As explained in our response to comment (14), binding of the flexible Sld7-Sld3ΔC-Cdc45 dimer to MCM DH does not contradict stepwise loading; the dimer bound to MCM DH represents one stage of the process.

      (16) Can authors highlight where Sld7 will lie on their model shown in Figures 3A and 3C, considering their model shown in 3B is true?

      Based on the structures and the particle-size analysis, we predict that Sld7-Sld3-Cdc45 exists as a Cdc45-Sld3-[Sld7]2-Sld3-Cdc45 dimer. The Sld7 dimer could span MCM DH at the top of the right panels of Figures 3A and 3C. However, we could not add Sld7 to the models because there are no data on interactions between Sld7 and MCM.

      (17) In Supplementary Figure 10, can authors show the residues between the loop region highlighted in the dotted circle to show that there is no steric clash between the residues in that region of their predicted model?

      Following the comment, we added the residues in Supplementary Figure 10 (Supplementary Figure 11 in the revised version) to show no steric clash in our predicted model.

      (18) It is essential to show experimentally that Sld3CBD neighbors MCM2 and binds Cdc45 on the opposite side of the GINS binding site. I recommend that the authors design an experiment that proves this statement. Mutagenesis experiments for the predicted residues that could be involved in interaction with proper controls might help to prove this point. Since this is the overall crux of the paper, it has to be demonstrated experimentally.

      We thank the reviewer for this recommendation. Our crystal structure shows the interaction between Sld3CBD and Cdc45 at 2.6 Å resolution. The Sld3CBD-binding, GINS-binding, and MCM-binding sites of Cdc45 are completely different, indicating that Sld3CBD, Cdc45, and GINS could bind to MCM together. The SCMG model supports this binding mode. Following the recommendation, we added a mutant analysis of Cdc45 G367D and W481R, which were reported to disrupt binding to MCM and GINS, respectively. As we predicted, neither mutant affected binding to Sld3CBD (Supplementary Figure 9B). We revised the manuscript to discuss this point more clearly (P7/L170-173).

      (19) I recommend rewriting the sentence in lines 208-210. During EMSA experiments, new bands do not appear; instead, there is no shift at lower ratios, so you see a band similar to the control for Sld3CBD-Cdc45. So, re-write the sentence correctly to avoid confusion when interpreting the result.

      As suggested, we rewrote this sentence as follows: “The ssDNA band remained (Figure 4B) and new bands corresponding to the ssDNA–protein complex appeared in CBB-stained PAGE (Supplementary Figure 13) when the Sld3CBD–Cdc45 complex was mixed with ssDNA at the same ratio, indicating that the binding affinity of Sld3CBD–Cdc45 for ssDNA was lower than that of Sld3CBD alone” (P8/L226-229).

      (20) Since CDK-mediated phosphorylation of Sld3 is known to be required for GINS loading, it is unclear whether the ssDNA-binding affinity of phosphorylated Sld3 remains the same. I wonder what would happen if phosphorylated Sld3 were used for the experiment shown in Figure 4B.

      The CDK phosphorylation site is located in Sld3CTD, which was not included in our ssDNA-binding experiments, so phosphorylation of Sld3 would not affect the results shown in Figure 4B.

      (21) Sld3CBD-Cdc45 has a reduced binding affinity for ssDNA, whereas Sld7-Sld3ΔC-Cdc45 and Sld7-Sld3ΔC have binding affinities similar to that of Sld3CBD, based on Figure 4B. It appears that Sld3CBD reduces the DNA-binding affinity of Cdc45, or vice versa. Is it correct to say so?

      In our opinion, it is the latter: Cdc45 reduces the ssDNA-binding affinity of Sld3CBD. Although we could not identify the ssDNA-binding sites of Sld3CBD, its surface charge suggests that α8CTP could contribute to ssDNA binding (Supplementary Figure 15).

      (22) Cdc45 binds to the ssDNA by itself, but in the case of Sld3CBD-Cdc45, the binding affinity is reduced for Sld3CBD and Cdc45. Based on their structure, can authors explain what leads to this complex's reduced binding affinity to the ssDNA? Including a figure showing how Sld7-Sld3CBD-Cdc45 interacts with the DNA would be a nice idea.

      Previous studies showed that Cdc45 binds more tightly to long ssDNA (> 60 bases) and that the C-terminus of Cdc45 is responsible for its ssDNA-binding activity. The Sld3CBD-Cdc45 structure shows that the C-terminal DHHA1 domain of Cdc45 binds Sld3CBD, which may explain the reduced ssDNA-binding affinity of Cdc45 in the Sld3CBD-Cdc45 complex. We agree that a figure showing how Sld7-Sld3CBD-Cdc45 interacts with ssDNA would be a nice idea. However, there is no detailed information on the interaction between Sld7-Sld3ΔC-Cdc45 and ssDNA, so we could not provide a figure showing the ssDNA-binding mode. Instead, we added a figure showing the surface charges of Sld3CBD in the Sld3CBD-Cdc45 complex and of Sld3NTD-Sld7NTD (Supplementary Figure 15).

      (23) Based on the predicted model of Sld7-Sld3 and Cdc45 complex, can authors explain how Sld7 would restore the DNA binding ability of the Sld3CBD?

      We consider that Sld7 and Sld3NTD could bind ssDNA. Although we did not perform an ssDNA-binding assay with Sld7, the Sld3NTD-Sld7NTD surface shows a large positively charged area that may contribute to ssDNA binding (Supplementary Figure 15). We added this explanation (P9/L245-248).

      (24) It would be important to show binding measurements and Kd values for all the complexes shown in Figure 4B with ssDNA to explain the dissociation of Cdc45 from Sld7-Sld3 after CMG formation. I also recommend clarifying the statement in lines 224-227 about how Sld7-Sld3-Cdc45 loads Cdc45 onto CMG.

      As the reviewer mentioned, binding measurements and Kd values for all the complexes are important to explain the dissociation of Sld7-Sld3 from CMG. Pull-down assays using chromatography are sensitive to the balance between binding affinity and chromatography conditions. Therefore, we used EMSA with native PAGE, which is closest to the natural state; its disadvantage is that Kd values could not be estimated. Regarding lines 224-227, the ssARS1-binding affinity of Sld3 and its complexes should relate to the dissociation of Sld7–Sld3 from the CMG complex, not to Cdc45 loading, because ssARS1 is unwound from dsDNA by the CMG complex after Cdc45 and GINS have been loaded. We revised the description (P9/L248-251).

      (25) Can authors explain why SDS-PAGE was used to assess the ssDNA (See line 420)?

      We apologize for this error and have corrected it to “polyacrylamide gel electrophoresis”.

      (26) In line 421, can the authors elaborate on a TMK buffer?

      We are sorry for this omission and added the content of the TMK buffer (P16/L453).

      (27) I am curious to know whether the authors also attempted to crystallize the Sld7-Sld3CBD-Cdc45 complex. This complex structure would support the authors' hypothesis in this article.

      We tried to crystallize Sld7-Sld3ΔC-Cdc45 but could not obtain crystals. We also tried cryo-EM but could not obtain usable data.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript would be strengthened if the authors acknowledged in greater detail how their work agrees with or disagrees with Itou et al. (PMID: 25126958 DOI: 10.1016/j.str.2014.07.001). The introduction insufficiently described the findings of that previous work in lines 63-64.

      In the section [The overall structure of Sld3CBD-Cdc45], we compared Sld3CBD in the Sld3CBD-Cdc45 complex with the monomer reported by Itou et al. (PMID: 25126958 DOI: 10.1016/j.str.2014.07.001) and pointed out their structural similarities and differences (P5/L105-106), especially the conformational change of Sld3CBD α8 upon binding to Cdc45, which agrees with the mutant experiments of Itou et al. (P3/L126-127). The other Cdc45-binding site of Sld3CBD in the Sld3CBD-Cdc45 complex is α9, not the residues predicted in previous studies.

      (2) Figure 2. Could you please perform and present data from multiple biological replicates (e.g., at least two independent experiments) for each mutant strain? This would help ensure that the observed pull-downs (2A-B) and growth patterns (2C) are consistent and reproducible.

      We performed the pull-down experiments three times independently, each time from co-expression through purification to the pull-down assay, and added this information to the Methods section [Mutant analysis of Sld3 and Cdc45]. The growth assays in Figure 2C were performed twice.

      (3) Figure 3B. The match between the predicted complex length and particle size measured by dynamic light scattering (DLS) is striking. Did the authors run the analysis with vehicle controls and particle size standards? There is no mention of these controls.

      As suggested, we added control data for the buffer alone and for lysozyme as a standard protein, together with a description in the Methods section [Dynamic light scattering].

      (4) Figure 4. In lines 216-217, the authors write that the binding of the K. marxianus complex "demonstrates that the presence of Sld7 could restore the single-stranded DNA binding capacity of Sld3." Another explanation is that complexes from each species bind differently. If the authors want to make a strong claim, they should compare the binding of complexes containing the same proteins.

      We agree that samples from the same source would support a stronger claim. Because of limitations in protein overexpression, we used Sld7-Sld3ΔC-Cdc45 from two different sources. However, both species belong to the same family (Saccharomycetaceae), and Sld7, Sld3, and Cdc45 are conserved in sequence and have similar structures as predicted by AlphaFold3 (RMSD = 0.356, 1.392, and 0.891 Å for the Cα atoms of Sld7CTD, Sld7NTD-Sld3NTD, and Sld3CBD-Cdc45, respectively). This similarity at the species and protein levels allows the comparison. Moreover, we modified the description to “indicates that the presence of Sld7 and Sld3NTD could increase the ssDNA-binding affinity to a level comparable to that of Sld3CBD.”

      (5) The logic of the following is unclear: "Considering that ssDNA is unwound from dsDNA by the helicase CMG complex, Sld7-Sld3ΔC-Cdc45, and Sld7-Sld3ΔC having a stronger ssDNA-binding capacity than Sld3CBD-Cdc45 may imply a relationship between the dissociation of Sld7-Sld3 from the CMG complex and binding to ssDNA unwound by CMG." (Lines 224-227). How do the authors imagine that the difference in binding affinity due to Sld7 contributes to the release of Sld3? Please explain.

      Because ssARS1 is unwound from dsARS1 by the activated CMG helicase, which forms after Cdc45 and GINS have been loaded, the stronger ssARS1-binding affinity of Sld3–Sld7 may favor the dissociation of Sld7–Sld3 from the CMG complex. We revised the sentence in lines 224-227 (P9/L248-251).

      (6) The authors suggest that the release of Sld3 from the helicase is related to its association with single-stranded ARS1 DNA. They refer to the work of Bruck et al. (doi: 10.1074/jbc.M111.226332), which demonstrates that single-stranded origin DNA inhibits the interaction between Sld3 and MCM2-7 in vitro. The authors selectively choose data from this previous work, only including data that supports their model while disregarding other data. This approach hinders progress in the field. Specifically, Bruck proposed a model in which the association of Sld3 and GINS with MCM2-7 is mutually exclusive, explaining how Sld3 is released upon CMG assembly. In Figure 3 of the authors' model, they suggest that Sld3 can associate with MCM2-7 through CDC45, even when GINS is bound. Furthermore, Bruck's work showed that ssARS1-2 does not disrupt the Sld3-Cdc45 interaction. Instead, Bruck's data demonstrated that ssARS1-2 disrupts the interaction between MCM2-7 and Sld3 without Cdc45. While we do not expect the authors to consider all data in the literature when formulating a model, we urge them to acknowledge and discuss other critical data that challenges their model. Additionally, it would be beneficial for the field if the authors include both modes of Sld3 interaction with MCM2-7 (i.e., directly with MCM or through CDC45) when proposing a model for how CMG assembly and Sld3 release occurs.

      In our discussion, we referred to the study by Bruck et al. (doi: 10.1074/jbc.M111.226332) but did not discuss it further because we did not perform similar experiments in vitro, and we do not think that omitting this discussion hinders progress in the field. To promote progress, new experiments should provide new proposals and updated knowledge. Although we do not know the exact positional relationship between Sld3 and Dpb11-Sld2 on MCM during GINS recruitment, the Sld3CBD-Cdc45 structure clearly shows that the Sld3CBD-binding site of Cdc45 is completely different from its GINS- and MCM-binding sites. The SCMG model supports this binding mode, in which Sld3, Cdc45, and GINS can bind together. The competition between Sld3 and GINS for binding to Cdc45 or Cdc45-MCM reported by Bruck et al. may be caused by the conformational change of the Cdc45 DHHA1 domain between Sld3CBD-Cdc45 and CMG, or by the absence of other initiation factors in those experiments (CMG formation is regulated by initiation factors). We revised the Discussion accordingly (P10/L282-286). Regarding ssARS1 binding, we did not discuss Bruck's finding that ssARS1-2 does not disrupt the Sld3-Cdc45 interaction, because this finding does not conflict with our proposal, although it does not directly support it either. We propose that the release of Sld3 and Sld7 from CMG could be associated with binding to ssARS1 unwound by CMG, but the dissociation of Sld3-Sld7 need not depend on ssARS1 binding alone. The exposure of unwound ssARS1 causes a conformational change in CMG, which may be another event contributing to Sld3-Sld7 dissociation. However, we have no further experiments to confirm this, and Bruck's ssDNA-binding experiment did not include all of Sld3, Cdc45, and MCM, so we do not discuss Bruck's data further in the revised version (P11/L303-305).

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) Figure 1, Sld3CBD-Cdc45 complex: Please indicate the numbers of the critical residues and of the alpha-helices and beta-sheets in this Figure or in a Supplemental Figure to confirm the authors' claim.

      As suggested, we added the numbering of the alpha-helices and beta-sheets, with residue numbers, to Figure 1 and Supplemental Figures 4 and 5. We also added a topology diagram (Supplemental Figure 3).

      (2) Figure 2A and B: Please quantify the interaction here with a proper statistical comparison.

      In the experiments shown in Figures 2A and 2B, we used a co-expression system to co-purify the complexes and check their binding. To support quantitative comparison, we added the concentrations of the samples used to the Methods section [Mutant analysis of Sld3 and Cdc45].

      (3) Figure 3B, EMSA: If these are from an EMSA, at least free DNA and protein-bound DNA should be present on the gel. However, the authors show one band, which seems to be free DNA, in Figure 3B, and separately the smeared band of the protein complex in Supplementary Figure 12, and judge DNA binding by the disappearance of the band (line 207). Interestingly, in the case of Sld3CBD, there are few smeared bands (Supplementary Figure 12). Where is the DNA in this case? The disappearance could be due to contaminating nucleases (a non-specific DNA control is needed). Without showing the Sld3CBD-DNA complex in the gel, the conclusion that the DNA-binding activity of Sld3CBD-Cdc45 is lower than that of Sld3CBD alone (line 210) is very speculative. The same is true for Sld7-Sld3dC-Cdc45.

      Please explain the method (EMSA) briefly in the main text and show whole gels in both Figures. If the authors insist that the DNA-binding activity of Sld3 is altered by Cdc45 (and MCM), it would be better to perform a more quantitative DNA-binding assay, such as Biacore (surface plasmon resonance).

      In the EMSA, we used SYBR staining (Figure 4B) and CBB staining (Supplementary Figure 13) to visualize ssDNA and protein bands, respectively. Because, as the reviewer mentioned, the disappearance of the bands could be due to contaminating nucleases, we performed control experiments with non-specific ssDNA using the same proteins (Supplementary Figure 14). We are therefore confident that the disappearance or persistence of the ssDNA bands reflects whether or not the ssDNA is bound by protein. We added these explanations to the text (P9/L242-244). As noted in the legend of Supplementary Figure 13, Sld3CBD could not enter the gel, even when bound to ssDNA, because its pI exceeds the pH of the running buffer.

      Following the reviewer's comments, we attempted a pull-down experiment using a C-terminal His-tag on Sld3CBD/Sld3ΔC. Unfortunately, we could not find conditions that balanced binding with the chromatography requirements.

      (4) Figure 3B: Please quantify the DNA binding here with a proper statistical comparison with triplicate.

      For the EMSA (Figure 3B), we used ssDNA:protein molar ratios of 1:0, 1:1, 1:2, 1:4, and 0:1, with 10 pM as one unit. We added the sample concentrations to the Methods section [Electrophoretic mobility shift assay for ssDNA binding].

      As suggested, we tried to quantify the binding strength by integrating the grayscale intensities of the bands in the gel images. However, we are concerned that this grayscale-based quantification does not accurately represent the results. Because not all sample groups can be run on one gel, gel-to-gel variation introduces large errors into the calculation, as shown in Author response image 1. Although the integrated grayscale chart is consistent with our conclusion, we prefer not to add it to the manuscript.

      Author response image 1.

      (5) Because of poor writing, the authors need to ask for English editing.

      We apologize for the language. We asked a company (Editage, https://www.editage.jp) to perform a native-speaker revision and used an AI tool to recheck the English.

      Minor points:

      (1) Lines 47-58, Supplementary Figure 1: Although the text describes well how CMG assembles at the replication origin, the figure does not reflect what is written; instead, it shows a simple schematic related to this work. For general readers, it would be very useful to see a general model of CMG assembly, in which the authors then emphasize the steps this study focuses on.

      Thank you for your thoughtful comments. We optimized Figure 1 and hope it will be more understandable to general readers.

      (2) Line 50, DDK[6F0L](superscript): what is 6F0L?

      We apologize for this error; 6F0L is the PDB ID of the DDK structure. We have deleted it.

      (3) Lines 68 and 69, ssDNA and dsDNA: should be "single-stranded DNA (ssDNA)" and double-stranded DNA (dsDNA) when these words appear for the first time.

      Following the comment, we modified it to “single-stranded DNA (ssDNA)” and “double-stranded DNA (dsDNA)” (P3/L68,70).

      (4) Line 84, Cdc45s: What "s" means here?

      We apologize for this error and have changed it to “Cdc45”.

      (5) Line 87, Sld3deltaC: What is Sld3deltaC? Is this a deletion of the Cdc45-binding domain or of the C-terminal domain?

      Sld3ΔC is a deletion of the C-terminal domain of Sld3. We added the residue range and explanation (P4/L91).

      (6) Line 103: Although the authors mention beta-sheets 1-14 in the text, they are not indicated in the Figures, so it is impossible to verify the authors' conclusion.

      The secondary structure elements of Sld3CBD-Cdc45 are shown in Supplementary Figures 4 and 5. Following the comment, we added a topology diagram of Sld3CBD and Cdc45 in the Sld3CBD-Cdc45 complex as Supplementary Figure 3 and added citations when describing structural elements.

      (7) Line 106, huCdc45: Does this mean human Cdc45? If so, it should be "human CDC45 (huCDC45)". Is the CMG from budding yeast? Please specify the species.

      Yes, huCdc45 is human Cdc45. We changed it to “human CDC45 (huCdc45)”.

      (8) Line 107, Supplemental Figure 3B, black ovals: Please add "alpha7" in the Figure.

      Following the comment, we added a label of Cdc45 α7 to Supplemental Figure 3B and 3C (Supplemental Figure 4B and 4C in revised version).

      (9) Line 128, DHHA1: What is this? Please explain it in the text.

      Following the comment, we added the information on DHHA1 (P3/L75-77).

      (10) Line 130, beta13 and beta14: If the authors would like to point out these structures, please indicate where these sheets are in the Figures.

      We added a topology diagram as Supplementary Figure 3 to show the β-sheets in the DHH domain and added a citation to it in the text.

      (11) Line 133: Please add (Figure 1B) after α8CTP.

      Following the comment, we added “(Figure 1C)” (1B is 1C in revised version) after the α8CTP (P6/L133).

      (12) Line 140: After DHHA1, please add (Figure 1C).

      Following the comment, we added the figure citation after the DHHA1 (P6/L140).

      (13) Line 142: After DHHA1, please add (Figure 1D).

      Following the comment, we added the figure citation after the DHHA1 (P6/L142).

      (14) Line 149, Sld3-Y seemed to retain a faint interaction with Cdc45. The Cdc45 band is too faint here. Moreover, as shown above, without the quantification with proper statistics, it is hard to draw this kind of conclusion.

      We agree that the Cdc45 band corresponding to Sld3-Y in the pull-down assay was very faint, so we performed an in vivo experiment (Figure 2C) to confirm this result.

      (15) Line 149, Figure 2A and B: What kind of interaction assay was used here? A simple pull-down? The proteins seem to be eluted from the column. If so, how do the authors evaluate the presence of the proteins in different fractions? Please explain the method briefly in the main text.

      Figure 2 shows a co-expression pull-down binding assay. To describe the co-expression pull-down experiments clearly, we added more explanation to the Methods section [Mutant analysis of Sld3 and Cdc45].

      (16) Line 154-155: Please show the quantification to see if the reduced binding is statistically significant.

      Here, we explain why Cdc45-A retained its ability to bind Sld3CBD. Although the Cdc45-A mutant loses three hydrogen bonds with D344 of Sld3CBD, the remaining hydrogen-bond network maintains the contact between Sld3CBD and Cdc45.

      (17) Line 158, cell death: "No growth" does not mean cell death. Please rephrase here.

      Following the comment, we modified it to “no growth” (P6/L158).

      (18) Line 166: After CMG dimer, please add "respectively".

      Following the comment, we added the word “, respectively” after CMG dimer (P7/L178).

      (19) Lines 194-195: I cannot follow the meaning. Please rephrase to clarify the claim. What are ssARS1-2 and ARS1-5?

      Following the comment, we added more information about ssDNA fragments at the beginning of this section (P8/L210-214).

      (20) Figure 4A and Supplemental Figure 12 top, schematic of the ARS region: It is hard to follow. More explanation of the nature of the DNA substrates and much better schematic presentations would be appreciated.

      Following the comment, we added more information about ARS1 to the figure legend.

      (21) Figure 1A, dotted ovals should be dotted squares as shown in the enlarged images on the bottom.

      Following the comment, we modified Figure 1A and the legend to change the dotted ovals into dotted squares.

    1. eLife Assessment

      This useful study provides incomplete evidence that TANGO2 homologs, including HRG-9 and HRG-10, are not heme chaperones but play a role in cellular bioenergetics and oxidative stress homeostasis. While outstanding strengths include the use of different model systems, genetic tools, and behavioral assays, there are weaknesses in the data presented for the conclusions drawn. Because the experimental protocols of this study differ from those of the previous work reported by Sun et al., the evidence is insufficient to rule out a role for TANGO2 as a heme chaperone; furthermore, the authors provide only indirect evidence for a role of TANGO2 in bioenergetic and oxidative stress pathways. Nevertheless, this study paves the way for future studies addressing how TANGO2 regulates oxidative stress independently of its previously demonstrated role as a heme chaperone.

    2. Reviewer #1 (Public review):

      Summary:

      Sandkuhler et al. re-evaluated the biological functions of TANGO2 homologs in C. elegans, yeast, and zebrafish. In contrast to the previously reported role of TANGO2 homologs in heme transport, Sandkuhler et al. reach a different conclusion about their biological functions. Based on their results, they conclude that 'there is insufficient evidence to support heme transport as the primary function of TANGO2', and they additionally claim a role for TANGO2 in modulating metabolism. While the differences are reported in this study, more work is needed to elucidate the biological function of TANGO2.

      Strengths:

      (1) This work revisited a set of key experiments documented by Sun et al. in Nature 2022, including the survival assay with the toxic heme analog GaPP, the fluorescent ZnMP accumulation assay, and the multi-organismal investigations, which is critical for comparing the two works.

      (2) This work reported additional phenotypes for the C. elegans mutant of the TANGO2 homologs, including lawn avoidance, reduced pharyngeal pumping, smaller brood size, faster exhaustion in a swimming test, and a shorter lifespan. These phenotypes are important for understanding the biological function of TANGO2 homologs and were missing from the report by Sun et al.

      (3) Investigating 'reduced GaPP consumption' as a cause of the increased resistance of the hrg-9 hrg-10 double null mutant to toxic GaPP provides a valuable perspective for studying the biological function of TANGO2 homologs.

      (4) This work thoroughly evaluated the role of TANGO2 homologs in supporting yeast growth using multiple yeast strains and also pointed out the mitochondrial genome instability of the yeast strain used by Sun et al.

      Weaknesses:

      (1) A detailed comparison between this work and the work of Sun et al. on experimental protocols and reagents in the main text will be beneficial for readers to assess critically.

      (2) The GaPP used by Sun et al. (purchased from Frontier Scientific) is more effective in killing the worm than the one used in this study (purchased from Santa Cruz). Is the different outcome due to the differences in reagents? Moreover, Sun et al. examined the lethality after 3-4 days, while this work examined the lethality after 72 hours. Would the extra 24 hours make any difference in the result?

      (3) This work reported the opposite result to Sun et al. for the fluorescent ZnMP accumulation assay. However, the experimental protocols used by the two studies are substantially different. Sun et al. performed ZnMP staining by incubating L4-stage worms in axenic mCeHR2 medium containing 40 μM ZnMP (purchased from Frontier Scientific) and 4 μM heme at 20 ℃ for 16 h, while this work placed L4-stage worms on OP50 E. coli-seeded NGM plates treated with 40 μM ZnMP (purchased from Santa Cruz) for 16 h. The liquid axenic mCeHR2 medium is bacteria-free, heme-free, and provides consistent conditions for ZnMP uptake by worms. This work reports that the hrg-9 hrg-10 double null mutant shows bacterial lawn avoidance and reduced pharyngeal pumping. Therefore, the ZnMP staining protocol used in this work makes it difficult to control for environmental differences between the wild type and the mutant. The authors should adopt the ZnMP staining protocol used by Sun et al. for a proper evaluation of fluorescent ZnMP accumulation.

      (4) A striking difference between the two studies is that Sun et al. emphasize the biochemical function of TANGO2 homologs in heme transport, with evidence from biochemical tests. In contrast, this work emphasizes the physiological function of TANGO2 homologs, with evidence from multiple phenotypic observations. In the Discussion, the authors should address whether the phenotypes observed in this study could be due to the loss of heme transport activity upon eliminating TANGO2 homologs. Doing so would strengthen the academic debate and encourage collaboration.

    3. Reviewer #2 (Public review):

      Summary:

      This work investigates the roles of TANGO2 orthologs in different model systems and suggests bioenergetic dysfunction and oxidative stress (and not heme metabolism) as crucial pathways in TANGO2 deficiency disorders (TDD). Specifically, studies in C. elegans showed that the lack of TANGO2 ortholog activity (i) does not provide a survival benefit upon toxic heme exposure; (ii) results in a series of defects related to energy levels (reduced pharyngeal pumping, lawn avoidance, poor motility, and low brood size); (iii) reduces the fluorescence of the heme analog ZnMP in the intestine. Furthermore, upon oxidative stress, one TANGO2 ortholog, hrg-9, is upregulated compared to control conditions. Additional studies on yeast and zebrafish models failed to replicate prior findings on heme distribution and muscle integrity.

      These findings have a clear therapeutic impact, as TDD currently has no cure, only treatments that manage symptoms. Identifying the pathway responsible for the disease is pivotal to finding a cure.

      Although compelling, the authors' primary claim is based on indirect evidence that only hints toward it. Unfortunately, I do not see any direct and convincing evidence linking TANGO2 orthologs to bioenergetic and oxidative stress pathways.

      Strengths:

      (1) The study refutes and extends previous findings, highlighting new aspects of TANGO2's roles in cell physiology.

      (2) The use of different model systems to address the main research questions is useful.

      (3) The results suggest a broader impact than previously described, somewhat supporting the novelty of the study.

      Weaknesses:

      (1) The manuscript is written mainly as a criticism of a previously published paper. Although reproducibility in science is an issue that needs to be acknowledged, a manuscript should focus on the new data and the experiments that can better prove and strengthen the new claims.

      (2) The current presentation of the logic of the study and its results does not help the authors deliver their message, although the work has great potential.

      (3) The study is missing experiments to link hrg-9 and hrg-10 more directly to bioenergetic and oxidative stress pathways.

    4. Reviewer #3 (Public review):

      In this paper, Sandkuhler et al. reassessed the role of TANGO2 as a heme chaperone proposed by Sun et al. in a recently published paper (https://doi.org/10.1038/s41586-022-05347-z) by partially repeating, and failing to replicate, experiments therein. Overall, Sandkuhler et al. conclude that the heme-related roles of TANGO2 had been overemphasized by Sun et al., especially because the hrg-9 gene does not respond exclusively to different regimens of heme synthesis/uptake but is affected to a greater extent by, for example, oxidative stress.

      In recent years, the discussion around the heme-related roles of TANGO2 has been tantalizing but is still far from a definitive consensus. Discrepancies between results and their interpretation are a testament to how challenging and ambitious the understanding of TANGO2 and the phenotypes associated with TANGO2 defects are. Overall, the work presented by Sandkuhler et al. in this manuscript challenges the recent developments in the field and promotes the continuous characterisation of TANGO2 in relation to heme homeostasis.

      A few comments and questions:

      (1) The authors stress, with evidence provided in this paper or indicated in the literature, that the primary role of TANGO2 and its homologues is unlikely to be related to heme trafficking, arguing that the observed effects on heme transport are instead downstream consequences of aberrant cellular metabolism. However, in light of a mounting body of evidence (referenced by the authors) connecting TANGO2, more or less directly, to heme trafficking and mobilization, it is recommended that the authors comment on how they think TANGO2 could relate to, and be essential for, heme trafficking, albeit in a secondary, moonlighting capacity. This would highlight a seemingly common theme among emerging key players in intracellular heme trafficking, as appears to be the case for GAPDH, with accumulating evidence that this glycolytic enzyme is critical for heme delivery to several downstream proteins.

      (2) The observation - using eat-2 mutants and lawn avoidance behaviour - that survival patterns can be partially explained by reduced consumption, is fascinating. It would be interesting to quantify the two relative contributions.

      (3) In the legend to Figure 1A it's a bit unclear what the differently coloured dots represent for each condition. Repeated measurements, worms, independent experiments? The authors should clarify this.

      (4) It would help if the entire fluorescence images (raw and processed) for the ZnMP treatments were provided. Fluorescence images would also benefit Figure 1B.

      (5) Increasingly, the understanding of heme-dependent roles relies on transient or indirect binding to unsuspected partners that does not necessarily involve tight affinity, which makes the notion of heme as a static cofactor outdated. Despite impressive recent advances in the detection of these interactions (for example https://doi.org/10.1021/jacs.2c06104; cited by the authors), a full characterisation of the hemome remains elusive. Sandkuhler et al. consider it possible that heme binds TANGO2 but appear to question whether it does. However, Sun et al. convincingly showed and characterised heme binding to TANGO2. It is recommended that the authors comment on this.

    5. Author response:

      We have reviewed the helpful feedback from the reviewers and would like to thank them for their careful consideration of our manuscript. By way of provisional response, we agree with many of the above points and plan to revise our manuscript accordingly.

      When we attempted to replicate some of the heme trafficking-related experiments in the original paper using a C. elegans model of TDD, we were either unable to do so or identified an alternative explanation for the findings we could partially reproduce. As the reviewers correctly point out, there were some methodological and reagent-related differences between the study by Sun et al. and our own, which we will highlight more directly in a subsequent manuscript version. Additionally, where possible, we will attempt to replicate these experiments using the same protocol(s).

      We observed several phenotypic traits in the C. elegans model of TDD that had not been described in prior studies. While we believe these features are consistent with a bioenergetic problem in the worm, direct evidence for this is admittedly lacking in our original manuscript. We are actively engaged in experiments examining potential functions of HRG-9 and HRG-10 unrelated to heme trafficking and will consider which data best align with the scope of this study and thus warrant inclusion in a subsequent manuscript version. We will also provide a more comprehensive review of relevant data generated by other groups (e.g., lipid dysregulation, impaired autophagy, and mitochondrial dysfunction in the absence of TANGO2) in the Discussion.

      Recommended improvements related to figure legends, terminology, and formatting will also be executed in our forthcoming version. On behalf of my co-authors and myself, thank you again for your time and effort improving this work.

    1. eLife Assessment

      This study provides valuable findings on the role of site-specific DNA methylation changes during spermatogenesis and their contribution to paternal epigenetic inheritance. The study proposes that selective loss of DNA methylation at a subset of promoters is required for nucleosome retention and the establishment of epigenetic states that may influence embryonic gene regulation. The study's conclusions are mostly supported by solid data.

    2. Reviewer #1 (Public review):

      This study investigates the role of site-specific DNA methylation changes during spermatogenesis and their contribution to paternal epigenetic inheritance. Using MethylCap-seq, the authors identify a transient, site-specific loss of DNA methylation at transcription start sites (TSSs) of late spermatogenesis genes during the transition from differentiating spermatogonia (KIT+) to pachytene spermatocytes (PS). This demethylation event correlates with the gain of H3K4me3, which presets nucleosome retention sites in mouse sperm. The study proposes that selective loss of DNA methylation at a subset of promoters is required for nucleosome retention and the establishment of epigenetic states that may influence embryonic gene regulation. These findings complement earlier work by the Peters lab, "DNA methylation modulates nucleosome retention in sperm and H3K4 methylation deposition in early mouse embryos."

      Overall, the study presents a valuable dataset; however, additional analyses could strengthen the conclusions and provide further mechanistic insights.

      Major Comments:

      (1) Prior work should be acknowledged and used for comparative analysis. A key proposal in this study is that regions undergoing DNA methylation loss retain histones, influencing the zygote's epigenetic landscape. However, previous studies (e.g., Peters et al.) have shown that regions losing methylation in DNMT3a/b knockout (KO) mice do not necessarily retain histones, suggesting additional factors are involved. Moreover, Peters et al. demonstrated that regions of low DNA methylation in sperm render paternal alleles permissive for H3K4me3 establishment in early embryos, independent of the paternal inheritance of sperm-borne H3K4me3. Comparing these findings would refine the model presented in this study.

      (2) Figure 2A: The data suggest an increase in methylation peaks in PS cells. How does this align with the hypomethylation observed in Figure 1D? Reconciling these observations would improve clarity.

      (3) Figure 4A: The effect size of demethylation on nucleosome retention is unclear - do all demethylated promoters retain histones or only a subset? Quantifying this would clarify whether DNA methylation loss consistently predicts nucleosome retention.

      (4) Prior studies have generated bisulfite sequencing data from Tet KO sperm. Do the regions that undergo demethylation during the KIT+ to PS transition overlap with those misregulated in Tet KO sperm? Integrating this comparison could provide further insight into the regulation of site-specific demethylation.

      (5) The role of SCML2 enrichment in germline stem cells and its connection to H3K27me3 deposition in later germ cells is unclear. Earlier figures show that regions undergoing DNA demethylation from KIT+ to PS include genes expressed in later-stage germ cells.

      Why does SCML2 enrichment occur in germline stem cells (GSCs)? Why is H3K27me3 acquired only at later stages if SCML2 is already present? Is SCML2 preventing premature expression independently of H3K27me3?

      Showing the dynamics of H3K27me3 and SCML2 across these stages would clarify the proposed conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      This study profiles the genome-wide distribution of DNA methylation using methylation capture sequencing in four stages of male germ cells: Thy1+ (undifferentiated spermatogonia), Kit+ (differentiated spermatogonia), pachytene spermatocytes, and round spermatids. These analyses revealed site-specific loss of DNA methylation in pachytene cells compared with differentiating spermatogonia. Integrated analysis using published datasets indicates that hypomethylated sites correlate with nucleosome retention sites and bivalent histone methylation sites in sperm.

      Strengths:

      The methyl-seq approach provides a comprehensive profile of DNA methylation in male germ cells. The concept that DNA hypomethylation in meiotic cells precedes histone modification and histone retention in sperm is interesting.

      Weaknesses:

      (1) In the title, the word "presets" should be changed to "precedes" or "correlates with". Preset means a causal relationship, which is not the case. This needs to be changed throughout the manuscript. For example, in the abstract, "predetermine" needs to be changed to "precede".

      (2) The statement that "Based on these results, we propose that meiosis is a process of epigenetic reprogramming that sets up embryonic gene regulation" (lines 94-95) is a speculation that in the opinion of this reviewer should be removed from the text. It is too broad and not supported by the data presented.

      (3) Figure 1B: details are missing. How many cells were analyzed? How many times was this experiment performed (i.e., the number of experiments, n)? Were the changes statistically significant (lines 109-111)?

      (4) Figure 1A and Figure 1D: These seem to be contradictory. According to Figure 1D, leptotene/zygotene spermatocytes show bright 5mC staining. However, the diagram in 1A shows delayed recovery of DNA methylation. The authors should clarify this. It appears that 5mC was high in Kit+ spermatogonia and leptotene/zygotene spermatocytes, and then decreased in pachytene spermatocytes.

      (5) L121-122: Statement: "These results suggest that 5mC levels change dynamically during spermatogenesis before and after the transient reduction of DNA methylation in the premeiotic S phase." To support this claim about the premeiotic S phase, I suggest performing 5mC staining in premeiotic S-phase cells, which can be pulse-labelled with BrdU, or citing a reference if one is available.

    1. eLife Assessment

      This important study provides solid evidence for new insights into the role of Type-1 nNOS interneurons in driving neuronal network activity and controlling vascular network dynamics in awake, head-fixed mice. The authors use an original strategy based on the ablation of Type-1 nNOS interneurons by local injection of saporin conjugated to a substance P analogue into the somatosensory cortex. They show that ablation of Type-1 nNOS neurons has surprisingly little effect on neurovascular coupling, although it alters neural activity and vascular dynamics.

    2. Reviewer #1 (Public review):

      Turner et al. present an original approach to investigate the role of Type-1 nNOS interneurons in driving neuronal network activity and in controlling vascular network dynamics in awake, head-fixed mice. Selective activation or suppression of Type-1 nNOS interneurons has previously been achieved using chemogenetic, optogenetic, or local pharmacological approaches. Here, the authors took advantage of the fact that Type-1 nNOS interneurons are the only cortical cells that express the tachykinin receptor 1 to ablate them with a local injection of saporin conjugated to substance P (SP-SAP). SP-SAP causes cell death in 90% of Type-1 nNOS interneurons without affecting microglia, astrocytes, or other neurons. The authors report that the ablation has no major effects on sleep or behavior. Refining the analysis by scoring neural and hemodynamic signals with electrode recordings, calcium imaging, and wide-field optical imaging, the authors observe that Type-1 nNOS interneuron ablation does not change the various phases of the sleep/wake cycle. However, it does reduce low-frequency neural activity, irrespective of arousal state. Analyzing neurovascular coupling using multiple approaches, they report small changes in resting-state neural-hemodynamic correlations across arousal states, primarily mediated by changes in neural activity. Finally, they show that Type-1 nNOS interneurons play a role in controlling interhemispheric coherence and vasomotion.

      In conclusion, these results are interesting, use state-of-the-art methods, and are well supported by the data and their analysis. I have only a few comments on the stimulus-evoked haemodynamic responses, and these can be easily addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This important study by Turner et al. examines the functional role of a sparse but unique population of cortical neurons that express nitric oxide synthase 1 (Nos1). To do this, they pharmacologically ablate these neurons in a focal region of the whisker-related primary somatosensory (S1) cortex using a saporin-substance P conjugate. Using widefield and 2-photon microscopy, as well as field recordings, they examine the impact of this cell-specific lesion on blood flow dynamics and neuronal population activity. Locally within the S1 cortex, they find changes in neural activity patterns, decreased delta band power, and reduced sensory-evoked changes in blood flow (specifically, elimination of the sustained blood flow change after stimulation). Surprisingly, given the tiny fraction of cortical neurons removed by the lesion, they also find far-reaching effects on neural activity patterns and blood volume oscillations between the cerebral hemispheres.

      Strengths:

      This was a technically challenging study and the experiments were executed in an expert manner. The manuscript was well written and I appreciated the cartoon summary diagrams included in each figure. The analysis was rigorous and appropriate. Their discovery that Nos1 neurons can have far-reaching effects on blood flow dynamics and neural activity is quite novel and surprising (to me at least) and should seed many follow-up, mechanistic experiments to explain this phenomenon. The conclusions were justified by the convincing data presented.

      Weaknesses:

      I did not find any major flaws in the study. I have noted some potential issues with the authors' characterization of the lesion and its extent. The authors may want to re-analyse some of their data to further strengthen their conclusions. Lastly, some methodological information was missing, which should be addressed.

    4. Reviewer #3 (Public review):

      The role of type-I nNOS neurons is not fully understood. The data presented in this paper addresses this gap through optical and electrophysiological recordings in adult mice (awake and asleep).

      This manuscript reports a study of type-I nNOS neurons in the somatosensory cortex of adult mice aged 3 to 9 months. Most data were acquired using a combination of IOS and electrophysiological recordings in awake and asleep mice. Pharmacological ablation of the type-I nNOS cell population led to decreased coherence in gamma band coupling between the left and right hemispheres; decreased ultra-low frequency coupling between blood volume in each hemisphere; decreased (superficial) vascular responses to sustained sensory stimulation; and abolition of the post-stimulus CBV undershoot. While the findings shed new light on the role of type-I nNOS neurons, the source of the discrepancies between the current observations and those in the literature is not clear, and many potential explanations are put forth in the Discussion.

    1. eLife Assessment

      This useful study presents computational analyses of over 5,000 predicted extant and ancestral nitrogenase structures. While the data and some analyses are solid, the study remains incomplete in demonstrating that the metrics used for comparing nitrogenase structures are statistically rigorous. The data generated in this study provide a vast resource that can serve as a starting point for functional studies of reconstructed and extant nitrogenases.

    2. Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data. In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph.

      This work provides a useful resource for studying nitrogenase evolution. However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

      Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

    3. Reviewer #2 (Public review):

      This work aims to study the evolution of nitrogenases, to understand how their structure and function adapted to changes in the environment, including oxygen levels and metal availability.

      The study predicts > 5000 structures of nitrogenases, corresponding to extant, ancestral, and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive undertaking that is certain to be a resource for the community. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

      The challenge with this study is that all (or nearly all) of the quantitative analyses presented are based on RMSD calculations, many of which are under 2 angstroms. For all intents and purposes, two structures with RMSD < 2 angstroms could be considered 'structurally identical'. Much of the insight generated rests on minuscule differences in RMSD, and it is not clear that these differences are significant. The suggestion would be to find a way to evaluate the RMSD metric and determine whether these values, as obtained for the structures being compared, are reliable. Some options are provided in earlier studies: PMID: 11514933, PMID: 17218333, PMID: 11420449, PMID: 8289285 (and others).

      It could also be valuable to focus more on site-specific RMSDs rather than global RMSDs. The high conservation in the nitrogenases likely ensures that global RMSDs will remain low across the family. Focusing on specific regions might reveal interesting differences between clades that are more informative about the evolution of structure in tandem with environment and time.
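      To make the global versus site-specific distinction concrete, the standard definitions are sketched below. The symbols are introduced here for illustration only and are not taken from the manuscript: two structures share $N$ matched atoms (for example, Cα atoms) with coordinates $\mathbf{x}_i$ and $\mathbf{y}_i$; $R$ and $\mathbf{t}$ are a rotation and translation; $S$ is a hypothetical subset of atoms (such as one insertion or an interface). The site-specific value here is evaluated under the global superposition $(R^{*}, \mathbf{t}^{*})$, though superposing on the subset alone is another common convention. The last line is a worked bound: if two otherwise identical structures differ only at $|S| = 20$ of $N = 1000$ atoms, each displaced by $d = 5$ Å, the global RMSD is at most about 0.71 Å, which illustrates how a large local change can vanish into a small global value.

      ```latex
      \begin{align}
      \mathrm{RMSD}_{\text{global}}
        &= \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N}\sum_{i=1}^{N}
           \left\lVert R\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}} \\
      \mathrm{RMSD}_{S}
        &= \sqrt{\frac{1}{|S|}\sum_{i \in S}
           \left\lVert R^{*}\mathbf{x}_i + \mathbf{t}^{*} - \mathbf{y}_i \right\rVert^{2}} \\
      \mathrm{RMSD}_{\text{global}}
        &\le d\,\sqrt{\frac{|S|}{N}} = 5\,\text{\AA} \times \sqrt{\frac{20}{1000}} \approx 0.71\,\text{\AA}
      \end{align}
      ```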

    1. eLife Assessment

      This valuable study uses C. elegans to provide new insights into the role of the conserved protein FLWR-1/Flower in synaptic transmission. Employing a variety of techniques, including calcium imaging, ultrastructural analysis, and electrophysiology, the paper provides evidence that challenges some previous thinking about FLWR-1 function. While most of the findings are convincing, some of the authors' conclusions about the mechanisms of FLWR-1 function remain somewhat speculative.

    2. Reviewer #1 (Public review):


      The authors investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions. They observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons rescued the aldicarb resistance of flwr-1 mutants to wild-type levels. By contrast, cholinergic neuron expression did not rescue aldicarb sensitivity at all. They also showed that FLWR-1 removal leads to increased Ca2+ signaling in motor neurons upon photo-stimulation. From these findings, the authors conclude that FLWR-1 helps maintain the balance between excitation and inhibition (E/I) by preferentially regulating GABAergic neuronal excitability in a cell-autonomous manner.

      Overall, the work presents solid data and interesting findings; however, in my opinion, the proposed cell-autonomous model of GABAergic FLWR-1 function may be overly simplified.

      Most of my previous comments have been addressed; however, two issues remain.

      (1) I appreciate the authors' efforts in conducting additional aldicarb sensitivity assays that combine muscle-specific rescue with either cholinergic or GABAergic neuron-specific expression of FLWR-1. In the revised manuscript, they conclude, "This did not show any additive effects to the pure neuronal rescues, thus FLWR-1 effects on muscle cell responses to cholinergic agonists must be cell-autonomous." However, I find this interpretation confusing for the reasons outlined below.

      Figure 1 - Figure Supplement 3B shows that muscle-specific FLWR-1 expression in flwr-1 mutants significantly restores aldicarb sensitivity. However, when FLWR-1 is co-expressed in both cholinergic neurons and muscle, the worms behave like flwr-1 mutants and no rescue is observed. Similarly, cholinergic FLWR-1 alone fails to restore aldicarb sensitivity (shown in the previous manuscript). These observations indicate a non-cell-autonomous interaction between cholinergic neurons and muscle, rather than a strictly muscle cell-autonomous mechanism. In other words, FLWR-1 expressed in cholinergic neurons appears to negate or block the rescue effect of muscle-expressed FLWR-1. Therefore, FLWR-1 could play a more complex role in coordinating physiology across different tissues. This complexity may affect interpretations of Ca2+ dynamics and/or functional data, particularly in relation to E/I balance, and thus warrants careful discussion or further investigation.

      (2) The revised manuscript includes new GCaMP analyses restricted to synaptic puncta. The authors mention that "we compared Ca2+ signals in synaptic puncta versus axon shafts, and did not find any differences," concluding that "FLWR-1's impact is local, in synaptic boutons." This is puzzling: the similarity of Ca2+ signals in synaptic regions and axon shafts seems to indicate a more global effect on Ca2+ dynamics or may simply reflect limited temporal resolution in distinguishing local from global signals due to rapid Ca2+ diffusion. The authors should clarify how they reached the conclusion that FLWR-1 has a localized impact at synaptic boutons, given that synaptic and axonal signals appear similar. Based on the presented data, the evidence supporting a local effect of FLWR-1 on Ca2+ dynamics appears limited.

    3. Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca2+ channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca2+ dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca2+ levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca2+-ATPase and PIP2 binding in FLWR-1's function.

      The authors have adequately addressed most of my previous concerns; however, I recommend minor revisions to further strengthen the study's rigor and interpretation:

      Major suggestions

      (1) This study relies heavily on aldicarb assays to support its conclusions. While these assays are valuable, their results may not fully align with direct assessment of neurotransmitter release from motor neurons. For instance, prior work has shown that two presynaptic modulators identified through aldicarb sensitivity assays exhibited no corresponding electrophysiological defects at the neuromuscular junction (Liu et al., J Neurosci 27: 10404-10413, 2007). Similarly, at least one study from the Kaplan lab has noted discrepancies between aldicarb assays and electrophysiological analyses. The authors should consider adding a few sentences in the Discussion to acknowledge this limitation and the potential caveats of using aldicarb assays, especially since some of the aldicarb assay results in this study are not easily interpretable.

      (2) The manuscript states, "Elevated Ca2+ levels were not further enhanced in a flwr-1;mca-3 double mutant." (lines 549-550). However, Figure 7C does not include statistical comparisons between the single and double mutants of flwr-1 and mca-3. Please add the necessary statistical analysis to support this statement.

      (3) The term "Ca2+ influx" should be avoided, as this study does not provide direct evidence (e.g. voltage-clamp recordings of Ca2+ inward currents in motor neurons) for an effect of the flwr-1 mutation on Ca2+ influx. The observed increase in neuronal GCaMP signals in response to optogenetic activation of ChR2 may result from, or be influenced by, Ca2+ mobilization from intracellular stores. For example, optogenetic stimulation could trigger ryanodine receptor-mediated Ca2+ release from the ER via calcium-induced calcium release (CICR) or depolarization-induced calcium release (DICR). It would be more appropriate to describe the observed increase in Ca2+ signal as "Ca2+ elevation" rather than increased "Ca2+ influx".

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Seidenthal et al. investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions, albeit in an unexpected manner. The authors observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes, which differs from earlier studies in flies that suggested the Flower protein promotes the formation of bulk endosomes. This is a valuable finding. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype seen in flwr-1 mutants to wild-type levels. In contrast, FLWR-1 expression in cholinergic neurons in flwr-1 mutants did not restore aldicarb sensitivity, yet muscle expression of FLWR-1 partially but significantly recovered the aldicarb-resistant defects. The study also revealed that removing FLWR-1 leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. Further, the authors conclude that FLWR-1 contributes to the maintenance of the excitation/inhibition (E/I) balance by preferentially regulating the excitability of GABAergic neurons. Finally, SNG-1::pHluorin data imply that FLWR-1 removal enhances synaptic transmission; however, the electrophysiological recordings do not corroborate this finding.

      Strengths:

      This study by Seidenthal et al. offers valuable insights into the role of the Flower protein, FLWR-1, in C. elegans. Their findings suggest that FLWR-1 facilitates the breakdown of endocytic endosomes, which marks a departure from its previously suggested role in forming endosomes through bulk endocytosis. This observation could be important for understanding how Flower proteins function across species. In addition, the study proposes that FLWR-1 plays a role in maintaining the excitation/inhibition balance, which has potential impacts on neuronal activity.

      Weaknesses:

      One issue is the lack of follow-up tests regarding the relative contributions of muscle and GABAergic FLWR-1 to aldicarb sensitivity. The findings that muscle expression of FLWR-1 can significantly rescue aldicarb sensitivity are intriguing and may influence both experimental design and data interpretation. Have the authors examined aldicarb sensitivity when FLWR-1 is expressed in both muscles and GABAergic neurons, or possibly in muscles and cholinergic neurons? Given that muscles could influence neuronal activity through retrograde signaling, a thorough examination of FLWR-1's role in muscle is necessary, in my opinion.

      We thank the reviewer for this suggestion. Indeed, the retrograde inhibition of cholinergic transmission by signals from muscle has been demonstrated by the Kaplan lab in a number of publications. We have now done the experiments that were suggested, see the new Fig. S3B: rescuing FLWR-1 in both cholinergic neurons and muscle did not perform any better in the aldicarb assay, while co-rescue in GABAergic neurons and muscle, like rescue in GABAergic neurons alone, led to a complete rescue to wild type levels. Thus, retrograde signaling from muscle to neurons does not contribute to the E/I imbalance caused by the absence of FLWR-1. The fact that muscle rescue can partially rescue the flwr-1 phenotype is likely due to a cell-autonomous effect of FLWR-1 on muscle excitability, facilitating muscle contraction.

      Would the results from electrophysiological recordings and GCaMP measurements be altered with muscle expression of FLWR-1? Most experiments presented in the manuscript compare wild-type and flwr-1 mutant animals. However, without tissue-specific knockout, knockdown, or rescue experiments, it is difficult to separate cell-autonomous roles from non-cell-autonomous effects, in particular in the context of aldicarb assay results. Also, relying solely on levamisole paralysis experiments is not sufficient to rule out changes in muscle AChRs, particularly due to the presence of levamisole-resistant receptors.

      We repeated the Ca<sup>2+</sup> imaging in cholinergic neurons, in response to optogenetic activation, with expression of FLWR-1 in muscle, see Fig. 4E. This did not significantly alter the increased excitability of the flwr-1 mutant. Thus, together with the findings in aldicarb assays, we conclude that the function of FLWR-1 in muscle is cell-autonomous and does not indirectly affect its roles in the motor neurons. Also, cholinergic expression of FLWR-1 by itself reduced Ca<sup>2+</sup> levels to those in wild type (Fig. 4E). In addition, we now also assessed the contribution of the N-AChR (ACR-16) to aldicarb-induced paralysis (Fig. S3C), showing that flwr-1 and acr-16 mutations independently mediate aldicarb resistance, and that these effects are additive. Thus, FLWR-1 does not affect the expression level or function of the N-AChR; otherwise, the flwr-1; acr-16 double mutation would not exacerbate the phenotype of the single mutants.

      This issue regarding the muscle role of FLWR-1 also complicates the interpretation of results from coelomocyte uptake experiments, where GFP secreted from muscles and coelomocyte fluorescence were used to estimate endocytosis levels. A decrease in coelomocyte GFP could result from either reduced endocytosis in coelomocytes or decreased secretion from muscles. Therefore, coelomocyte-specific rescue experiments seem necessary to distinguish between these possibilities.

      We have performed a rescue of FLWR-1 in coelomocytes to address this, and found that this fully recovered the CC GFP signals to wild type levels. Therefore, the absence of FLWR-1 in muscles does not affect exocytosis of GFP. The data can be found in Fig. 5A, B.

      The manuscript states that GCaMP was used to estimate Ca<sup>2+</sup> levels at presynaptic sites. However, due to the rapid diffusion of both Ca<sup>2+</sup> and GCaMP, it is unclear how this assay distinguishes Ca<sup>2+</sup> levels specifically at presynaptic sites versus those in axons. What are the relative contributions of VGCCs and ER calcium stores here? This raises a question about whether the authors are measuring the local impact of FLWR-1 specifically at presynaptic sites or more general changes in cytoplasmic calcium levels.

      We compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences. The data previously shown have been replaced by data where the ROIs were restricted to synaptic puncta. The outcome is the same as before. These data are provided in Fig. 4A, B, E, F. We thus conclude that the impact of FLWR-1 is local, in synaptic boutons.

      The experiments showing FLWR-1's presynaptic localization need clarification/improvement. For example, the data shown in Fig. 3B represent GFP::FLWR-1 expressed under its own promoter, and TagRFP::ELKS-1 expressed exclusively in GABAergic neurons. Given that pflwr-1 drives expression in both cholinergic and GABAergic neurons, and that cholinergic synapses outnumber GABAergic ones in the nerve cord, it would be expected that many green FLWR-1 puncta do not associate with TagRFP::ELKS-1. However, several images in Figure 3B suggest an almost perfect correlation between FLWR-1 and ELKS-1 puncta. It would be helpful for the readers to understand the exact location in the nerve cord where these images were collected to avoid confusion.

      Thank you for making us aware that the provided images may be misleading. We have now extended this Figure (Fig. 3A-C) and provided more intensity profiles along the nerve cords in Fig. S4A-C. The quantitative analysis of average R<sup>2</sup> for the two fluorescent signals in each neuron type did not show any significant difference between the two, also after choosing slightly smaller ROIs for line scan analysis. We also highlighted the puncta corresponding to FLWR-1 in both neuron types, as well as to ELKS-1 in each specific neuron type, to identify FLWR-1 puncta without co-localized ELKS-1 signal. Also, we indicated the region that was imaged, i.e. the DNC posterior of the vulva, halfway to the posterior end of the nerve cord.

      The SNG-1::pHluorin data in Figure 5C is significant, as they suggest increased synaptic transmission at flwr-1 mutant synapses. However, to draw conclusions, it is necessary to verify whether the total amount of SNG-1::pHluorin present on synaptic vesicles remains the same between flwr-1 mutant and wild-type synapses. Without this comparison, a conclusion on levels of synaptic vesicle release based on changes in fluorescence might be premature, in particular given the results of electrophysiological recordings.

      We appreciate the comment. We have now added data and experiments that verify that the basal SNG-1::pHluorin signal in the plasma membrane, measured at synaptic puncta and in adjacent axonal areas, is not different in flwr-1 mutants compared to wild type in the absence of stimulation. This data can be found in Fig. S5A. In addition, we cultured primary neurons from transgenic animals to compare total SNG-1::pHluorin to the vesicular fraction, by adding buffers of defined pH to the external medium, or buffers that penetrate the cell and fix intracellular pH. These experiments (Fig. S5B, C) showed no difference in the vesicle fraction of the pHluorin signal in wild type vs. flwr-1 mutant cells, demonstrating that flwr-1 mutants do not per se have altered SNG-1::pHluorin in their SV or plasma membranes.

      Finally, the interpretation of the E74Q mutation results needs reconsideration. Figure 8B indicates that the E74Q variant of FLWR-1 partially loses its rescuing ability, which suggests that the E74Q mutation adversely affects the function of FLWR-1. Why did the authors expect that the role of FLWR-1 should have been completely abolished by E74Q? Given that FLWR-1 appears to work in multiple tissues, might FLWR-1's function in neurons require its calcium channel activity, whereas its role in muscles might be independent of this feature? While I understand there is ongoing debate about whether FLWR-1 is a calcium channel, the experiments in this study do not definitively resolve local Ca<sup>2+</sup> dynamics at synapses. Thus, in my opinion, it may be premature to draw firm conclusions about calcium influx through FLWR-1.

      Thank you for bringing this up. We did not expect E74Q to necessarily abolish FLWR-1 function, unless it were a Ca<sup>2+</sup> channel. Of course, the reviewer is right: FLWR-1 might have functions as an ion channel as well as channel-independent functions. Yet, we are quite confident that FLWR-1 is not an ion channel. Instead, we think that E74Q alters the stability of the protein (however, in the absence of biochemical data, we removed this conclusion), and that this impairs the function of FLWR-1 as a modulator, or possibly even an accessory subunit, of the PMCA MCA-3. This interaction was indicated by a new experiment, in which bimolecular fluorescence complementation showed that FLWR-1 and MCA-3 must be physically very close to each other in the plasma membrane (see new Fig. 9A, B). This provides a reasonable explanation for our finding of increased Ca<sup>2+</sup> levels in stimulated neurons of the flwr-1 mutant. If FLWR-1 acts as a stimulatory subunit of MCA-3, then its absence may cause reduced MCA-3 function and thus an accumulation of Ca<sup>2+</sup> in the synaptic terminals. In Drosophila, hyperstimulation of flower mutant neurons led to reduced Ca<sup>2+</sup> levels (Yao et al., 2017, PLoS Biol 15: e2000931), suggesting that Flower is a Ca<sup>2+</sup> channel. Based on our findings, we suggest an alternative explanation. According to proteomic data, the PMCA is a component of SVs (Takamori et al., 2006, Cell 127: 831-846). Increased insertion of PMCA into the plasma membrane during high stimulation, along with impaired endocytosis in flower mutants, would increase the steady-state levels of PMCA in the PM. This could lead to reduced steady-state levels of Ca<sup>2+</sup>. This ‘g.o.f.’ in Flower mutants may also affect Ca<sup>2+</sup> microdomains of the P/Q-type VGCC required for SV fusion, which could contribute to the rundown of EPSCs we find during synaptic hyperstimulation (Fig. 5G-J). We acknowledge, though, that Yao et al. (2009, Cell 138: 947-960) showed increased uptake of Ca<sup>2+</sup> into liposomes reconstituted with purified Flower protein. However, it cannot be ruled out that a protein contaminant was responsible, as the controls were empty liposomes, not liposomes reconstituted with a mutated Flower protein purified the same way.

      We also tested the ability of the E74Q mutant to rescue the reduced PI(4,5)P<sub>2</sub> levels in coelomocytes (CCs), where we observed no positive effect. While we have not measured Ca<sup>2+</sup> in CCs, we would assume that the function of FLWR-1 in increasing PI(4,5)P<sub>2</sub> levels there is not linked to a channel function. This function was, nevertheless, compromised by E74Q (Fig. 8D).

      Also, the aldicarb data presented in Figures 8B and 8D show notable inconsistencies that require clarification. While Figure 8B indicates that the 50% paralysis time for flwr-1 mutant worms occurs at 3.5-4 hours, Figure 8D shows that 50% paralysis takes approximately 2.5 hours for the same flwr-1 mutants. This discrepancy should be addressed. In addition, the manuscript mentions that the E74Q mutation impairs FLWR-1 folding, which could significantly affect its function. Can the authors show empirical data supporting this claim?

      We performed the aldicarb assays in a consistent manner, but nonetheless note that some variability from day to day can affect such outcomes. Importantly, we always measured each control (wild type, flwr-1) along with each test strain (FLWR-1 point mutants), to ensure the relevant estimate of a point-mutant’s effect. These assays have been repeated, now including the FLWR-1 wild type rescue strain as a comparison. The data are now combined in Fig. 8B. Regarding the assumed instability of the E74Q mutant, as we, indeed, do not have any experimental data supporting this, we removed this sentence.

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function.

      Weaknesses:

      (1) The observation that flwr-1 knockout increases Ca<sup>2+</sup> levels in motor neurons is notable, especially as it contrasts with prior findings in flies. The authors propose that elevated Ca<sup>2+</sup> levels in flwr-1 knockout motor neurons may stem from "deregulation of MCA-3" (a Ca<sup>2+</sup> ATPase in the plasma membrane) due to FLWR-1 loss. However, this conclusion relies on limited and somewhat inconclusive data (Figure 7). Additional experiments could clarify FLWR-1's role in MCA-3 regulation. For instance, it would be informative to investigate whether mutations in other genes that cause elevated cytosolic Ca<sup>2+</sup> produce similar effects, whether MCA-3 physically interacts with FLWR-1, and whether MCA-3 expression is reduced in the flwr-1 knockout.

      We thank the reviewer for bringing up these critical points. As to other mutations that produce elevated cytosolic Ca<sup>2+</sup>: possible candidates would be g.o.f. mutations of the ryanodine receptor UNC-68, the sarco-endoplasmic reticulum Ca<sup>2+</sup> ATPase, or mutants affecting VGCCs, like the L-type channel EGL-19 or the P/Q-type channel UNC-2. However, any such mutant would affect muscle contractions (as we have shown for r.o.f. mutations in unc-68, egl-19 and unc-2 in Nagel et al. 2005 Curr Biol 15: 2279-84) and thus would affect aldicarb assays (see aldicarb resistance induced by RNAi of these genes in Sieburth et al., 2005, Nature 436: 510). The same should be expected for g.o.f. mutations of any such gene. In neurons, we would expect increased or decreased Ca<sup>2+</sup> levels in response to stimulation.

      Regarding the physical interaction of MCA-3 and FLWR-1, we performed bimolecular fluorescence complementation, with two fragments of mVenus fused to the two proteins. This assay shows mVenus reconstitution (i.e., fluorescence) if the two proteins are found in close vicinity to each other. Testing MCA-3 and FLWR-1 in muscle indeed showed a robust signal, evenly distributed on the plasma membrane. As a control, FLWR-1 did not interact with another plasma membrane protein, the stomatin UNC-1, which interacts with gap junction proteins (Chen et al., 2007, Curr Biol 17: 1334-9). FLWR-1 also interacted with the ER chaperone Nicalin (NRA-2 in C. elegans), which helps assemble the TM domains of integral membrane proteins in association with the SEC translocon. However, this signal only occurred in the ER membrane, demonstrating the specificity of the BiFC assay. This data is presented in Fig. 9A, B. Additionally, we show that FLWR-1 stabilizes MCA-3 localization at synapses, which is also in line with the idea of a direct interaction (Fig. 9C, D).

      (2) In silico analysis identified residues R27 and K31 as potential PIP2 binding sites in FLWR-1. The authors observed that FLWR-1(R27A/K31A) was less effective than wild-type FLWR-1 in rescuing the aldicarb sensitivity phenotype of the flwr-1 knockout, suggesting that FLWR-1 function may depend on PIP2 binding at these two residues. Given that mutations in various residues can impair protein function non-specifically, additional studies may be needed to confirm the significance of these residues for PIP2 binding and FLWR-1 function. In addition, the authors might consider explicitly discussing how this finding aligns or contrasts with the results of a previous study in flies, where alanine substitutions at K29 and R33 impaired a Flower-related function (Li et al., eLife 2020).

      We further investigated the role of these two residues in an in vivo assay for PIP2 binding and membrane association of a reporter. We used the coelomocytes (CCs), in which a previous publication demonstrated that GFP tagged with a PH domain is recruited to the CC membrane (Bednarek et al., 2007, Traffic 8: 543-53). This assay was performed in wild type, flwr-1 mutants, and flwr-1 mutants rescued with wild type FLWR-1, the FLWR-1(E74Q) mutant, or the FLWR-1(K27A; R31A) double mutant. The data are shown in Fig. 8C, D. While wild type FLWR-1 restored PH-GFP levels at the CC membrane to those of the wild type control, the FLWR-1(K27A; R31A) double mutant did not rescue reporter binding, indicating that, at least in CCs, reduced PIP2 levels are associated with non-functional FLWR-1. The mechanism is not clear at present, though we note a possible mechanism analogous to that found for synaptotagmin, which recruits the PIP2 kinase to the plasma membrane via a lysine- and arginine-containing motif (Bolz et al., 2023, Neuron 111: 3765-3774.e3767). We now mention this in the discussion. We also discussed our data with respect to the findings of Li et al. about the analogous residues K27, R31 (K29, R33) in the discussion section, i.e. lines 667-670, and the differences of our findings in electron microscopy compared to the Drosophila work (more rather than fewer bulk endosomes) were discussed in lines 713-720.

      (3) A primary conclusion from the EM data was that FLWR-1 participates in the breakdown, rather than the formation, of bulk endosomes (lines 20-22). However, the reasoning behind this conclusion is somewhat unclear. Adding more explicit explanations in the Results section would help clarify and strengthen this interpretation.

      We added a sentence to better explain our reasoning. Mainly, the argument is that accumulation of such endosomes of unusually large size is seen in mutants affecting formation of SVs from the endosome (in endophilin and synaptojanin mutants), while mutants affecting mainly endocytosis (dynamin) cause formation of many smaller endocytic structures that stay attached to the plasma membrane (Kittelmann et al., 2013, PNAS 110: E3007-3016). We changed our data analysis in that we collated the data for what we previously termed endosomes and large vesicles. According to Watanabe et al., 2013, eLife 2: e00723, endosomes are defined by their location in the synapse and their size. However, that work used a much shorter stimulus and froze the preparations within tens to hundreds of milliseconds after the stimulus, while we used the protocol of Kittelmann et al. 2013, which uses 30 sec stimulation and freezing after 5 sec. There, endosomes were defined as structures larger than SVs or DCVs, but no larger than 80 nm, with an electron-dense lumen, and were very rarely observed. In contrast, large vesicles, or ‘100 nm vesicles’, ranged from 50-200 nm in diameter, had a clear lumen, and were morphologically similar to the bulk endosomes observed by Li et al., 2021. We thus reordered our data and jointly analyzed these structures as large vesicles / bulk endosomes. The outcome is still the same, i.e. photostimulated flwr-1 mutants showed more LVs than wild type synapses.

      (4) The aldicarb assay results in Figure 3 are intriguing, indicating that reduced GABAergic neuron activity alone accounts for the flwr-1 mutant's hyposensitivity to aldicarb. Given that cholinergic motor neurons also showed increased activity in the flwr-1 mutant, one might expect the flwr-1 mutant to display hypersensitivity to aldicarb in the unc-47 knockout background. However, this was not observed. The authors might consider validating their conclusion with an alternative approach or, at the minimum, providing a plausible explanation for the unexpected result. Since aldicarb-induced paralysis can be influenced by factors beyond acetylcholine release from cholinergic motor neurons, interpreting aldicarb assay results with caution may be advisable. This is especially relevant here, as FLWR-1 function in muscle cells also impacts aldicarb sensitivity (Figure S3B). Previous electrophysiological studies have suggested that aldicarb sensitivity assays may sometimes yield misleading conclusions regarding protein roles in acetylcholine release.

      We tested the unc-47; flwr-1 animals again at a lower concentration of aldicarb, to see if the high concentration may have leveled the differences between unc-47 animals and the double mutant. This experiment is shown in Fig. S3D, demonstrating that the double mutant is significantly less resistant to aldicarb. This verifies that FLWR-1 acts not only in GABAergic neurons, but also in cholinergic neurons (as we saw by electron microscopy and electrophysiology), and that the increased excitability of cholinergic cells leads to more acetylcholine being released. In the double mutant, where GABA release is defective, this confers hypersensitivity to aldicarb.

      (5) Previous studies have suggested that the Flower protein functions as a Ca<sup>2+</sup> channel, with a conserved glutamate residue at the putative selectivity filter being essential for this role. However, mutating this conserved residue (E74Q) in C. elegans FLWR-1 altered aldicarb sensitivity in a direction opposite to what would be expected for a Ca<sup>2+</sup> channel function. Moreover, the authors observed that E74 of FLWR-1 is not located near a potential conduction pathway in the FLWR-1 tetramer, as predicted by AlphaFold3. These findings raise the possibility that Flower may not function as a Ca<sup>2+</sup> channel. While this is a potentially significant discovery, further experiments are needed to confirm and expand upon these results.

      As above, we do not exclude that FLWR-1 may constitute a channel; however, based on our findings, AF3 structure predictions and data in the literature, we are considering alternative explanations for the observed effect on Ca<sup>2+</sup> levels of Flower mutants in worms and flies. The observation of increased Ca<sup>2+</sup> levels in stimulated flwr-1 mutant neurons could result from a reduced stimulation of the PMCA, and this was also observed with low stimulation in Drosophila (Yao et al., 2017). This idea is supported by the indications of a direct physical interaction, or proximity, of the two proteins. The reduced Ca<sup>2+</sup> levels after hyperstimulation of Drosophila Flower mutants may have to do with increased levels of non-recycling PMCA in the plasma membrane, indicating that PMCA requires Flower for recycling. This could underlie the rundown of evoked PSCs we find in worm flwr-1 mutants, and would also be in line with a function of FLWR-1 and MCA-3 in coelomocytes, cells that constantly endocytose, and in which both proteins are required for proper function (our data, Figs. 5A, B; 8D, E; and Bednarek et al., 2007, Traffic 8: 543-553). CCs need to recycle / endocytose membranes and membrane proteins, and such proteins, likely including FLWR-1 and MCA-3, need to be returned to the PM effectively.

      We thus refrained from testing a putative FLWR-1 channel function in Xenopus oocytes, in part also because we would not be able to acutely trigger possible FLWR-1 gating. A constitutive Ca<sup>2+</sup> current, if it were present, would induce a large Cl<sup>-</sup> conductance in oocytes, which would likely be problematic or even kill the cells. The demonstration that FLWR-1(E74Q) does not rescue the PI(4,5)P<sub>2</sub> levels in coelomocytes is also more in line with a non-channel function of FLWR-1.

      (6) Phrases like "increased excitability" and "increased Ca<sup>2+</sup> influx" are used throughout the manuscript. However, there is no direct evidence that motor neurons exhibit increased excitability or Ca<sup>2+</sup> influx. The authors appear to interpret the elevated Ca<sup>2+</sup> signal in motor neurons as indicative of both increased excitability and Ca<sup>2+</sup> influx. However, this elevated Ca<sup>2+</sup> signal in the flwr-1 mutant could occur independently of changes in excitability or Ca<sup>2+</sup> influx, such as in cases of reduced MCA-3 activity. The authors may wish to consider alternative terminology that more accurately reflects their findings.

      Thank you, we rephrased the imprecise wording. Ca<sup>2+</sup> influx was meant with respect to the cytosol.

      Reviewer #3 (Public review):

      Summary:

      Seidenthal et al. investigated the role of the Flower protein, FLWR-1, in C. elegans and confirmed its involvement in endocytosis within both synaptic and non-neuronal cells, possibly by contributing to the fission of bulk endosomes. They also uncovered that FLWR-1 has a novel inhibitory effect on neuronal excitability at GABAergic and cholinergic synapses in neuromuscular junctions.

      Strengths:

      This study not only reinforces the conserved role of the Flower protein in endocytosis across species but also provides valuable ultrastructural data to support its function in the bulk endosome fission process. Additionally, the discovery of FLWR-1's role in modulating neuronal excitability broadens our understanding of its functions and opens new avenues for research into synaptic regulation.

      Weaknesses:

      The study does not address the ongoing debate about the Flower protein's proposed Ca<sup>2+</sup> channel activity, leaving an important aspect of its function unexplored. Furthermore, the evidence supporting the mechanism by which FLWR-1 inhibits neuronal excitability is limited. The suggested involvement of MCA-3 as a mediator of this inhibition lacks conclusive evidence, and a more detailed exploration of this pathway would strengthen the findings.

      We added new data showing the likely direct interaction of FLWR-1 with the PMCA, possibly upregulating / stimulating its function. This data is shown now in Fig. 9A, B. Also, we show now that FLWR-1 is required to stabilize MCA-3 expression / localization in the pre-synaptic plasma membrane (Fig. 9C, D). These findings do not support the putative function of FLWR-1 as an ion channel, but suggest that increased Ca<sup>2+</sup> levels following neuron stimulation in flwr-1 mutants are due to an impairment of MCA-3 and thus reduced Ca<sup>2+</sup> extrusion.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors might consider focusing on one or two key findings from this study and providing robust evidence to substantiate their conclusions.

      We substantiated the interaction of FLWR-1 with the PMCA, assessed the function of FLWR-1 in coelomocytes, and examined the role of FLWR-1 in regulating PIP2 levels in the plasma membrane.

      Reviewer #3 (Recommendations for the authors):

      (1) Behavioral Analysis of Locomotion

      In Figure 1, the authors are encouraged to examine whether flwr-1 mutants show altered locomotion behaviors, such as velocity, in a solid medium.

      We performed this analysis, comparing wild type with flwr-1 mutants and with flwr-1 mutants rescued with FLWR-1 expressed from the endogenous promoter. The data are shown in Fig. S1C; there was no difference. We note that in swimming assays, too, we observed differences only when we strongly stimulated the cholinergic neurons by optogenetic depolarization, not during unstimulated, normal swimming.

      (2) Validation of FLWR-1 Tagging

      In Figure 2A, it is recommended that the authors confirm the functionality of the C-terminal-tagged FLWR-1.

      We performed such rescue assays during swimming; the data are shown in Fig. S2S, E. While GFP::FLWR-1 animals were slightly affected right after photostimulation, they quickly caught up with wild-type controls, whereas flwr-1 mutants remained affected even after several minutes.

      (3) Explanation of Differential Rescue in GABAergic Neurons and Muscle

      The authors should provide a rationale for why restoring FLWR-1 in GABAergic neurons fully rescues the aldicarb resistance phenotype, while its restoration in muscle also partially rescues it.

      We think that these effects are independent of each other, i.e., loss of FLWR-1 in muscle increases muscle excitability, which becomes apparent in the behavioral assay because it depends on locomotion and muscle contraction. To assess this further, we performed combined GABAergic neuron and muscle rescue assays (Fig. S3B). The double rescue was not different from wild type and performed better than the muscle rescue alone.

      (4) Rescue Experiments for Swimming Defect in GABAergic Neurons

      Consider adding rescue experiments to determine whether expressing FLWR-1 specifically in GABAergic neurons can restore the swimming defect phenotype.

      We did not perform this assay because swimming is driven by cholinergic neurons, so we would only indirectly probe GABAergic neuron function, and a GABAergic FLWR-1 rescue would likely not improve swimming much. Also, given the importance of the correct E/I balance in the motor neurons, such a rescue would likely require expression levels that precisely match endogenous levels, which cannot be achieved in a cell-specific manner.

      (5) Further Data on GCaMP Assay for mca-3; flwr-1 Additive Effect

      The additive effect of the mca-3 and flwr-1 mutations on GCaMP signals requires further data for substantiation. Additional GCaMP recordings or statistical analysis would provide stronger support for the proposed interaction between MCA-3 and FLWR-1 in calcium signaling.

      Thank you. We increased the number of observations, which made the assay more conclusive: the double mutation did not exacerbate the effect of either single mutant, demonstrating that FLWR-1 and MCA-3 act in the same pathway. The data are in Fig. 7B, C.

      (6) Inclusion of Wild-Type FLWR-1 Rescue in Figures 8B and 8D

      Figures 8B and 8D would benefit from the inclusion of wild-type FLWR-1 as a rescue control.

      We included the FLWR-1 wild type rescue as suggested and summarized the data in Fig. 8B.

    1. eLife Assessment

      This important study uses Mendelian Randomisation to show that early life phenotypes (i.e., age at menarche and age at first birth) influence a multitude of health outcomes later in life. The provided empirical evidence supporting the antagonistic pleiotropy theory is solid. However, some results seem improbable and need to be checked to make sure they are correct.

    2. Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease in support of the antagonistic pleiotropy hypothesis of ageing, predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      A number of doubts remain regarding some of the results and their interpretation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth may have a positive effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging, and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identify 128 fertility-related SNPs that associate with age-related outcomes and identify BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      The authors addressed the remarks on the previous version very well. Addressing the two points below would further increase the quality of the manuscript.

      (1) In the previous version the authors mentioned that their results are also consistent with the disposable soma theory: "These results are also consistent with the disposable soma theory that suggests aging as an outcome tradeoff between an organism's investment in reproduction and somatic maintenance and repair."

      Although the antagonistic pleiotropy and disposable soma theories describe different mechanisms, both provide frameworks for understanding how genes linked to fertility influence health. The antagonistic pleiotropy theory posits that genes enhancing fertility early in life may have detrimental effects later. In contrast, the disposable soma theory suggests that energy allocation involves a trade-off, where investment in fertility comes at the expense of somatic maintenance, potentially leading to poorer health in later life.

      To strengthen the manuscript, a discussion section should be added to clarify the overlap and distinctions between these two evolutionary theories and suggest directions for future research in disentangling their specific mechanisms.

      (2) In response to the question why the authors did not include age at menopause in addition to the already included age at first child and age at menarche the following explanation was provided: "Our manuscript focuses on the antagonistic pleiotropy theory, which posits that inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (like menarche and childbirth) may have costly consequences later. So, we only included age at menarche and age at first childbirth as exposures in our research."

      It remains, however, unclear why genes beneficial for early survival and reproduction would be reflected only in age at menarche and age at first childbirth, but not in age at menopause. While age at menarche marks the onset of fertility, age at menopause signifies its end. Since evolutionary selection acts directly until reproduction is no longer possible (though indirect evolutionary pressures persist beyond this point), the inclusion of additional fertility-related measures could have strengthened the analysis. A more detailed justification for focusing exclusively on age at menarche and first childbirth would enhance the clarity and rigor of the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing, predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      There are some errors in the methodology that require revision.

      In particular, the main conclusions drawn by the authors refer to the Mendelian Randomization analyses. However, the authors made a few errors here that need to be reconsidered:

      (1) Many of the outcomes investigated by the authors are continuous outcomes, while the authors report odds ratios. This is not correct and should be revised.

      Thank you for your observation. We have revised the manuscript to ensure that the results for continuous outcomes are appropriately reported using beta coefficients, which indicate the change in the outcome per unit increase in exposure. This accurately reflects the nature of the analysis and allows a clearer interpretation of continuous outcomes (lines 56-109).

      (2) Some of the odds ratios (for example, the one for osteoporosis) are very small while still reaching statistical significance. After some checking, I found that the GWAS data used to generate these MR estimates were processed with BOLT-LMM. This program fits a linear mixed model, which requires transformation of the beta estimates to be useful for dichotomous outcomes. The authors should check the BOLT-LMM manual and recalculate the beta estimates of the SNP-outcome associations prior to the Mendelian Randomization analyses. This should be checked for all outcomes, as it does not apply to all.

      Thank you for your detailed feedback. We have reviewed all the GWAS data used in our MR analyses and confirmed that all GWAS of continuous traits were processed using BOLT-LMM, including age at menarche, age at first birth, BMI, frailty index, father's age at death, mother's age at death, DNA methylation GrimAge acceleration, age at menopause, eye age, and facial aging. Of the dichotomous outcomes, only osteoporosis was processed by BOLT-LMM; the others (late-onset Alzheimer's disease, type 2 diabetes, chronic heart failure, essential hypertension, cirrhosis, chronic kidney disease, early-onset chronic obstructive pulmonary disease, breast cancer, ovarian cancer, endometrial cancer, and cervical cancer) were not. We have therefore reprocessed the GWAS beta values for osteoporosis and repeated the MR analysis (lines 74-75; lines 366-373).
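      The rescaling at issue can be sketched as follows. BOLT-LMM fits a linear mixed model to a 0/1-coded case-control phenotype, and its documentation describes approximating log odds ratios by dividing the effect estimate by μ(1 − μ), where μ is the case fraction; in this sketch the standard error is divided by the same factor. This is a minimal illustration under that approximation, not the authors' code: the function name `bolt_lmm_to_log_odds` and all numbers are hypothetical.

```python
import math


def bolt_lmm_to_log_odds(beta, se, n_cases, n_controls):
    """Rescale a BOLT-LMM effect (linear model on a 0/1 phenotype) to the log-odds scale.

    Approximation: log(OR) ~= beta / (mu * (1 - mu)), with mu the case fraction.
    The standard error is divided by the same factor.
    """
    mu = n_cases / (n_cases + n_controls)
    scale = mu * (1.0 - mu)
    return beta / scale, se / scale


# Hypothetical per-SNP summary statistics, for illustration only.
log_or, log_or_se = bolt_lmm_to_log_odds(beta=0.002, se=0.0005, n_cases=5000, n_controls=45000)
print(f"log(OR) = {log_or:.4f} (SE {log_or_se:.4f}), OR = {math.exp(log_or):.4f}")
```

      In practice, the rescaled estimates would replace the raw BOLT-LMM values in the SNP-outcome dataset before harmonization and MR.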

      (3) The authors should follow the MR-Strobe guidelines for presentation.

      Thank you for your suggestion to follow the MR-STROBE guidelines for the presentation of our study. We appreciate the importance of adhering to these standardized guidelines to ensure clarity and transparency in reporting Mendelian Randomization (MR) analyses. We confirm that the MR components of our research are structured and presented following the MR-STROBE checklist. In addition to the MR analyses, our study also integrates Colocalization analysis, Genetic correlation analysis, Ingenuity Pathway Analysis (IPA), and population validation to provide a more comprehensive understanding of the genetic and biological context. While these analyses are not strictly covered by MR-STROBE guidelines, they complement the MR results by offering additional validation and mechanistic insights.

      We have structured our manuscript to separate these complementary analyses from the core MR results, maintaining alignment with MR-STROBE for the MR-specific components. The additional analyses are discussed in dedicated sections to highlight their unique contributions and avoid conflating them with the MR findings.

      (4) The authors should report data in the text with a 95% confidence interval.

      Thank you for your feedback. We have added the 95% confidence intervals for the reported data within the main text to enhance clarity and provide comprehensive context (lines 56-109). Additionally, the complete analysis data, including all detailed results, can be found in Table S3.
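      As a minimal illustration of the reporting convention discussed above (betas for continuous outcomes, odds ratios for binary outcomes, each with a 95% confidence interval), the sketch below computes Wald-type intervals from an estimate and its standard error. It assumes a normal approximation; the function names and input values are hypothetical and are not results from the manuscript.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_975 = NormalDist().inv_cdf(0.975)


def ci95(estimate, se):
    """Wald-type 95% confidence interval: estimate +/- z * SE."""
    return estimate, estimate - Z_975 * se, estimate + Z_975 * se


def odds_ratio_ci95(log_or, se):
    """Exponentiate a log-odds estimate and its 95% CI to the odds-ratio scale."""
    return tuple(math.exp(x) for x in ci95(log_or, se))


# Hypothetical estimates: continuous outcome -> beta, binary outcome -> OR.
beta_ci = ci95(-0.10, 0.02)
or_ci = odds_ratio_ci95(-0.10, 0.02)
print("beta = %.3f (95%% CI %.3f to %.3f)" % beta_ci)
print("OR = %.3f (95%% CI %.3f to %.3f)" % or_ci)
```

      Exponentiating the interval endpoints, rather than the SE, is what makes an OR interval asymmetric around the point estimate.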

      (5) The authors should consider correction for multiple testing.

      Thank you for your comment regarding the need to consider correction for multiple testing. We agree that correcting for multiple comparisons is an important step to control for the possibility of false-positive findings, particularly in studies involving large numbers of statistical tests. In our study, we carefully considered the issue of multiple testing and adopted the following approach:

      Context of Multiple Testing: The tests we conducted were hypothesis-driven, focusing on specific relationships (e.g., genetic correlation, colocalization, and Mendelian Randomization). These analyses are based on a priori hypotheses supported by existing literature or biological relevance.

      Statistical Methods: Where applicable, we applied appropriate measures to account for multiple tests. For instance, in Mendelian Randomization, sensitivity analyses serve to validate the robustness of the results.

      We believe that the methodology and corrections applied in our study appropriately address concerns about multiple testing, given the hypothesis-driven nature of our analyses and the rigorous steps taken to validate our findings. If you feel that additional corrections are required for specific parts of the analysis, we would be happy to further clarify or revise as needed.
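      For readers weighing the reviewer's suggestion, the sketch below shows two common corrections, Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate), applied to a list of p-values. Neither method is named in the authors' response, and the p-values are made up for illustration.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from ten exposure-outcome tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
for p, pb, q in zip(pvals, bonferroni(pvals), benjamini_hochberg(pvals)):
    print(f"p = {p:.3f}  Bonferroni = {pb:.3f}  BH q = {q:.3f}")
```

      With these inputs, only the smallest p-value survives Bonferroni at 0.05, whereas Benjamini-Hochberg at a 5% FDR retains the two smallest.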

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging, and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identified 128 fertility-related SNPs that are associated with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      Points that have to be clarified/addressed:

      (1) Antagonistic pleiotropy is an evolutionary theory pointing to the possibility that mutations that are beneficial for fitness (early life health and reproduction) may be detrimental later in life. As it concerns an evolutionary process and the authors focus on contemporary data from a single generation, more context is needed on how this theory can be tested accurately. For example, why and how much natural variation is there for fitness outcomes in humans?

      Thank you for these insightful questions. We appreciate the opportunity to clarify how we approach the testing of AP theory within a contemporary human cohort and address the evolutionary context and comparative considerations with the disposable soma theory.

      We recognize that modern human populations experience selection pressures that differ from those in the past, which may affect how well certain genetic variants reflect historical fitness benefits. Nonetheless, the genetic variation present today still offers valuable insights into potential AP mechanisms through statistical associations in contemporary cohorts. We believe that AP can indeed be explored in current populations by examining genetic links between reproductive traits and age-related health outcomes. In our study, we investigate whether certain genetic variants linked to reproductive timing—such as age at menarche and age at first birth—also correlate with late-life health risks. By identifying SNPs associated with both early-life reproductive success and adverse aging outcomes, we aim to capture the evolutionary trade-offs that AP theory suggests.

      Despite contemporary selection pressures that differ from historical conditions, there remains natural genetic variation in traits like reproductive timing and longevity in humans today. This diversity allows us to apply MR to test causal relationships between reproductive traits and aging outcomes, providing insights into potential AP mechanisms. Prior studies have demonstrated that reproductive behaviors exhibit significant heritability and have identified genetic loci associated with reproductive timing (1,2). This genetic variation facilitates causal inference in modern cohorts, despite environmental and healthcare advances that might modulate these associations (3). By leveraging genetic risk scores for reproductive timing, our study captures the necessary variability to assess potential AP effects, thus providing valuable insights into how evolutionary trade-offs may continue to influence human health outcomes.

      What do the genetic risk score distributions of the exposure data look like?

      Thank you for your question. Our study is focused on Mendelian Randomization (MR) analysis, which aims to infer causal relationships between exposures and outcomes. While genetic risk scores (GRS) provide valuable insights at an individual level, they do not directly align with our study's objective, which is centered on population-level causal inference rather than individual-level genetic risk assessment. In MR, we use genetic variants as instrumental variables to determine the causal effect of an exposure on an outcome. GRS analysis typically focuses on summarizing an individual's risk based on multiple genetic variants, which is outside the scope of our current research. Therefore, we did not perform or analyze the distribution of genetic risk scores, as our primary goal was to understand broader causal relationships using established genetic instruments.

      Also, how can the authors distinguish in their data between the antagonistic pleiotropy theory and the disposable soma theory, which considers a trade-off between investment in reproduction and somatic maintenance and can be used to derive similar hypotheses? There is just a very brief mention of the disposable soma theory in lines 196-198.

      In our manuscript, we test AP theory specifically by examining genetic variants associated with reproductive timing and their association with age-related health risks in later life. MR and genetic risk scores allow us to assess these associations, directly testing the hypothesis that certain alleles enhancing reproductive success might have adverse effects on aging outcomes. This gene-centered approach aligns with AP’s premise of genetic trade-offs, enabling us to observe whether alleles associated with early-life reproductive traits correlate with increased risks of age-related diseases. Distinguishing from disposable soma theory, which would predict a general trade-off in energy allocation affecting somatic maintenance and not specific genetic effects, our data focuses on how certain alleles have differential impacts across life stages. Our findings thus support AP theory over disposable soma by highlighting the effects of specific genetic loci on both reproductive and aging phenotypes. However, future research could indeed explore the intersection of these theories, for example, by examining how resource allocation and genetic predispositions interact to influence longevity in various environmental contexts.

      (2) The antagonistic pleiotropy theory, used to derive the hypothesis, does not necessarily distinguish between male and female fitness. Would the authors expect that their results extrapolate to males as well? And can they test that?

      Emerging evidence suggests that early puberty in males is linked to adverse health outcomes, such as an increased risk of cardiovascular disease, type 2 diabetes, and hypertension in later life (4). A Mendelian randomization study also reported a genetic association between the timing of male puberty and reduced lifespan (5). These findings support the hypothesis that genetic variants associated with delayed reproductive timing in males might similarly confer health benefits or improved longevity, akin to the patterns observed in females. This would suggest that similar mechanisms of antagonistic pleiotropy could operate in males as well.

      In our study, BMI was identified as a mediator between reproductive timing and disease risk. Given that BMI is a common risk factor for age-related diseases in both males and females (6-9), it is plausible that similar mechanisms involving BMI, reproductive timing, and disease risk could exist in males. This shared mediator points to the possibility that, while reproductive timelines may differ, the pathways through which these traits influence aging outcomes may be consistent across genders.

      AP theory could potentially be tested in males, as the principles of the theory may extend to analogous reproductive traits in males, such as age at puberty and testosterone levels, which could similarly influence health outcomes later in life. However, as our current study focuses specifically on female reproductive traits, testing the AP theory in males is outside the scope of this work. We acknowledge the importance of exploring these mechanisms in males, and we hope that future research will address this by investigating male-specific reproductive traits and their relationship to aging and health outcomes.

      (3) There is no statistical analyses section providing the exact equations that are tested. Hence it's not clear how many tests were performed and if correction for multiple testing is necessary. It is also not clear what type of analyses have been done and why they have been done. For example in the section starting at line 47, Odds Ratios are presented, indicating that logistic regression analyses have been performed. As it's not clear how the outcomes are defined (genotype or phenotype, cross-sectional or longitudinal, etc.) it's also not clear why logistic regression analysis was used for the analyses.

      Thank you for your thoughtful comments regarding the statistical analyses and the clarification of methods and variables used in the study.

      Statistical Analyses Section: We have included a detailed explanation of all statistical analyses in the Methods section (lines 291–408), specifying the rationale for the choice of methods, the variables analyzed, and their relationships. Additionally, we have provided the relevant equations or statistical models used where appropriate to ensure transparency.

      Beta Values and Odds Ratios: In the Results section (starting at line 56), both Beta values and Odds Ratios are presented: Beta values were used for analyses of continuous outcomes to quantify the linear relationship between predictors and outcomes. Odds Ratios (ORs) were calculated for binary or categorical disease outcomes to describe the relative odds of an outcome given specific exposures or independent variables.

      Validation and Regression Analyses: For further validation of the MR results, we conducted analyses using the UK Biobank dataset (starting at line 162). Logistic regression analysis was then employed for disease risk assessments involving categorical outcomes (e.g., diseased or not).

      We hope that this clarifies the methods and their applicability to our study, as well as the rationale for the presentation of Beta values and Odds Ratios. If further details or refinements are required, we are happy to incorporate them.

      (4) Mendelian Randomization is an important part of the analyses done in the manuscript. It is not clear to what extent the MR assumptions are met, how the assumptions were tested, and if/what sensitivity analyses are performed; e.g. reverse MR, biological knowledge of the studied traits, etc. Can the authors explain to what extent the genetic instruments represent their targets (applicable expression/protein levels) well?

      Thank you for your insightful comments regarding the Mendelian Randomization (MR) analysis and the evaluation of its assumptions. Below, we provide additional clarification on how the MR assumptions were addressed, sensitivity analyses performed, and the representativeness of the genetic instruments (starting at line 314):

      Relevance Assumption (Genetic instruments are associated with the exposure): “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000).” “During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12).”

      Independence Assumption (genetic instruments are not associated with confounders; genetic instruments affect the outcome only through the exposure): We then assessed whether there were potential confounders of IVs associated with the outcomes using PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), a database of human genotype-phenotype associations, with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to the outcomes were excluded.

      Sensitivity Analyses Performed: A pleiotropy test was used to check whether the IVs influence the outcome through pathways other than the exposure of interest. A heterogeneity test was applied to assess whether the causal effect estimates vary across IVs; a significant result indicates that some instruments may be invalid or that the causal effect depends on the IVs used. MR-PRESSO was applied to detect and correct for potential outlier IVs (NbDistribution = 10,000, threshold p = 0.05); outliers were excluded and the analysis was repeated. The causal estimates were given as odds ratios (ORs) and 95% confidence intervals (CI). A leave-one-out analysis was conducted to ensure the robustness of the results by sequentially excluding each IV and confirming the direction and statistical significance of the estimate from the remaining SNPs.

      Supplemental post-GWAS analysis: Colocalization analysis (starting at line 356), Genetic correlation analysis (starting at line 366).

      Our MR analysis adheres to the guidelines for causal inference in MR studies. By combining multiple sensitivity analyses and ensuring the quality of genetic instruments, we demonstrate that the results are robust and unlikely to be driven by confounding or pleiotropy.
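      The core estimator and two of the sensitivity checks described above can be sketched in a few lines. This assumes the fixed-effect inverse-variance weighted (IVW) estimator built from per-SNP Wald ratios with first-order standard errors, Cochran's Q as the heterogeneity statistic, and a simple leave-one-out loop; the response does not name the exact estimator or software, and the pleiotropy test and MR-PRESSO are not reproduced. All function names and inputs are hypothetical. Comparing Q with a χ<sup>2</sup> distribution on (number of SNPs − 1) degrees of freedom would give the heterogeneity p-value, which is omitted here to stay within the standard library.

```python
import math


def wald_ratios(bx, by, se_by):
    """Per-SNP Wald ratios by/bx with first-order standard errors se_by/|bx|."""
    ratios = [y / x for x, y in zip(bx, by)]
    ses = [s / abs(x) for x, s in zip(bx, se_by)]
    return ratios, ses


def ivw(bx, by, se_by):
    """Fixed-effect IVW estimate, its standard error, and Cochran's Q."""
    ratios, ses = wald_ratios(bx, by, se_by)
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q


def leave_one_out(bx, by, se_by):
    """Repeat the IVW fit with each instrument removed in turn."""
    results = []
    for j in range(len(bx)):
        keep = [i for i in range(len(bx)) if i != j]
        est, se, _ = ivw([bx[i] for i in keep], [by[i] for i in keep], [se_by[i] for i in keep])
        results.append((j, est, se))
    return results


# Hypothetical SNP-exposure (bx) and SNP-outcome (by, se_by) summary statistics.
bx = [0.10, 0.20, 0.15]
by = [-0.02, -0.06, -0.03]
se_by = [0.01, 0.01, 0.01]

est, se, q = ivw(bx, by, se_by)
print(f"IVW = {est:.4f} (SE {se:.4f}), Cochran's Q = {q:.3f} on {len(bx) - 1} df")
for j, loo_est, loo_se in leave_one_out(bx, by, se_by):
    print(f"without SNP {j}: {loo_est:.4f} (SE {loo_se:.4f})")
```

      In this toy input the second SNP has a stronger ratio than the other two, so Q is non-zero and dropping it moves the leave-one-out estimate the most, which is the pattern these checks are meant to flag.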

      (5) It is not clear what reference genome is used and if or what imputation panel is used. It is also not clear what QC steps are applied to the genotype data in order to construct the genetic instruments of MR.

      The SNP selection steps are described in the Methods section, starting at line 314: “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000). Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded. During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12). If the effect allele frequency (EAF) was missing in the primary dataset, EAF would be collected from dbSNP (https://www.ncbi.nlm.nih.gov/snp/) based on the population to calculate the F value.” The number of exposure SNPs for each outcome and the F statistics are listed in Supplementary Table S2.
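      The selection steps quoted above (genome-wide significance at p < 5 × 10<sup>-8</sup>, clumping at r<sup>2</sup> = 0.001 within a 10,000 kb window, and an F > 10 strength filter) can be sketched as below. This is a simplified greedy version of the clumping performed by tools such as PLINK: real pipelines take r<sup>2</sup> from a reference panel, whereas here a hypothetical lookup table stands in for it. The per-SNP F statistic uses the common approximation F ≈ (β/SE)<sup>2</sup>, which may differ from the formula used in the manuscript. The names `Snp`, `clump`, `f_statistic`, and `r2_lookup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    chrom: int
    pos: int      # base-pair position
    beta: float   # SNP-exposure effect
    se: float
    p: float


def clump(snps, r2_lookup, p_thresh=5e-8, r2_thresh=0.001, window_kb=10_000):
    """Greedy LD clumping: take SNPs in order of significance and keep one only
    if it is not in LD (r2 > r2_thresh within window_kb) with an already-kept SNP."""
    candidates = sorted((s for s in snps if s.p < p_thresh), key=lambda s: s.p)
    kept = []
    for s in candidates:
        in_ld = any(
            k.chrom == s.chrom
            and abs(k.pos - s.pos) <= window_kb * 1000
            and r2_lookup(k.rsid, s.rsid) > r2_thresh
            for k in kept
        )
        if not in_ld:
            kept.append(s)
    return kept


def f_statistic(s):
    """Approximate per-SNP instrument strength as F = (beta / se) ** 2."""
    return (s.beta / s.se) ** 2


# Hypothetical summary statistics and pairwise LD (r2) values.
snps = [
    Snp("rsA", 1, 1_000_000, 0.05, 0.005, 1e-20),
    Snp("rsB", 1, 1_200_000, 0.04, 0.005, 1e-15),  # in LD with rsA
    Snp("rsC", 1, 50_000_000, 0.03, 0.005, 1e-9),  # outside the 10,000 kb window
    Snp("rsD", 2, 3_000_000, 0.02, 0.005, 1e-4),   # not genome-wide significant
]
LD_R2 = {frozenset({"rsA", "rsB"}): 0.6}


def r2_lookup(a, b):
    return LD_R2.get(frozenset({a, b}), 0.0)


kept = clump(snps, r2_lookup)
instruments = [s for s in kept if f_statistic(s) > 10]
print([(s.rsid, round(f_statistic(s), 1)) for s in instruments])
```

      Here rsB is removed because it is correlated with the more significant rsA, rsD fails the p-value threshold, and both remaining SNPs clear F > 10.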

      (6) A code availability statement is missing. It is understandable that data cannot always be shared, but code should be openly accessible.

      We have added it to the manuscript (starting at line 410).

      Reviewer #2 (Recommendations for the authors):

      (1) The outcomes seem to be genotypes (lines 274-288). In MR, genotypes are used as an instrument, representing an exposure, which is then associated with an outcome that is typically observed and measured at a later moment in time than the predictors. If both exposure and outcome are genotypes it is not clear how this works in terms of causality; it would rather reflect a genetic correlation. One would expect the genotypes that function as instruments for the exposure to have a functional cascade of (age-related) effects, leading to an (age-related) outcome. From line 149 the outcomes seem to be phenotypes. Can the authors please clearly explain in each section what is analyzed, how the analyses were done, and why the analyses were done that way?

      Thank you for your insightful comment. We understand the concern regarding the use of genotypes as both exposures and outcomes and the implications this has for interpreting causality versus genetic correlation. To clarify, in our study, the outcomes analyzed in the MR framework are indeed genotypes, starting from line 47. We use genotypes as instrumental variables for exposures, which are then linked to phenotypic outcomes observed at a later stage, in line with standard MR principles.

      To improve the robustness of the MR results, we validated the genetic associations in the population with phenotype data from UK Biobank (lines 162-203); the detailed methods are described in lines 385-408.

      (2) Overall, the English writing is good. However, some small errors slipped in. Please check the manuscript for small grammar mistakes like in sentences 10 (punctuation) and 33 (grammar).

      Thank you for your feedback and careful review. We have thoroughly rechecked the manuscript for grammatical errors, including punctuation and sentence structure, particularly in sentences 11 and 35 of the revised manuscript, as suggested.

      (3) There is currently no results and discussion section.

      The manuscript was submitted as a Short Report, an article type with a combined Results and Discussion section. We have now added a Discussion section heading.

      (4) Why did the authors not include SNPs associated with age at menopausal onset? See, for example: https://www.nature.com/articles/s41586-021-03779-7

      Thank you for this information. Our manuscript focuses on the antagonistic pleiotropy theory, which posits an inherent trade-off in natural selection whereby genes beneficial for early survival and reproduction (such as those affecting menarche and childbirth) may have costly consequences later in life. We therefore included only age at menarche and age at first childbirth as exposures.

      (5) Can the authors include genetic correlations between menarche, age at first child, BMI, and preferably menopause?

      Thank you for your suggestion. We agree that genetic correlations between age at menarche, age at first childbirth, BMI, and menopause provide valuable context for our analysis. Our MR study treats age at menarche and age at first childbirth as exposures and menopause as the outcome, and we already report results before and after correction for BMI-related SNPs; nevertheless, we recognize the importance of assessing genetic correlations.

      To address this, we calculated the genetic correlations between these traits to characterize their shared genetic architecture. This analysis clarifies whether there is substantial genetic overlap between the two exposures and between each exposure and the outcome, which informs the interpretation of our MR results. We used the LDSC software to calculate genetic correlations for all pairwise comparisons among age at menarche, age at first birth, BMI, and age at menopause onset (15,16), and we have included these results to make the study more comprehensive. The results are listed in Table S6.
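      For readers unfamiliar with LDSC, the Python sketch below illustrates the idea behind cross-trait LD score regression (15,16): per-SNP χ² statistics, and products of z-scores, increase linearly with LD score, and the slopes give heritabilities and genetic covariance, from which r_g = ρ_g / √(h²₁ h²₂). This is a simplified toy under stated assumptions: it uses unweighted least squares on noise-free synthetic inputs, whereas the real software uses weighted regression and block-jackknife standard errors. All names and values are hypothetical and are not the study's estimates.

```python
import math


def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def ldsc_genetic_correlation(ld_scores, chi2_1, chi2_2, z1z2, n1, n2, m):
    """Toy LD score regression estimate of the genetic correlation r_g.

    ld_scores: per-SNP LD scores; chi2_1, chi2_2: per-SNP chi-square statistics;
    z1z2: per-SNP products of z-scores; n1, n2: GWAS sample sizes; m: number of SNPs.
    """
    slope1, _ = ols(ld_scores, chi2_1)   # E[chi2] = (N * h2 / M) * l + intercept
    slope2, _ = ols(ld_scores, chi2_2)
    slope12, _ = ols(ld_scores, z1z2)    # E[z1 z2] = (sqrt(N1 N2) * rho_g / M) * l + intercept
    h2_1 = slope1 * m / n1
    h2_2 = slope2 * m / n2
    rho_g = slope12 * m / math.sqrt(n1 * n2)
    # With noisy real data the heritability product can be non-positive; not handled here
    return rho_g / math.sqrt(h2_1 * h2_2)


if __name__ == "__main__":
    m, n1, n2 = 1_000_000, 300_000, 250_000
    true_h2_1, true_h2_2, true_rg = 0.2, 0.1, -0.3   # illustrative values only
    true_rho = true_rg * math.sqrt(true_h2_1 * true_h2_2)
    ld = [10 + 5 * i for i in range(50)]
    chi2_1 = [n1 * true_h2_1 / m * l + 1.05 for l in ld]
    chi2_2 = [n2 * true_h2_2 / m * l + 1.02 for l in ld]
    z1z2 = [math.sqrt(n1 * n2) * true_rho / m * l + 0.01 for l in ld]
    print(round(ldsc_genetic_correlation(ld, chi2_1, chi2_2, z1z2, n1, n2, m), 3))  # prints -0.3
```

      The intercepts absorb confounding and sample overlap, which is why LDSC can estimate r_g even when the GWAS samples overlap.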

      (6) Lines 39-40: this is not entirely true. There is also mounting evidence that socioeconomic factors cause earlier onset of menarche through stress-related mechanisms: https://doi.org/10.1016/j.annepidem.2010.08.006

      Thank you for pointing us to this evidence. We changed the sentence to “Considering reproductive events are partly regulated by genetic factors that can manifest the physiological outcome later in life”.

      (7) Why did the authors choose to work with studies derived from IEU Open GWAS, as it often does not contain the most recent and relevant GWAS for a specific trait?

      We chose to work with studies derived from the IEU Open GWAS database after careful consideration of several sources, including the GWAS Catalog database and recently published GWAS papers. Our selection criteria focused on publicly available GWAS with large sample sizes and a larger number of SNPs to ensure robust analysis. For specific traits such as late-onset Alzheimer's disease and eye aging, we used GWAS data published in scientific articles to ensure that our research reflects the latest findings in the field.

      (1) Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat Genet 48, 1462-1472 (2016). https://doi.org/10.1038/ng.3698

      (2) Tropf, F. C. et al. Hidden heritability due to heterogeneity across seven populations. Nat Hum Behav 1, 757-765 (2017). https://doi.org/10.1038/s41562-017-0195-1

      (3) Stearns, S. C., Byars, S. G., Govindaraju, D. R. & Ewbank, D. Measuring selection in contemporary human populations. Nat Rev Genet 11, 611-622 (2010). https://doi.org/10.1038/nrg2831

      (4) Day, F. R., Elks, C. E., Murray, A., Ong, K. K. & Perry, J. R. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK Biobank study. Sci Rep 5, 11208 (2015). https://doi.org/10.1038/srep11208

      (5) Hollis, B. et al. Genomic analysis of male puberty timing highlights shared genetic basis with hair colour and lifespan. Nat Commun 11, 1536 (2020). https://doi.org/10.1038/s41467-020-14451-5

      (6) Field, A. E. et al. Impact of overweight on the risk of developing common chronic diseases during a 10-year period. Arch Intern Med 161, 1581-1586 (2001). https://doi.org/10.1001/archinte.161.13.1581

      (7) Singh, G. M. et al. The age-specific quantitative effects of metabolic risk factors on cardiovascular diseases and diabetes: a pooled analysis. PLoS One 8, e65174 (2013). https://doi.org/10.1371/journal.pone.0065174

      (8) Kivimaki, M. et al. Obesity and risk of diseases associated with hallmarks of cellular ageing: a multicohort study. Lancet Healthy Longev 5, e454-e463 (2024). https://doi.org/10.1016/S2666-7568(24)00087-4

      (9) Kivimaki, M. et al. Body-mass index and risk of obesity-related complex multimorbidity: an observational multicohort study. Lancet Diabetes Endocrinol 10, 253-263 (2022). https://doi.org/10.1016/S2213-8587(22)00033-X

      (10) Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat Genet 50, 912-919 (2018). https://doi.org/10.1038/s41588-018-0152-6

      (11) Gao, X. et al. The bidirectional causal relationships of insomnia with five major psychiatric disorders: A Mendelian randomization study. Eur Psychiatry 60, 79-85 (2019). https://doi.org/10.1016/j.eurpsy.2019.05.004

      (12) Burgess, S., Small, D. S. & Thompson, S. G. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res 26, 2333-2355 (2017). https://doi.org/10.1177/0962280215597579

      (13) Staley, J. R. et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207-3209 (2016). https://doi.org/10.1093/bioinformatics/btw373

      (14) Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 35, 4851-4853 (2019). https://doi.org/10.1093/bioinformatics/btz469

      (15) Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015). https://doi.org/10.1038/ng.3406

      (16) Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015). https://doi.org/10.1038/ng.3211

    1. eLife Assessment

      This important study provides a nuanced analysis of the impact of cues on cost/benefit decision-making deficits in male rats that could have translational relevance to many addictive disorders. The main findings are that cues paired with rewarded outcomes increase the proportion of risky outcomes, whereas risky choice is reduced when cues are paired with reward loss. The experimental data is convincing, but the computational analysis based on the optimisation of different Q-learning models is incomplete. The findings will be of interest to behavioural neuroscientists and clinicians with an interest in risk, decision making, and gambling disorders.

    2. Reviewer #1 (Public review):

      Summary:

      Maladaptive decision-making is a trait commonly seen in gambling disorders. Salient cues can impact decision-making and drive gambling, though how cues affect decision-making isn't well understood. This manuscript describes the impact of cueing distinct outcomes of a validated rodent cost/benefit decision-making task based on the human Iowa Gambling Task. Comparing six task variants, the authors describe the effect of adding salient cues to wins (that scale with the size of the win or its inverse), to every outcome regardless of loss or win, randomly to losses or wins, or to losses. Behavioral results reveal that cueing wins increased risky choices. By contrast, presenting the cues randomly or cueing the losses reduced risky choices. Risk-preferring animals in the uncued, randomly cued, and loss-cued tasks showed sensitivity to devaluation, whereas win-paired cued rats did not, suggesting that cues blunt behavioral updating. Behavioral analyses were paired with computational modeling of initial acquisition, which revealed that risky decision-making was related to reduced punishment learning. These data provide unique insight into how cues may bias behavior and drive gambling-related phenotypes.

      Strengths:

      The detailed analyses provide interesting insight into how cues impact complex decision-making. While there has been a great deal of work into the impact of cues on choice, few studies integrate multiple probabilistic outcomes. Complementing these data with computational parameters helps the reader to understand what may be driving these differences in behavior. The manuscript is well-written, clearly explaining the relevance of the results and potential future directions.

      Weaknesses:

      Two main questions arise from these results. The first: when do behavioral differences emerge between the task variants? Based on the results and discussion, the cues increase the salience of either the wins or the losses, biasing behavior in favor of either risky or optimal choice. If this is the case, one might expect the cues to expedite learning, particularly in the standard and loss conditions. An analysis of the acquisition of the tasks may provide insight into how the cues are "teaching" decision-making and might explain how biases are formed and cemented.

      The second question: does the learning period used for the modeling affect the interpretation of the behavioral results? The authors indicate that computational modeling was performed on the first five sessions and that these data were used to predict preferences at baseline. Based on these results, punishment learning predicts choice preference. However, these animals are not naïve to the contingencies because of the forced-choice training prior to the task, which may impact behavior in these early sessions. Though punishment learning may initially predict risk preference, other parameters later in training may also predict behavior at baseline. The authors also present simulated data from the models for sessions 18-20, but according to the statistical analysis section, sessions 35-40 were used for analysis (and presumably presented in Figure 1). If the simulation is carried out for sessions 35-40, do the models fit the data? Finally, though the n's are small, it would be interesting to see how devaluation impacts the computational metrics. These additional analyses may help to explain the nuanced effects of the cues in the task variants.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Hathaway et al. describes a set of elegant behavioral experiments designed to understand which aspects of cue-reward contingencies drive risky choice behavior. The authors developed several clever variants of the well-established rodent gambling task (also developed by this group) to understand how audiovisual cues alter learning, choice behavior, and risk. Computational and sophisticated statistical approaches were used to provide evidence that: (1) audiovisual cues drive risky choice if they are paired with rewards and decrease risk if only paired with loss, (2) pairing cues with rewards reduces learning from punishment, and (3) differences in risk-taking seem to be present early on in training.

      Strengths:

      The paper is well-written, the experiments are well-designed, and the results are highly interesting, particularly for understanding how cues can motivate and invigorate normal and abnormal behavior.

      Weaknesses:

      Additional support and evidence are needed for the claims made by the authors. Some of the statements are inconsistent with the data and/or analyses or are only weakly supportive of the claims.

    4. Reviewer #3 (Public review):

      Summary:

      In this work, Hathaway and colleagues aim to understand how audiovisual cues at the time of outcome promote the selection of risky choices. A real-life illustration of this effect is seen in electronic gambling machines, which signal a win with flashing lights and jingles, encouraging the player to keep betting. More specifically, the authors ask whether the cue has to be paired exclusively with wins, or whether it can be paired with both outcomes, exclusively with loss outcomes, or occur randomly. To tackle this question, they employ a version of the Iowa Gambling Task adapted to rats and test the effect of different cue-outcome association rules on the probability of selecting the riskier options; they then test the effect of prior reward devaluation on the task; finally, they optimise computational models on the early phases of the experiment to investigate potential mechanisms underlying the behavioural differences.

      Strengths:

      The experimental approach is very well thought-out, in particular, the choice of the different task variants covers a wide range of different potential hypotheses. Using this approach, they find that, although rats prefer the optimal choices, there is a shift towards selecting riskier options in the variants of the task where the cue is paired to win outcomes. They analyse this population average shift by showing that there is a concurrent increase in the number of risk-taking individuals in these tasks. They also make the novel discovery that pairing cues with loss outcomes only reduces the tendency for risky decisions.

      The computational strategy is appropriate and in keeping with the accepted state of the art: defining a set of candidate models, optimising them, comparing them, simulating the best ones to ensure they replicate the main experimental results, then analysing parameter estimates in the different tasks to speculate about potential mechanisms.

      Weaknesses:

      There is a very problematic statistical stratagem that involves categorising individuals as either risky or optimal based on their choice probabilities. As a measurement or outcome, this is fine, as previously highlighted in the results, but this label is then used as a factor in different ANOVAs to analyse the very same choice probabilities, which then constitutes a circular argument (individuals categorised as risky because they make more risky choices, make more risky choices...).

      A second experiment was done to study the effect of devaluation on risky choices in the different tasks. The results, which are difficult to interpret from Figure 3, suggest that reward devaluation affects choices in tasks where the win-cue pairing is absent. The authors interpret this result by saying that pairing wins with cues makes the individuals insensitive to reward devaluation. Counter to this interpretation, if an individual is prone to making risky choices in a given task, this points to an already distorted sense of value, since the most rewarding strategy is to make optimal, non-risky choices.

      While the overall computational approach is excellent, I believe that the choice of computational models is poor. Loss trials come at a double cost, which the authors might want to elaborate upon: first, the lost opportunity of not having selected a winning option, which is reflected in Q-learning by r = 0; and second, a waiting period that reduces the overall reward rate. The authors combine these costs by converting the time penalty into "reward currency" using three different functions, which make up the three tested models. This is somewhat of a wasted opportunity, because the question addressed by model comparison is not, for example, "are individuals in the paired win-cue tasks more sensitive to risk, or less sensitive to time?" but "what is the best way of converting time into Q-value currency to fit the data?" Instead, the authors could have contrasted other models that explicitly track time as a separate variable (see, for example, "Impulsivity and risk-seeking as Bayesian inference under dopaminergic control" (Mikhael & Gershman 2021)) or that give actions an extra risk bonus (as in "Nicotinic receptors in the VTA promote uncertainty seeking" (Naude et al 2016)). Another weakness of the computational section is that, despite simulations having been run, Figure 5 only shows the simulated risk scores and not the individual choice probabilities, which would be a much more informative metric by which to judge model validity. In the last section, the authors ask whether the parameter estimates (obtained from optimisation on the early sessions) could be used to predict risk preference. While this is an interesting question, the authors give very little explanation of how they establish any predictive relationship. A figure and a more detailed explanation would be warranted to support their claims.
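      To make the modeling discussion concrete, the Python sketch below simulates a Q-learning agent on a four-option rGT in which the time-out on loss trials is converted into negative "reward currency", and it reports both the choice probabilities and a risk score. It is an illustration, not the authors' fitted models. Assumptions: the standard rGT contingencies (P1 to P4: 1 to 4 pellets, win probabilities 0.9/0.8/0.5/0.4, time-outs of 5/10/30/40 s); separate learning rates for wins and losses; a hypothetical linear conversion factor m (only one of several possible conversion functions); softmax choice; and the conventional rGT score (%P1 + %P2) − (%P3 + %P4).

```python
import math
import random

# Assumed standard rGT contingencies: (pellets on win, P(win), time-out on loss in s)
OPTIONS = {
    "P1": (1, 0.9, 5),
    "P2": (2, 0.8, 10),
    "P3": (3, 0.5, 30),
    "P4": (4, 0.4, 40),
}


def softmax_choice(q, beta, rng):
    """Sample an option with probability proportional to exp(beta * Q)."""
    names = list(q)
    qmax = max(q.values())  # subtract the max for numerical stability
    weights = [math.exp(beta * (q[k] - qmax)) for k in names]
    return rng.choices(names, weights=weights)[0]


def simulate_session(eta_pos, eta_neg, m, beta, n_trials=1000, seed=0):
    """Simulate one agent; returns (choice probabilities, risk score)."""
    rng = random.Random(seed)
    q = {k: 0.0 for k in OPTIONS}
    counts = {k: 0 for k in OPTIONS}
    for _ in range(n_trials):
        choice = softmax_choice(q, beta, rng)
        counts[choice] += 1
        pellets, p_win, timeout = OPTIONS[choice]
        if rng.random() < p_win:
            # Reward learning: move Q toward the number of pellets delivered
            q[choice] += eta_pos * (pellets - q[choice])
        else:
            # Punishment learning: hypothetical linear conversion of time-out into value
            q[choice] += eta_neg * (-m * timeout - q[choice])
    probs = {k: counts[k] / n_trials for k in OPTIONS}
    risk_score = 100 * ((probs["P1"] + probs["P2"]) - (probs["P3"] + probs["P4"]))
    return probs, risk_score


if __name__ == "__main__":
    for eta_neg in (0.1, 0.0):  # intact vs absent punishment learning
        probs, score = simulate_session(eta_pos=0.1, eta_neg=eta_neg, m=0.1, beta=3.0, seed=1)
        print(eta_neg, {k: round(v, 2) for k, v in probs.items()}, round(score, 1))
```

      Comparing the two runs illustrates the point in the reviews: an agent that barely learns from time-out penalties drifts toward the high-pellet, high-risk options, and the per-option probabilities reveal which options drive a given risk score.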

    5. Author response:

      We thank the reviewers for their thoughtful comments and suggestions. We plan to make a number of revisions to the manuscript to address their feedback.

      Firstly, we plan to incorporate feedback related to our modeling approach. We will provide justification for the chosen models and explain why this dataset is not appropriate for an in-depth exploration of other models. In particular, we will highlight that the models included in this manuscript were taken from Langdon et al. (2019) with a minor extension. Model development and validation in the Langdon et al. (2019) paper required a dataset with >100 rats per task. As the current n per variant is 28-32, and behavioral performance on this task is highly variable, it would be difficult to sufficiently test the validity of models that depart substantially from the previously tested RL models. Nevertheless, we will acknowledge this as a limitation in the discussion section. Additionally, we will test some alternatives suggested by reviewers that fall within the scope of the current RL modeling framework (e.g., comparison to a standard delta-rule update for unrewarded choices). We will address other concerns raised by reviewers by (a) providing a rationale for why we constrained our analyses to the first five sessions, (b) simulating data for sessions that match those analyzed in the real data (i.e., sessions 35-40 instead of 18-20), and (c) including a figure of the simulated choice probabilities rather than just the risk score.

      Secondly, we will include additional analyses and clarify the current statistical approach to address comments on how the data were analyzed. We will include an analysis of task acquisition to investigate when choice preferences emerge across the different variants. We will justify the statistical approach used for detecting behavioral differences between task variants, including a better explanation of the inclusion of the risky/optimal label as a between-subjects factor in the ANOVAs. We will also expand the section on parameters predicting risk preference on the rGT to fully explain the statistical method used and provide a figure of the results.

      Lastly, we will provide a more detailed rationale for the reinforcer devaluation test, and describe the hypothesis it tests. We will also expand on how the results from the devaluation test support our conclusions, and address alternative explanations suggested by the reviewers.

    1. eLife Assessment

      This work presents an atlas of vasopressin (AVP) and its receptor AVPR1a in mouse brains, using RNAscope to map single-transcript expression of Avp and Avpr1a across various brain regions in males and females. The findings are valuable in that they identify brain regions expressing Avpr1a mRNA transcript. The impact of the findings is reduced by incomplete analysis of the data, owing to a limited description of Avpr1a mRNA distribution within brain regions and limited statistical inference.

    2. Reviewer #1 (Public review):

      Summary:

      Despite accumulating prior studies on the expressions of AVP and AVPR1a in the brain, a detailed, gender-specific mapping of AVP/AVPR1a neuronal nodes has been lacking. Using RNAscope, a cutting-edge technology that detects single RNA transcripts, the authors created a comprehensive neuroanatomical atlas of Avp and Avpr1a in male and female brains. The findings are important, given that: (1) a detailed, gender-specific mapping of AVP/AVPR1a neuronal nodes has been lacking, and (2) the study offers valuable new insights into Avpr1a expression across the mouse brain. The findings are solid, and with improved data presentation and analysis, this work could serve as an important resource for the neuroscience community.

      Strengths:

      This well-executed study provides valuable new insights into gender differences in the distribution of Avp and Avpr1a. The atlas is an important resource for the neuroscience community.

      Weaknesses:

      A few concerns remain to be addressed. The primary weakness of this manuscript lies in the robustness of its data presentation and analysis.

    3. Reviewer #2 (Public review):

      Summary:

      The authors conducted a brain-wide survey of vasopressin and vasopressin receptor 1A gene expression in the mouse brain using a high-resolution in situ hybridization method called RNAscope. Overall, the findings are useful in identifying brain regions expressing Avpr1a transcript. The impact of findings is decreased by incomplete or inadequate data analysis due to limited description of Avpr1a mRNA distribution within brain regions and limited statistical inference. A comprehensive overview of Avpr1a expression in the mouse brain has the potential to be highly informative and impactful. The current manuscript used RNAscope (a proprietary method of in situ hybridization) to assess the transcript abundance of Avp (arginine vasopressin, a neuropeptide) and its receptor (Avpr1a). The style of graphs, limited use of photomicrographs, and low number of subjects all combine to limit the impact of the dataset. The finding of Avp-expressing cells outside of the hypothalamus and extended amygdala is poorly documented but would be novel. The Avpr1a data suggest expression in numerous brain regions. However, the data presented are difficult to interpret, with every value being an extremely small density value for a large swath of the brain. How many cells are impacted? Are puncta spread across many cells or only present in a few cells? Is density evenly distributed through a brain region or compacted into a subfield? For a descriptive study, there is minimal statistical inference and relatively little description. The authors make a case for the novel nature of the work but do not seem, at times, to recognize a robust literature developed over the last 50 years. In conclusion, the experimental data are important and informative; however, the low number of subjects, lack of statistical power, limited description of individual brain regions, and poor quality and design of data figures reduce the overall impact.
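      The reviewer's question of whether puncta are spread across many cells or concentrated in a few can be illustrated with a small calculation. In the Python sketch below, two hypothetical regions have identical puncta density but very different cellular distributions, which a density-only summary cannot distinguish. The per-cell segmentation, the three-puncta positivity threshold and the helper name are assumptions for illustration; they do not describe the manuscript's quantification.

```python
from statistics import mean


def region_summary(puncta_per_cell, area_mm2, positive_threshold=3):
    """Summarize RNAscope puncta for one brain region (hypothetical helper).

    puncta_per_cell: puncta count for each segmented cell in the region
    area_mm2: area of the analyzed region
    positive_threshold: minimum puncta for a cell to count as expressing (assumed)
    """
    if not puncta_per_cell:
        raise ValueError("no cells in region")
    positive = [c for c in puncta_per_cell if c >= positive_threshold]
    return {
        "density_per_mm2": sum(puncta_per_cell) / area_mm2,
        "percent_positive_cells": 100 * len(positive) / len(puncta_per_cell),
        "mean_puncta_per_positive_cell": mean(positive) if positive else 0.0,
    }


if __name__ == "__main__":
    diffuse = [2] * 100                # low-level signal spread across all cells
    clustered = [20] * 10 + [0] * 90   # strong signal in a few cells
    # Same density (400 puncta/mm2), very different cellular distributions
    print(region_summary(diffuse, area_mm2=0.5))
    print(region_summary(clustered, area_mm2=0.5))
```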

      Strengths:

      A survey of Avpr1a expression in the mouse brain is an important tool for exploring the function of vasopressin in the mammalian brain and developing hypotheses about cell- and circuit-level function.

      Weaknesses:

      (1) The style and type of data presentation, which focuses on the density of individual mRNA transcripts across a whole brain region, seemed incomplete insofar as it did not provide a clear visualization of the distribution of Avpr1a-expressing cells or of the transcript itself. However, knowing which brain regions express the transcript is itself informative.

      (2) The manuscript strongly emphasizes the possibility of sex differences in Avp and Avpr1a expression. However, the low number of animals used does not provide adequate statistical power to make strong inferences regarding sex differences in the data.

      (3) The manuscript's methods are minimal but adequate to understand data acquisition. The description of how quantitative analyses were conducted is inadequate and would be impossible to replicate beyond identifying the program used.

    1. eLife Assessment

      This valuable study investigates the computational role of top-down feedback -- a property that is found in biological circuits -- in Artificial Neural Network (ANN) models of the neocortex. Using hierarchical recurrent ANNs in an audiovisual integration task, the authors show a visual bias consistent with that observed in human perception, which mildly improves learning speed. While the study offers a tool that is of value for studying top-down feedback in cortical models, with the potential to inspire other fields (e.g. machine learning), the presented evidence for a general framework of deep learning architectures that predict behavior is incomplete, and the methods section lacks sufficient detail in terms of hyperparameter choice and network structures.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors aim to investigate whether adding the top-down feedback connections found in the neocortex improves ANNs used to explain brain data. To do so, they use a retinotopic and tonotopic organization to model each subregion of the ventral visual (V1, V2, V4, and IT) and ventral auditory (A1, Belt, A4) regions using Convolutional Gated Recurrent Units. The top-down feedback connections are inspired by the apical tree of pyramidal neurons and are modeled either with a multiplicative effect (a change in the gain of the activation function) or a composite effect (a change in both the gain and the threshold of the activation function).
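      The distinction between multiplicative and composite feedback can be illustrated for a single unit. In the Python sketch below, purely multiplicative feedback rescales the feedforward input but cannot activate a unit that receives no feedforward drive, whereas composite feedback also shifts the threshold and therefore adds a driving component. The tanh activation and the specific ways feedback sets gain and threshold are assumptions chosen for clarity, not the functions used in the paper's ConvGRU layers.

```python
import math


def multiplicative(ff: float, fb: float) -> float:
    """Feedback changes only the gain (slope) of the activation function."""
    gain = 1.0 + max(fb, 0.0)  # assumed: non-negative feedback raises the gain
    return math.tanh(gain * ff)


def composite(ff: float, fb: float) -> float:
    """Feedback changes both the gain and the threshold of the activation function."""
    gain = 1.0 + max(fb, 0.0)
    threshold = -fb            # assumed: excitatory feedback lowers the threshold
    return math.tanh(gain * (ff - threshold))


if __name__ == "__main__":
    # No feedforward drive: multiplicative feedback leaves the unit silent,
    # while composite feedback can activate it on its own
    print(multiplicative(0.0, 1.0), composite(0.0, 1.0))
    # With feedforward drive, both forms amplify the response
    print(multiplicative(0.5, 0.0), multiplicative(0.5, 1.0))
```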

      To assess the functional impact of the top-down connections, the authors compare three architectures: a brain-like architecture derived directly from brain data analysis, a reversed architecture where all feedforward connections become feedback connections and vice versa, and a random connectivity architecture. More specifically, in the brain-like model the visual regions provide feedforward input to all auditory areas, whereas auditory areas provide feedback to visual regions.
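      The brain-like and reversed architectures can be expressed as a labeled connectivity map. The Python sketch below encodes each connection as feedforward ("ff") or feedback ("fb") and builds the reversed model by swapping every label, which makes auditory areas drive visual areas. The exact wiring is an assumption: within each modality, feedforward links join adjacent areas, and feedback runs from each area to all lower areas; cross-modal links are taken to be all-to-all. The random architecture is not reproduced, and the paper's connectivity may differ.

```python
from itertools import product

VISUAL = ["V1", "V2", "V4", "IT"]
AUDITORY = ["A1", "Belt", "A4"]


def brain_like():
    """Return {(source, target): 'ff' or 'fb'} for a hypothetical brain-like layout."""
    edges = {}
    for areas in (VISUAL, AUDITORY):
        for lower, higher in zip(areas, areas[1:]):
            edges[(lower, higher)] = "ff"      # feedforward up the hierarchy
        for i, higher in enumerate(areas):
            for lower in areas[:i]:
                edges[(higher, lower)] = "fb"  # top-down feedback to all lower areas
    for v, a in product(VISUAL, AUDITORY):
        edges[(v, a)] = "ff"                   # visual areas drive auditory areas
        edges[(a, v)] = "fb"                   # auditory areas modulate visual areas
    return edges


def reversed_architecture(edges):
    """Turn every feedforward connection into feedback and vice versa."""
    return {pair: ("fb" if kind == "ff" else "ff") for pair, kind in edges.items()}


if __name__ == "__main__":
    b = brain_like()
    r = reversed_architecture(b)
    print(len(b), b[("V1", "A1")], r[("A1", "V1")])
```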

      First, the authors found that top-down feedback influences audiovisual processing and that the brain-like model exhibits a visual bias in multimodal visual and auditory tasks. Second, they discovered that in the brain-like model, the composite integration of top-down feedback, similar to that found in the neocortex, leads to an inductive bias toward visual stimuli, which is not observed in the feedforward-only model. Furthermore, the authors found that the brain-like model learns to utilize relevant stimuli more quickly while ignoring distractors. Finally, by analyzing the activations of all hidden layers (brain regions), they found that the feedforward and feedback connectivity of a region could determine its functional specializations during the given tasks.

      Strengths:

      The study introduces a novel methodology for designing connectivity between regions in deep learning models. The authors also employ several tasks based on audiovisual stimuli to support their conclusions. Additionally, the model uses backpropagation of error as a learning algorithm, making it applicable across a range of tasks, from various supervised learning scenarios to reinforcement learning agents. Moreover, the presented framework offers a valuable tool for studying top-down feedback connections in cortical models. Thus, it is a very nice study that can also inspire other fields (e.g., machine learning) to explore new architectures.

      Weaknesses:

      Although the study explores some novel ideas on how to study the feedback connections of the neocortex, the data presented here are not sufficiently complete to support a concrete theory of the role of top-down feedback inputs in such models of the brain.

      (1) The gap in the literature that the paper tries to fill is the ability of DL algorithms to predict behavior: "However, there are still significant gaps in most deep neural networks' ability to predict behavior, particularly when presented with ambiguous, challenging stimuli." and "[...] to accurately model the brain."

      It is unclear to me how the presented work addresses this gap, as the only facts provided are derived from a simple categorization task that could also be solved by the feedforward-only model (see Figures 4 and 5). In my opinion, this statement is somewhat far-fetched, and there is insufficient data throughout the manuscript to support this claim.

      (2) It is not clear what the advantages are between the brain-like model and a feedforward-only model in terms of performance in solving the task. Given Figures 4 and 5, it is evident that the feedforward-only model reaches almost the same performance as the brain-like model (when the latter uses the modulatory feedback with the composite function) on almost all tasks tested. The speed of learning is nearly the same: for some tested tasks the brain-like model learns faster, while for others it learns slower. Thus, it is hard to attribute a functional implication to the feedback connections given the presented figures and therefore the strong claims in the Discussion should be rephrased or toned down.

      (3) The Methods section lacks sufficient detail. There is no explanation of the choice of hyperparameters or of the structure of the networks (number of trainable parameters, number of nodes per layer, etc.). Clarifying the rationale behind these decisions would enhance understanding. Moreover, since the authors draw conclusions based on the performance of the networks on specific tasks, it is unclear whether the comparisons are fair, particularly concerning the number of trainable parameters. Furthermore, it is not clear whether the visual bias observed in the brain-like model is an emergent property of the network or arises from asymmetries between the visual and auditory pathways (size of the layers, number of layers, etc.).

    3. Reviewer #2 (Public review):

      Summary:

      This work addresses the question of whether artificial deep neural network models of the brain could be improved by incorporating top-down feedback, inspired by the architecture of the neocortex.

      In line with known biological features of cortical top-down feedback, the authors model such feedback connections with both a typical driving effect and a purely modulatory effect on the activation of units in the network.

      To assess the functional impact of these top-down connections, they compare different architectures of feedforward and feedback connections in a model that mimics the ventral visual and auditory pathways in the cortex on an audiovisual integration task.

      Notably, one architecture is inspired by human anatomical data, where higher visual and auditory layers possess modulatory top-down connections to all lower-level layers of the same modality, and visual areas provide feedforward input to auditory layers, whereas auditory areas provide modulatory feedback to visual areas.

      First, the authors find that this brain-like architecture imparts a slight visual bias to the models, similar to what is seen in human data; the bias is reversed in an architecture where auditory areas provide feedforward drive to the visual areas.

      Second, they find that, in their model, modulatory feedback should be complemented by a driving component to enable effective audiovisual integration, similar to what is observed in neural data.

      Last, they find that the brain-like architecture with modulatory feedback learns a bit faster in some audiovisual switching tasks compared to a feedforward-only model.

      Overall, the study shows some possible functional implications when adding feedback connections in a deep artificial neural network that mimics some functional aspects of visual perception in humans.

      Strengths:

      The study contains innovative ideas, such as incorporating an anatomically inspired architecture into a deep ANN, and comparing its impact on a relevant task to alternative architectures.

      Moreover, the simplicity of the model allows one to draw conclusions on how features of the architecture and functional aspects of the top-down feedback affect the performance of the network.

      This could be a helpful resource for future studies of the impact of top-down connections in deep artificial neural network models of the neocortex.

      Weaknesses:

      Overall, the study appears to be a bit premature, as several parts need to be worked out more to support the claims of the paper and to increase its impact.

      First, the functional implication of modulatory feedback is not really clear. The "only feedforward" model (is a drive-only model meant?) attains the same performance as the composite model (with modulatory feedback) on virtually all tasks tested, it just takes a bit longer to learn for some tasks, but then is also faster at others. It even reproduces the visual bias on the audiovisual switching task. Therefore, the claims "Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature." and "More broadly, our work supports the conclusion that both the cellular neurophysiology and structure of feed-back inputs have critical functional implications that need to be considered by computational models of brain function" are not sufficiently supported by the results of the study. Moreover, the latter points would require showing that this model describes neural data better, e.g., by comparing representations in the model with and without top-down feedback to recorded neural activity.

      Second, the analyses are not supported by supplementary material, hence it is difficult to evaluate parts of the claims. For example, it would be helpful to investigate how the processing time after which the output is read out affects the evaluation of the model. This is especially important because convergence should be checked in recurrent and feedback models; if the network does not converge, the authors should discuss at which point in time the network is evaluated, and why.

      Third, the descriptions of the models in the methods are hard to understand, i.e., parameters are not described and equations are explained by referring to multiple other studies. Since the implications of the results heavily rely on the model, a more detailed description of the model seems necessary.

      Lastly, the discussion and testable predictions are not very well worked out and need more details. For example, the point "This represents another testable prediction flowing from our study, which could be studied in humans by examining the optical flow (Pines et al., 2023) between auditory and visual regions during an audiovisual task" needs to be made more precise to be useful as a prediction. What does the model predict in terms of "optical flow"? How could modulatory effects be distinguished from simple driving effects?

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the computational role of top-down feedback in artificial neural networks (ANNs), a feature that is prevalent in the brain but largely absent in standard ANN architectures. The authors construct hierarchical recurrent ANN models that incorporate key properties of top-down feedback in the neocortex. Using these models in an audiovisual integration task, they find that hierarchical structures introduce a mild visual bias, akin to that observed in human perception, which does not always compromise task performance.

      Strengths:

      The study investigates a relevant and current topic of considering top-down feedback in deep neural networks. In designing their brain-like model, they use neurophysiological data, such as externopyramidisation and hierarchical connectivity. Their brain-like model exhibits a visual bias that qualitatively matches human perception.

      Weaknesses:

      While the model is brain-inspired, it has limited biological plausibility. The model assumes a simplified and fixed hierarchy. In the brain, where additional neuromodulation is present, the hierarchy could be more flexible and more task-dependent.

      While the brain-like model showed an advantage in ignoring distracting auditory inputs, it struggled when visual information had to be ignored. This suggests that its rigid bias toward visual processing could make it less adaptive in tasks requiring flexible multimodal integration. Hence, it does not necessarily constitute an improvement over existing ANNs. It is unclear whether this aspect of the model also matches human data. In general, there is no direct comparison to human data. The study does not evaluate whether the top-down feedback architecture scales well to more complex problems or larger datasets. The model is not well enough specified in the methods and some definitions are missing.

    1. eLife Assessment

      This valuable study builds on previous work by the authors by presenting a potentially key method for correcting optical aberrations in GRIN lens-based microendoscopes used for imaging deep brain regions. By combining simulations and experiments, the authors provide convincing evidence showing that the obtained field of view is significantly increased with corrected, versus uncorrected microendoscopes. Because the approach described in this paper does not require any microscope or software modifications, it can be readily adopted by neuroscientists who wish to image neuronal activity deep in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      Sattin, Nardin, and colleagues designed and evaluated corrective microlenses that increase the usable field of view of two long (>6 mm) thin (500 µm diameter) GRIN lenses used in deep-tissue two-photon imaging. This paper closely follows the thread of earlier work from the same group (esp. Antonini et al., 2020; eLife), filling out the quiver of available extended-field-of-view 2P endoscopes with these longer lenses. The lenses are made by a molding process that appears practical and easy to adopt with conventional two-photon microscopes.

      Simulations are used to motivate the benefits of extended field of view, demonstrating that more cells can be recorded, with less mixing of signals in extracted traces, when recorded with higher optical resolution. In vivo tests were performed in piriform cortex, which is difficult to access, especially in chronic preparations.

      The design, characterization, and simulations are clear and thorough, but they do not break new ground in optical design or biological application. However, the approach shows much promise, including for applications such as miniaturized GRIN-based microscopes. Readers will largely be interested in this work for practical reasons: to apply the authors' corrected endoscopes to their own research.

      Strengths:

      The text is clearly written, the ex vivo analysis is thorough and well supported, and the figures are clear. The authors achieved their aims, as evidenced by the images presented, and were able to make measurements from large numbers of cells simultaneously in vivo in a difficult preparation.

      The authors did a good job of addressing issues I raised in initial review, including analyses of chromaticity and the axial field of view, descriptions of manufacturing and assembly yield, explanations in the text of differences between ex vivo and in vivo imaging conditions, and basic analysis of the in vivo recordings relative to odor presentations. They have also shortened the text, reduced repetition, and better motivated their approach in the introduction.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors present an approach to correct GRIN lens aberrations, which primarily cause a decrease in signal-to-noise ratio (SNR), particularly in the lateral regions of the field-of-view (FOV), thereby limiting the usable FOV. The authors propose to mitigate these aberrations by designing and fabricating aspherical corrective lenses using ray trace simulations and two-photon lithography, respectively; the corrective lenses are then mounted on the back aperture of the GRIN lens.

      This approach was previously demonstrated by the same lab for GRIN lenses shorter than 4.1 mm (Antonini et al., eLife, 2020). In the current work, the authors extend their method to a new class of GRIN lenses with lengths exceeding 6 mm, enabling access to deeper brain regions, such as the most ventral regions of the mouse brain. Specifically, they designed and characterized corrective lenses for GRIN lenses measuring 6.4 mm and 8.8 mm in length. Finally, they applied these corrected long micro-endoscopes to perform high-precision calcium signal recordings in the olfactory cortex.

      Compared with alternative approaches using adaptive optics, the main strength of this method is that it does not require hardware or software modifications, nor does it limit the system's temporal resolution. The manuscript is well-written, the data are clearly presented, and the experiments convincingly demonstrate the advantages of the corrective lenses.

      The implementation of these long corrected micro-endoscopes, demonstrated here for deep imaging in the mouse olfactory cortex, will also enable deep imaging in larger mammals such as rats or marmosets.

      Comments on revisions:

      The authors have clearly addressed all my comments.

    4. Reviewer #3 (Public review):

      Summary:

      This work presents the development, characterization and use of new thin microendoscopes (500 µm diameter) whose accessible field of view has been extended by the addition of a corrective optical element glued to the entrance face. Two microendoscopes of different lengths (6.4 mm and 8.8 mm) have been developed, allowing imaging of neuronal activity in brain regions >4 mm deep. An alternative solution to increase the field of view could be to add an adaptive optics loop to the microscope to correct the aberrations of the GRIN lens. The solution presented in this paper does not require any modification of the optical microscope and is therefore easily accessible to any neuroscience laboratory performing optical imaging of neuronal activity.

      Strengths:

      (1) The paper is generally clear and well written. The scientific approach is well structured, and numerous experiments and simulations are presented to evaluate the performance of corrected microendoscopes. In particular, we can highlight several consistent and convincing pieces of evidence for the improved performance of corrected microendoscopes:

      - PSFs measured with corrected microendoscopes 75 µm from the centre of the FOV show a significant reduction in optical aberrations compared to PSFs measured with uncorrected microendoscopes.

      - Morphological imaging of fixed brain slices shows that optical resolution is maintained over a larger field of view with corrected microendoscopes compared to uncorrected ones, allowing neuronal processes to be revealed even close to the edge of the FOV.

      - Using synthetic calcium data, the authors showed that the signals obtained with the corrected microendoscopes have a significantly stronger correlation with the ground truth signals than those obtained with uncorrected microendoscopes.

      (2) There is a strong need for high quality microendoscopes to image deep brain regions in vivo. The solution proposed by the authors is simple, efficient and potentially easy to disseminate within the neuroscience community.

      Weaknesses:

      Weaknesses that were present in the first version of the paper were carefully addressed by the authors.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1: 

      (1) As discussed in review and nicely simulated by the authors, the large figure error indicated by profilometry (~10 um in some cases on average) is inconsistent with the optical performance improvements observed, suggesting that those measurements are inaccurate.

      I see no reason to include these inaccurate measurements.  

      We agree with the Referee and removed the indicated figure (old Supplementary Fig. 4) and data.

      Reviewer #3:

      (1) It would be interesting to comment on how the addition of a coverslip changes the performance of the uncorrected microendoscope compared to the use of bare GRIN lenses.

      We modified the discussion section (page 18) and added a new reference (#36) to include the request of the Referee.

      (2) In Figure 6C-H, the authors can indeed show data corresponding to all detected cells, but I still think that the statistics should be calculated using the same effective FOV. 

      We modified Figure 6 legend to include the request of the Referee.

      (3) The authors could present the images in Figures 4-6 as in the original version, with a scale bar in the centre of the FOV that is different for the two types of objectives (corrected vs uncorrected). They could add a short justification for this choice, and perhaps present the other version of Figure 4 in a supplementary information sheet (with similar scale bars at the centre of the FOV for both types of objectives). This would allow readers to appreciate that the FOV still appears significantly enlarged with this other presentation.

      As requested by the Referee, we modified the text in the Results section (page 11) and added the additional version of Figure 4 as Figure 4-figure supplement 1.

    1. eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including the robustness of the CF marking and manipulation approach and the unclear efficacy of longer-duration climbing fiber activity suppression.

    2. Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term manipulation of CF activity is effective, thus undermining confidence in the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding obtained with prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulations, especially when using a yellow laser as is done in this work (see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC Biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example, by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript, that CF activity is involved in OKR learning in a phase-specific manner, cannot be considered to be supported by evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely labelled CFs in the flocculus, but possibly also MFs. While obtaining homogeneous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that call the validity of its conclusions into question. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      [Editors' note: we have included the original concerns, which the Reviewing Editor agrees with. Methodological concerns remain after revisions.]

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including unclear efficacy of longer-duration climbing fiber activity suppression.

      We sincerely appreciate the thoughtful feedback provided by the reviewer regarding our study on the role of climbing fibers in cerebellar learning. Each point raised has been carefully considered, and we are committed to addressing them comprehensively. We acknowledge the importance of addressing methodological concerns, particularly regarding the efficacy of long-term suppression of CF activity, as well as ensuring clarity regarding the penetrance and selectivity of our manipulation. To this end, we have outlined plans for substantial revisions to the manuscript to adequately address these issues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term manipulation of CF activity is effective, thus undermining confidence in the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding obtained with prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulations, especially when using a yellow laser as is done in this work (see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC Biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example, by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript, that CF activity is involved in OKR learning in a phase-specific manner, cannot be considered to be supported by evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely labelled CFs in the flocculus, but possibly also MFs. While obtaining homogeneous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      We appreciate the reviewer’s thorough evaluation, which thoughtfully highlights the strengths and areas for improvement in our study.

      We agree with the reviewer’s recognition of the novelty of our approach, particularly in specifically perturbing climbing fiber (CF) activity in the flocculus and examining its effects across distinct phases of learning. Additionally, our use of the well-established OKR behavior paradigm provides a robust framework for investigating cerebellar learning processes, further strengthening our study.

      To address concerns regarding the efficacy of long-term optogenetic inhibition and the specificity of viral targeting, we conducted additional experiments. These include in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase. To ensure precise targeting and rule out potential side effects, such as unintended modification of Purkinje cell (PC) simple spike activity, we demonstrated that optogenetic suppression of CF transmission did not affect simple spike firing. Furthermore, we performed additional characterizations to confirm the specificity of viral targeting.

      Lastly, we recognize the importance of exploring alternative mechanisms underlying CF involvement in cerebellar learning. Accordingly, we expanded the manuscript to provide a more comprehensive discussion of these mechanisms, offering a clearer perspective on the broader implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that call the validity of its conclusions into question. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the reviewer’s recognition of the significance of our study in addressing the fundamental question of the role of CFs in adaptive learning within the cerebellar field. The use of optogenetic tools indeed provides a direct means to investigate the causal relationship between CF activity and learning outcomes.

      To address concerns regarding the effectiveness of CF suppression during consolidation, we plan to conduct further in vivo recordings. These will demonstrate how reliably CF transmission can be suppressed through optogenetic manipulation over an extended period.

      In response to the concern about potential tissue damage from laser stimulation, we believe that our optogenetic manipulation was not strong enough to induce significant heat-induced tissue damage in the flocculus. According to Cardin et al. (2010), light applied through an optic fiber may cause critical damage if the intensity exceeds 100 mW, which is eight times the intensity we used in our OKR experiment. Furthermore, if chronic laser stimulation had caused tissue damage, we would expect impaired long-term memory, reflected in abnormal gain retrieval tested the following day. However, as shown in Figures 2 and 3, there were no significant abnormalities in consolidation percentages even after the optogenetic manipulation.

      Finally, we appreciate the reviewer’s recognition of the challenges involved in pinpointing specific neural mechanisms. We plan to expand the discussion to address these complexities and outline future research directions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Inhibitory optogenetic actuators are generally problematic, especially in time frames longer than seconds. If the authors wish to be able to inhibit activity in the flocculus-targeting CFs for a long time, maybe it would make sense to try to retrogradely transfect the IO neurons from the flocculus (using a cre-lox approach) with inhibitory DREADDs. This approach is also full of problems, so the absence or significant decrease in CS activity throughout the period of manipulation must be demonstrated.

      In addition to re-examining the strength of the evidence regarding the role of CFs in the consolidation and retrieval phases, the manuscript would benefit from significant reworking of the details in the manuscript and figures. Below is a possibly incomplete list of things we would want to highlight:

      (1) While the text states that the authors "... verified the potential reduction of Cs firing rate in PCs of awake mice in vivo by inhibiting CF signals", neither the data nor a figure is shown. This is of critical importance when judging the reliability of the following results. The data presented in panels Figure 1D-E should also be improved to be more informative; specifically, the waveforms of EPSCs should be shown in higher resolution. We are not informed about how many cells/slices/animals the results were obtained from, nor how many trials were done per condition. Finally, the in vitro data are from vermal Purkinje neurons, while the focus of the work is on the flocculus. Please provide these verifications for the flocculus.

      To verify the suppression of complex spike (Cs) activity, we conducted additional in-vivo experiments and added Figure 2, which presents recordings of Cs firing rates from Purkinje cells (PCs) during optogenetic suppression of climbing fiber (CF) activity. These data demonstrate that the suppression specifically and robustly targets Cs activity without affecting simple spike firing, as shown in Figure 2C. The results presented in Figure 2 were acquired at 40 minutes of optostimulation, consistently showing effective suppression of Cs activity throughout this period. While continuous recordings over several hours were not performed, the stability and sustained suppression observed at the 40-minute mark strongly suggest that the manipulation remains effective during the extended durations required for the behavioral tests.

      Additionally, we have improved Figure 1D by enhancing the resolution of EPSC waveforms and including more detailed information in the figure legend regarding the number of cells and animals analyzed. For the current-clamp mode data (Figures 1E and F), we clarified the experimental conditions to provide additional context. While the in vitro data were collected from vermal PCs, these experiments were intended to illustrate the fundamental properties of CF-PC transmission.

      (2) It is challenging to achieve homogeneous transfection of all CFs in a given region. To judge the significance of the results, readers should be provided with material that allows them to assess the transfection quality. The images shown in panels Bi-ii are spatially restricted and of too low quality to make judgements. Also, it is not stated whether the images shown are from GFP- or NpHR-transfected animals. These different payloads are delivered using different viral capsids (AAV1 vs. AAV9) that have significantly different transfection capacities, and results from AAV9-CamKII-GFP cannot be generalized to AAV1-CamKII-NpHR. Please show the expression for the capsid used with NpHR.

      To clarify, the images in Figure Bi-ii are representative of GFP expression in animals transfected using AAV1-CamKII-EGFP. The purpose of these panels is to confirm the successful targeting of the region of interest rather than to evaluate viral tropism or capsid-specific transfection efficiency. Moreover, while the transfection characteristics of AAV1 and AAV9 may differ, the key experimental parameter of effective CF suppression was validated through in-vivo electrophysiological recordings, which robustly confirm the efficacy of NpHR expression.

      (3) Finally, please show the location of the optic fiber implant in the flocculus from post-mortem images.

      In Figure 3a of our revised manuscript, we added post-mortem histological images showing the exact location of the optic fiber implants in the flocculus. These images provide clear confirmation that the optogenetic stimulation was targeted to the correct anatomical region, supporting the conclusion that the observed effects are attributable to CF manipulation in the flocculus.

      Reviewer #2 (Recommendations For The Authors):

      (1) The efficacy of CF suppression is questionable. The histology in Figure 1 shows that only a handful of CFs are transduced in their approach. This observation casts doubt on the claimed complete suppression of CF-evoked EPSCs in every recorded PC in the same figure. This necessitates a more detailed explanation for this apparent discrepancy. Also, the absence of current-clamp recordings to measure the effect on CF-evoked complex spiking in PCs and the lack of detail regarding the timing of optogenetic actuation (continuous or pulsed) during these slice experiments are also significant omissions.

      We have added in vivo electrophysiological recordings showing sustained CF suppression in awake animals (Figure 2). These recordings directly demonstrate the extent of CF-evoked complex spike (Cs) suppression.

      Moreover, we have included additional current-clamp recordings to measure the impact of CF suppression on Cs activity (Figures 1E and 1F). Regarding the timing of the optogenetic actuation, the stimulation was applied continuously in the slice experiments.

      (2) The authors claim that their method effectively suppresses CF activity in vivo, yet they do not present any supporting data. Given the histological evidence provided, it's questionable whether their approach truly impacts the CF population broadly, casting doubts on the efficacy of their suppression approach to identify the role of CFs during behavior. To address these concerns, further experiments and detailed quantification are essential to validate the extent and uniformity of CF suppression achieved.

      As we responded earlier, we conducted additional in-vivo experiments with continuous recordings of CF-evoked complex spike (Cs) activity during optogenetic suppression (Figure 2). These data directly demonstrate effective and sustained inhibition of CF transmission throughout the behavioral experiments. Quantification of CF suppression revealed consistent inhibition across the manipulation period, with no observable alterations in Purkinje cell simple spike firing rates, confirming that our intervention specifically targeted CF activity without off-target effects. In addition to the in-vivo data, the in-vitro data presented in Figure 1 (lines 107~116) further validate the efficacy of our optogenetic manipulation, showing consistent suppression of CF transmission without any failures. These findings collectively confirm the reliability and specificity of our suppression approach for studying CF contributions to behavior.

      (3) To optogenetically test the role of CFs in memory consolidation, the authors deliver continuous, high-power light to the flocculus (13 mW for 6 hrs). This extends well beyond typical experimental conditions. The sustained nature of the light exposure thus brings into question the consistency and reliability of CF suppression over time. Firstly, it is imperative to determine whether CF activity is suppressed throughout this extended period. Secondly, the intensity and duration of light exposure carry a significant risk of causing extensive damage to the surrounding tissue. Given these concerns, a thorough histological examination is warranted to assess the potential adverse effects on tissue integrity. Such an analysis is crucial not only for validating the experimental outcomes but also for ensuring that the observed effects are not confounded by light-induced tissue damage.

      To address whether CF activity is suppressed throughout the extended period, we included new in-vivo recordings demonstrating robust suppression of CF transmission, as evidenced by inhibited complex spikes sustained at 40 minutes of optostimulation. Regarding potential tissue damage, our optogenetic protocol used a light intensity of 13 mW. This is much lower than 75 mW, an intensity at which Cardin et al. (2010) reported that normal neuronal activity was maintained. Moreover, critical damage typically requires intensities exceeding 100 mW for several hours (Cardin, Jessica A., et al. "Targeted optogenetic stimulation and recording of neurons in vivo using cell-type-specific expression of Channelrhodopsin-2." Nature Protocols 5.2 (2010): 247-254.). Finally, we observed no abnormalities in long-term memory consolidation or gain retrieval (Figures 3C, 4C, 4F), further supporting the conclusion that our light stimulation did not induce tissue damage.

      (4) The generalizability of their findings to various learning behaviors remains uncertain. Given that the flocculus plays a role in vestibulo-ocular reflex (VOR) adaptation, which encompasses both CF-dependent and CF-independent learning types (gain increase and gain decrease, respectively), this system could offer a more feasible approach for investigating hypotheses about the role of CFs in guiding distinct learning processes.

      In response to the reviewer’s comment on the generalizability of our findings to learning behaviors involving both CF-dependent and CF-independent mechanisms, we acknowledge the importance of examining these dynamics in cerebellar motor adaptation systems, such as the OKR. Although our study used an OKR task, findings from VOR studies are relevant here. Ke et al. (2009) demonstrated that VOR gain increases (CF-dependent) and gain decreases (CF-independent) involve distinct plasticity processes (Ke, Michael C., Cong C. Guo, and Jennifer L. Raymond. "Elimination of climbing fiber instructive signals during motor learning." Nature Neuroscience 12.9 (2009): 1171-1179). This suggests that CF engagement is task-dependent, particularly for larger error signals that require CF-guided adaptation.

      Similarly, our OKR findings suggest that CF-dependent pathways are likely used for large, persistent errors, whereas CF-independent mechanisms may drive more gradual adjustments. This alignment between OKR and VOR systems supports the generalizability of CF-selective adaptation across cerebellar learning tasks. We have elaborated on this point in our revised manuscript (lines 219~237), clarifying how CF-dependent and CF-independent mechanisms can generalize across motor learning contexts in the cerebellum.

      (5) The acute effect of CF suppression on OKR eye movements warrants investigation. If OKR eye movements are altered by their method, this could complicate the interpretation of their results.

      During our experiments, we monitored eye movements during CF optogenetic manipulation and found no aberrant effects, such as nystagmus. As shown in Figures 4G and 4H, disrupting CF signaling during gain retrieval did not alter the gain, confirming that our manipulation neither acutely affects ocular reflexes nor induces abnormal eye movements. This indicates that the observed effects are specific to learning and memory processes.

      (6) The authors raise the potential issue of inducing presynaptic LTD in CFs. Can they be sure that their manipulation doesn't generate a similar effect? Additional controls or techniques to accurately interpret the results are needed considering this concern.

      We note, however, that our discussion does not claim that optogenetic suppression directly induces CF-LTD. Instead, we posit that CF suppression may have mimicked the functional consequences of CF-LTD, such as reduced complex spike (Cs) activity and associated calcium signaling. This, in turn, may have indirectly interfered with the induction of parallel fiber-Purkinje cell (PF-PC) LTD, thereby preventing gain enhancement during learning.

      This hypothesis is consistent with previous studies highlighting the interplay between CF and PF synaptic plasticity in cerebellar motor learning. For example, Hansel and Linden (2000) and Weber et al. (2003) discuss how changes at CF synapses can modulate Cs waveforms and calcium dynamics, which are critical for PF-PC LTD. Coesmans et al. (2004) and Han et al. (2007) further elaborate on the necessity of CF input for effective PF-PC LTD induction during learning tasks such as retinal slip correction.

      While our experiments were not designed to directly measure CF-LTD, the observed prevention of gain enhancement aligns with the hypothesis that CF suppression functionally disrupted downstream PF-PC LTD. We have clarified these points in our revised manuscript (lines 250~258) to avoid misunderstanding.

      (7) The specific timeframe for OKR consolidation remains uncertain, with evidence from numerous studies indicating that cerebellar memory consolidation unfolds over several days. Therefore, a more thorough investigation into these extended durations, supported by control experiments to validate the outcomes, would significantly strengthen the study's conclusions, and provide clearer insights into the consolidation process of OKR learning.

      Our current study specifically focused on the early phase of the post-learning period, as supported by findings from several studies: Cooke et al. (2004); Titley et al. (2007); Steinmetz et al. (2016); Seo et al. (2024).

      These studies collectively indicate that cerebellum-dependent memory consolidation, including that of OKR learning, can occur rapidly during the early consolidation phase. While the specific mechanisms examined in these studies vary (e.g., synaptic plasticity, intrinsic plasticity, or circuit-level changes), they consistently demonstrate that modifications in the cerebellum after the early consolidation period no longer influence memory storage or performance. This evidence strongly supports the relevance of our experimental focus and the timing of our interventions.

      We acknowledge the importance of investigating extended consolidation periods, which could indeed provide additional insights. However, given our current aims, the rapid consolidation dynamics observed in the early phase are most relevant to the questions addressed in this study. We have elaborated on these matters in our revised manuscript (lines 273~283).

      (8) Issues around whether the authors have control over CF activity with their optogenetic intervention raise questions of whether learning can be recovered during the training procedure if the optogenetic stimuli are halted. Specifically, if suppression is applied for three blocks (what the authors refer to as "sessions") during the training procedure and then ceases, does learning rapidly recover in the immediately following blocks?

      While we did not directly examine the restoration of learning capability within the same training session following the cessation of optogenetic inhibition, we believe several aspects of our experimental design and insights from prior studies support our interpretation.

      Our optogenetic intervention specifically targeted CF input to Purkinje cells (PCs) in the flocculus and was applied continuously during designated training sessions to modulate cerebellar activity. Notably, Medina et al. (2001) demonstrated that transient inactivation of the cerebellar cortex impairs the expression of learned responses but does not disrupt the underlying plasticity mechanisms (Medina, Javier F., Keith S. Garcia, and Michael D. Mauk. "A mechanism for savings in the cerebellum." Journal of Neuroscience 21.11 (2001): 4081-4089.). This finding suggests that cerebellar plasticity remains intact and functional even after transient perturbations.

      Therefore, it is plausible that once optogenetic inhibition is lifted, the cerebellar network regains its capacity for learning and adaptation, as the intrinsic plasticity and memory encoding processes remain preserved. While we acknowledge that direct experimental confirmation of rapid recovery in our setup was not performed, this interpretation is consistent with our experimental framework and the broader literature.

      (9) The study does not fully explore the instructive signals/mechanisms underlying the memory consolidation process. A detailed investigation into potential instructive signals for consolidation beyond CF-induced signaling, like the simple spiking of PCs, could significantly enhance the study's conclusions. Indeed, there is currently no evidence to suggest that CFs play a role in the consolidation phase anyway so testing their role seems a bit of a strawman argument.

      While our study primarily focused on characterizing CF-dependent pathways, we acknowledge that memory consolidation is likely driven by a multifaceted interplay of instructive signals beyond CF-induced mechanisms. In particular, Purkinje cell (PC) simple spiking may act as a critical signal during the consolidation phase, either complementing or functioning independently of CF input. Emerging evidence suggests that simple spiking can modulate downstream circuitry in ways that stabilize and strengthen memory traces.

      To address this, we have expanded the discussion in the revised manuscript to explore potential instructive signals for consolidation, including PC simple spiking, local circuit plasticity within the cerebellar cortex, and its interaction with the cerebellar nuclei. We propose that these mechanisms collectively contribute to the transfer and stabilization of motor memory, offering a more comprehensive framework for understanding consolidation. We have elaborated on these matters in our revised manuscript (lines 238~250).

      (10) Previous reports have highlighted the necessity of CF activity for extinction/memory maintenance (Medina et al. 2002; Kim et al. 2020). That is, the absence of CF activity is consequential for cerebellar function. These results present a potential contrast to the findings reported in this current study. This discrepancy raises important questions about the experimental conditions, methodologies, and interpretations of CF function across different studies. A thorough discussion comparing these divergent outcomes is essential, as it could elucidate the specific contexts or conditions under which CF activity influences memory processes.

      We acknowledge that previous studies (Medina et al., 2002; Kim et al., 2020) have suggested a role for climbing fiber (CF) activity in extinction. However, our study specifically focuses on the acquisition phase of motor learning and does not extend to extinction or maintenance. As such, we have revised our discussion to limit interpretations strictly to the scope of our findings and removed references to extinction.

      The discrepancies between our results and prior work may arise from differences in methodologies and behavioral paradigms. For instance, we utilized optogenetic inhibition to achieve precise temporal and spatial control of CF activity, whereas previous studies employed pharmacological or lesion methods that may have broader effects on the cerebellar circuitry. Additionally, differences in behavioral paradigms, such as the optokinetic reflex (OKR) task used in our study compared to the eye-blink conditioning tasks in prior studies, may demand distinct roles for CF signaling depending on the specific requirements for error correction and adaptation.

      This clarification is now incorporated into our revised manuscript, and the discussion has been streamlined to focus on the phase-specific role of CF activity during acquisition without extending to extinction or maintenance (lines 259~270).

    1. eLife Assessment

      This important study investigates the influence of the cingulate cortex on the development of the social vocalizations of marmoset monkeys by making bilateral lesions of this brain area in neonatal animals. The evidence supporting the authors' claims is convincing. The work will be of broad interest to cognitive neuroscientists, speech and language researchers, and primate neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      This study seeks to quantify changes in vocal behavior during development in marmosets with bilateral anterior cingulate cortex (ACC) lesions. The ACC and its role in social vocal behaviors is of great interest given previous literature on its involvement in initiation of vocalizations, processing emotional content, and its connectivity to two other critical nodes in the vocal network, the amygdala and the PAG. The authors seek to test the hypothesis that the ACC contributes to the development of mature vocal behaviors during the first few weeks of life by disrupting this process with neonatal ACC lesions. Imaging and histological analyses confirm the extent of the lesion and suggest downstream effects in connected regions. Analyses of call rates and call type proportions show no or slight differences between lesioned and control animals. Additional analyses on the proportion of grouped 'social' calls and certain acoustic features of a particular call, the phee, reveal more distinct differences between the groups.

      Strengths:

      The authors have identified that ACC lesions in early life have no or little influence on certain aspects of vocal behavior (e.g. call rate, call intervals) but larger impacts on other aspects (e.g. acoustic features of phee calls). These data are difficult to collect, especially given the challenges of that particular developmental period. They are a valuable addition to the literature on the effects of the ACC on vocal production and spark intriguing follow-up questions on the role of different acoustic features (as related to emotional content) on vocal interactions with conspecifics over the lifespan.

      The histological methods and resulting quantification of neural changes in the lesioned area and in downstream areas of interest are intriguing given the large time gap between the lesion and these analyses.

      The changes to the text and figures, and the additional supplemental figures made in response to my previous review requests, have made it easier to determine whether the conclusions are supported by the data in the manuscript. Examples include the quantification of the loss of neurons and increase in glial cells, the inclusion of changes in body weight and grip strength (which could also result from the lesions and affect vocal behavior), and additional details on analysis methods.

      Weaknesses:

      The article emphasizes vocal social behavior. However, marmoset infants are recorded in isolation which allows for examining the development of vocal behavior in that particular context - reaching out to conspecifics. The text now covers the relationship between 'social' information in calls and development in this particular context. However, early-life maturation of vocal behavior is strongly influenced by social interactions with conspecifics. For example, the transition of cries and subharmonic phees which are high-entropy calls to more low-entropy mature phees is affected by social reinforcement from the parents. And this effect extends cross-context, where differences in these interaction patterns extend to vocal behavior when the marmosets are alone. Together, the results are interesting and important but may not fully capture the changes resulting from direct social interactions.

      Additionally, it is an intriguing finding that the infants' phee calls have acoustic differences being 'blunted of variation, less diverse and more regular'. Though the text about how the social message conveyed by these infants was 'deficient, limited, and/or indiscriminate' is now better explained with additional text from human studies, it is still an assumption that this would directly translate to marmoset communication. Thus, experiments directed at the responses of other marmosets to these calls would still be important.

    3. Reviewer #2 (Public review):

      Summary:

      Nagarajan et al. investigate the role of the anterior cingulate cortex (ACC) in vocal development of infant marmoset monkeys using lesions in this brain area. Many previous studies show that ACC plays an important role in volitional and emotion-driven vocal behavior in mammals. The experiments Nagarajan et al. performed strengthen the long-standing hypothesis that ACC influences the development of social-vocal behavior in non-human primates. Furthermore, their anatomical studies support the idea of cortical structures exerting cognitive control over subcortical networks for innate vocalization, and thus, enabling mammals to perform flexible social-vocal communication.

      Strengths:

      Many invasive behavioral studies in monkeys use only 2-3 animals. The authors used a sufficiently high number of animals for their experiments, which increases the power of their conclusions.

      The study also investigates the impact of ACC lesions on downstream areas important for innate vocal production. This adds further evidence to the role of ACC on influencing these subcortical regions during vocal development and vocal behavior in general.

      Weaknesses:

      The study only provides data up to the 6th week after birth. Given the plasticity of the cortex, it would be interesting to see if these impairments in vocal behavior persist throughout adulthood or if the lesioned marmosets will recover their social-vocal behavior compared to the control animals. The authors give a reasonable explanation for why they did not provide this data.

      Even though this study focuses entirely on the development of social vocalizations, data on altered non-vocal social behaviors that accompany ACC lesions are missing. Such data could provide further insights and generate new hypotheses about the exact role of ACC in social-vocal development. For example, do these marmosets behave differently towards their conspecifics or family members and vice versa, and is this an alternate cause for the observed changes in social-vocal development? Unfortunately, the authors are unable to provide these data. Hopefully, this will be the goal of future studies.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Nagarajan et al. study the impact of early damage to the anterior cingulate cortex (ACC) on the vocal development of marmoset monkeys. ACC lesions were performed on neonatal marmosets, and their vocal patterns and the spectrotemporal features of their calls were compared with those of control groups during the first six weeks of life. While the vocal repertoire was not significantly affected by ACC lesions, the authors described notable differences in the social contact call, the phee call. Marmosets with ACC damage made fewer social contact calls, and when they did, these calls were shorter, louder, and monotonic. Additionally, the study revealed that ACC damage in infancy led to permanent alterations in downstream brain areas involved in social vocalizations, such as the amygdala and periaqueductal gray.

      Strengths:

      This study suggests that the ACC plays a crucial role in the normal development of social vocal behavior in infant marmosets. Studying vocal behavior in marmosets can provide insights into the neural mechanisms underlying human speech and communication disorders due to their similarity in brain structure and social behavior.

      The methods are robust and reliable, with precise localization of the lesions by neuroimaging and histological examination.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      The article emphasizes vocal social behavior but none of the experiments involve a social element. Marmosets are recorded in isolation, which could be sufficient for examining the development of vocal behavior in that particular context. However, the early-life maturation of vocal behavior is strongly influenced by social interactions with conspecifics. For example, the transition from cries and subharmonic phees, which are high-entropy calls, to lower-entropy mature phees is affected by social reinforcement from the parents. This effect also extends across contexts: differences in these interaction patterns carry over to vocal behavior when the marmosets are alone. From the chord diagrams, cries still make up a significant proportion of call types in lesioned animals. Additionally, though it is an intriguing finding that the infants' phee calls have acoustic differences, being 'blunted of variation, less diverse and more regular,' the suggestion that the social message conveyed by these infants was 'deficient, limited, and/or indiscriminate' is not tested, although it could be tested with, for example, playback experiments.

      We recognize that our definition of vocal social behavior is not within the normal realm of direct social interactions. We were particularly interested in marmoset vocalizations as social signals, such as phees, cries and twitters, even when their family members or conspecifics are not visibly present. Generally speaking, in the laboratory, infant marmosets make few calls when in the presence of another conspecific, but when isolated they naturally make phee calls to reach out to their distantly located relatives. In this context, while we did not assess the animals interacting directly, we assessed what are normally referred to as ‘social contact calls,’ hence the term ‘social vocalizations.’ Playback recordings might provide potential evidence of antiphonal calling as a means of social interaction and might reveal the poor quality of the social message conveyed by the infant, but even here, the vocalizing marmoset would be calling to a non-visible conspecific. Thus, although our experiment lacked a direct social element, our data suggest that in the absence of a functioning ACC in early life, infant calls that convey social information, and which would elicit feedback from parents and other family members, may be compromised, and this could potentially influence how that infant develops its social interactive skills. We have now commented on the significance of social vocalizations in the introductory text (page 3) and discussion (page 15).

      The manuscript would benefit from the addition of more details to better determine whether the conclusions are well supported by the data. Understanding that these data are very difficult to obtain, the number of marmosets and some variability in data collection would still allow each individual to be plotted across figures. For example, in the behavioral figures, which is the marmoset that is in the behavioral data that has a sparing of the ACC lesion in one hemisphere? Certain figures, described below in the recommendations for the authors, could also do with additional description.

      Thanks for these suggestions. We have plotted the individual animals in the relevant figures and addressed the comments and recommendations listed below.

      Reviewer #1 (Recommendations For The Authors):

      Given the number of marmosets, variability in the collected data, lesion extent, and different controls, I would like to see more plots with individuals indicated (perhaps with different symbols). More details could also be added for several plots.

      Figure 2D (new) and 2E now have plots that represent the individual animals, each represented by a different symbol.

      Figure 2A) Since lesions are bilateral, could you also show the extent of the lesions on the other side for completeness?

      Our intention was to process one hemisphere of each brain for Golgi staining to examine changes in cell morphology in the ACC and associated brain regions following the lesion. Unfortunately, the Golgi stain was unsuccessful. Consequently, we were unable to use the tissue to reconstruct the bilateral extent of the lesion. We did, however, first establish the bilateral nature of the lesion through coronal slices of each animal's MRI scan before processing the intact hemisphere. The MRI scans (every 5th section) for each control and lesioned animal are compiled in a figure in the supplementary materials (Fig. S1). These scans show that the ACC-lesioned animals have bilateral lesions, with one animal (ACC1) showing some sparing in one hemisphere, as we noted in the text. We have now made reference to this supplemental figure in the text (page 5).

      Figure 2B/C) In Figure 2B, control and ACC lesions are in the columns while right next to it in 2C, ACC lesion and control are in the rows. Could these figures be adjusted so that they are consistent?

      We have now adjusted these figures and updated the figure legends accordingly.

      Figure 2C) Is there quantification of the 'loss of neurons and respective increase in glial cells at the lesioned site especially at the interface between gray and white matter'? There are multiple slices for each animal.

      Thanks for suggesting this. We have now quantified these data which are presented as a new graph as Fig. 2D. These data revealed a significant loss of neurons (NeuN) in the ACC group as well as an increase in glial cells (GFAP and Iba1) relative to the controls. The figure legend and results have also been updated.

      Figure 2C) It is difficult for me to distinguish between white and purple - could you show color channels independently since images were split into separate channels for each fluorophore?

      Fig. 2C has been revised to better visualize the neurons and glia at the gray and white matter interface. We found that grayscale images for each channel offered a better contrast than separating the channels for each fluorophore.

      Figure 2C/D) I like how there are individual dots here for the individual marmosets. Since there are four in each group, could they be represented throughout with symbols (with a key indicating the pair and also the control condition)? For example, were there changes in the histology for control animals that got saline injections as opposed to those that didn't get any surgery?

      We have highlighted the individual animals with different symbols in the figures. Although some animals were twin pairs, it was not possible to have twins in all cases; only two sets were twins. We have indicated the symbols that represent the twin pairs in Fig. 2, as well as the MRI scans of the twin pairs in Fig. S1. We observed no changes in histology for the sham animals relative to the non-sham controls. The MRI scan for one sham animal (CON2) shows herniated tissue in the right hemisphere, which is a normal consequence of brain exposure during a craniotomy.

      Figure 3D-E) Here, individual data points could be informative especially given that some animals are missing data past the third week.

      To prevent cluttering the figure with too many data points, we have added the sample size for each group to the figure legend (page 33).

      Figure 3D/F) What exactly is the period that goes into this analysis? The text states, 'Further analysis showed that the ACC lesion had minimal effects on the rate of most call types during this period'. Is this period weeks 3 to 6 relative to the proportions in week 2? I also don't quite understand the chord diagram. The legend says 'the numbers around each chord diagram represents relative probability value for each call type transition', so how does that relate to the proportion of these call types? It looks like there is a wider slice for cries for ACC-lesioned animals each week. I also don't see, in the week 4 chord diagram, the text description of "elevation in the rate of 'other' calls, which comprised tsik, egg, eck, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion."

      We apologize for the confusion. Fig. 3D and Fig. 3F are not directly related. Fig. 3D shows the different types of emitted calls, averaged per group and pooled across post-surgery weeks (week 3 to week 6). It represents the proportion of each call type relative to the total number of calls during each recording period. The only major finding here was the increased rate of ‘other’ calls, comprising tsik, egg, ock, chatter and seep calls, which were significantly elevated in animals after the ACC lesion.

      While Fig. 3D represents the differences in the proportion of calls, the chord diagrams in Fig. 3F represent the probability of call-to-call transitions obtained from a probability matrix. At postnatal week 6, marmosets with ACC lesions showed a higher likelihood of transitions between all call types, but less frequent transitions between social contact calls, relative to sham controls. The chord diagrams visualize the weighted probabilities and directionality of these transitions between the different call types. Weighted probabilities were used to account for variations in call counts. The thickness of the arrows or links indicates the probability of a call transition, while the numbers surrounding each chord diagram represent the relative probability value for each specific transition. We have now reworded the text and clarified these details in the figure legend (pages 32-33).
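      To illustrate what a call-to-call transition probability matrix of this kind contains, here is a minimal Python sketch. It assumes that transitions are counted between consecutive calls within a recording and that each call type's outgoing counts are normalized to sum to one (a first-order transition matrix), which is one way to account for differing call counts. The authors' exact weighting scheme is not described here, and the function name and example call sequence are hypothetical.

```python
from collections import Counter


def transition_probabilities(calls):
    """Estimate first-order call-to-call transition probabilities.

    calls: list of call-type labels in the order they were emitted.
    Returns a dict mapping (from_type, to_type) -> probability. Counts are
    normalized by how often each call type is followed by another call, so
    the probabilities leaving each call type sum to 1 regardless of how
    many calls of that type were produced.
    """
    pair_counts = Counter(zip(calls, calls[1:]))   # consecutive call pairs
    out_counts = Counter(calls[:-1])               # calls that have a successor
    return {(a, b): n / out_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical call sequence from a single recording session.
sequence = ["cry", "phee", "phee", "trill", "phee", "cry"]
probs = transition_probabilities(sequence)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

      In this toy sequence, "phee" is followed once each by "phee", "trill" and "cry", so each of those transitions gets probability 1/3; a chord diagram would draw links whose widths are proportional to these values.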

      Figure 3E) How is the ratio on the y-axis calculated here?

      The y-axis represents the averaged value of the ratio of the number of social contact calls to the number of non-social contact calls in each recording, per subject per group (i.e., x̄(# social calls / # non-social calls)). This is now included in the figure legend, and the axis label has been updated (page 32).
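      Written out, the quantity on the y-axis can be expressed as follows. This is a sketch that assumes the per-recording ratios are first averaged within each subject and then across the N subjects of a group; the response does not specify the order of averaging. The symbols are introduced here for illustration only: R_s is the number of recordings for subject s, and S_{s,r} and U_{s,r} are the numbers of social and non-social contact calls in recording r of subject s.

$$
\bar{x}_{\text{group}} = \frac{1}{N}\sum_{s=1}^{N}\left(\frac{1}{R_s}\sum_{r=1}^{R_s}\frac{S_{s,r}}{U_{s,r}}\right)
$$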

      Also, cries could be considered a 'social contact call' since they are produced by infants to elicit responses from the parents. There is also the hypothesis in the literature that cries transition into phees.

      The reviewer is correct. Cries are often considered a social contact call because they elicit parental feedback. We decided to separate cry calls from other social contact calls for two reasons. First, in our sample, cry behavior was highly variable across animals. For example, one control infant cried incessantly whereas another control infant cried less than normal. This extreme variability within the same group masked the features that reliably differentiated the groups. Second, cry calls elicit feedback from parents, who are normally in the vicinity of the infant, whereas phee calls elicit antiphonal phee calls from any distantly located conspecific. In other words, the contexts in which these calls are elicited are very different.

      The use of 'syntactical' is a bit jarring to me because outside of linguistics, its use in animal communication generally refers to meaning-bearing units that can be combined into well-formed complexes such as pod-specific whale songs or predator alarm calls with concatenated syllable types in some species of monkeys. To my knowledge, individual phee syllables have not been currently shown to carry information on their own and may be better described as 'sequential' rather than 'syntactical'.

      We agree. We have made this change accordingly.

      Figure 4B) How many phee calls with differing numbers of syllables are present each week? How equal is the distribution given that later analyses go up to 5 syllables?

      The total number of phee calls with differing numbers of syllables ranged from 20 to 40 and varied between subjects and across weeks. The most common were 3- and 4-syllable phee calls, which ranged from 7 to 15. Due to this variability, Fig. 4B presents the average syllable count. The axis has now been updated.

      Figure 4C-E) How are the data combined here? Is the 2nd syllable, for example, the combined data from the 2nd syllable of phee calls of all lengths (1 - 5?)? If so, are there differences based on how long the total sequence is?

      The combined data represent a specific syllable position (e.g., the 1st syllable in a 2-syllable phee, in a 3-syllable phee and in a 4-syllable phee), irrespective of the length of the sequence. No differences were observed between the 2nd syllable of a 2-syllable phee and the 2nd syllable of a 3- or 4-syllable phee. We have included this detail in the figure legend (pages 33-34).

      Duration is a vocal parameter that is highly dependent on physical factors such as body size and lung volume; were there differences in physical growth between the pairs of ACC-lesioned marmosets and their twins? Entropy is less closely tied to these physical factors but has previously been shown to decrease as phee calls mature, which we can also see in the negative relationship of the control animals. Do you know of experiments that show that lower entropy calls are more 'blunted'?

      Thank you for raising the important issue of physical growth factors. For twin pairs, it is not uncommon for one infant to be slightly bigger, heavier or stronger than the other, presumably because one gets more access to food. With increasing age, we did not observe significant differences in body weight between the groups. We examined grip strength in all infants as a means of assessing how well each infant was able to access food during nursing. Poor grip strength would indicate a lower propensity to ‘hang on’ to the mother for nursing, which could lead to lower weight gain and reduced physical growth. We found that both grip strength and body weight increased as the infants got older and that both parameters were equivalent between groups. We have added a supplementary figure showing the normal increase in both weight and grip strength (Fig. S3) and refer to it in the text (page 8).

      As for entropy, its impact on the emotional quality of vocalizations has not been systematically explored. Generally speaking, high entropy relates to high randomness and distortion in the signal. Accordingly, one view posits that low-entropy phee calls represent mature-sounding calls relative to noisy and immature high-entropy calls (e.g., Takahasi et al 2017). In the current study, the reduction in syllable entropy observed in both groups of animals with increasing age is consistent with this view. At the same time, entropy can relate to vocal complexity: high entropy refers to complex and variable sound patterns, whereas low-entropy sounds are predictable, less diverse and simpler vocal sequences (Kershenbaum, A. 2013. Entropy rate as a measure of animal vocal complexity. Bioacoustics, 23(3), 195–208). One possibility is that call maturity does not equate directly to emotional quality. In other words, a low-entropy mature call can also lack emotion, as observed in humans with ACC damage; these patients show mature speech, but they lack the variations in rhythm, pattern and intonation (i.e., prosody) that would normally convey emotional salience and meaning. Our observation of a reduction in phee syllable entropy in the ACC group, in the context of calls that are short and loud with reduced peak frequency, is consistent with this view. Our use of the word ‘blunt’ was intended to convey that the calls exhibited by the ACC group were potentially lacking emotional meaning. Beyond this speculation, we are not aware of any papers that have examined the relationship between entropy and blunted calls directly. We have now included this speculation in the discussion (pages 12-13).

      Reviewer #2 (Public Review):

      The authors state that the integrity of white matter tracts at the injection site was impacted but do not show data.

      We have added representative micrographs of a control and ACC-lesioned animal in a new supplementary figure which shows the neurotoxin impacted the integrity of white matter tracts local to the site of the lesion (Fig. S2).

      The study only provides data up to the 6th week after birth. Given the plasticity of the cortex, it would be interesting to see if these impairments in vocal behavior persist throughout adulthood or if the lesioned marmosets will recover their social-vocal behavior compared to the control animals.

      We agree. Our original intention was to examine behavior into adulthood. Unfortunately, the COVID-19 pandemic compromised the continuation of the study. We were limited by the data that we were allowed to acquire due to imposed restrictions. Some non-vocalization data collected when the animals were young adults is currently being prepared for another paper.

      Even though this study focuses entirely on the development of social vocalizations, providing data about altered social non-vocal behaviors that accompany ACC lesions is missing. This data can provide further insights and generate new hypotheses about the exact role of ACC in social vocal development. For example, do these marmosets behave differently towards their conspecifics or family members and vice versa, and is this an alternate cause for the observed changes in social-vocal development?

      We agree. At the time, however, apparatus for assessing behavior between the infant and its family and non-family members was not available. Assessing such behaviors in the animals' holding room posed some difficulty, since marmosets are easily distracted by other animals as well as by the presence of an experimenter, among other things. This is an area of investigation we are currently pursuing.

      Reviewer #3 (Public Review):

      It is striking to find that the vocal repertoire of infant marmosets was not significantly affected by ACC lesions. During development, the neural circuits are still maturing and the role of different brain regions may evolve over time. While the ACC likely contributes to vocalizations across the lifespan, its relative importance may vary depending on the developmental stage. In neonates, vocalizations may be more reflexive or driven by physiological needs. At this stage, the ACC may play a role in basic socioemotional regulation but may not be as critical for vocal production. Since the animals lived for two years, further analysis might be helpful to elucidate the precise role of ACC in the vocal behavior of marmosets.

      Figure 3D. According to the Introduction "...infant ACC lesions abolish the characteristic cries that infants normally issue when separated from its mother". Are the present results in marmosets showing the opposite effect? Please discuss.

      To date, the work of MacLean (1985) is the only publication that describes the effect of early cingulate ablation on the spontaneous production of ‘separation calls’, largely construed as cries, coos and whimpers in response to maternal separation. This work was performed largely in rhesus macaques and squirrel monkeys. In addition to ablating the cingulate cortex, MacLean found that it was necessary to ablate the subcallosal (area 25) and preseptal cingulate cortex (presumably referring to prelimbic area 32) to permanently eliminate the spontaneous production of separation cry calls. Our ablation of the ACC was more circumscribed to area 24, and our results are therefore consistent with MacLean’s earlier finding that removal of the ACC alone does not eliminate cry behavior. In adults, ACC ablation is also insufficient to eliminate vocalization. We refer to this on pages 13-14 of the discussion.

      Figure 3E and Discussion. Phees are mature contact calls and cries immature contact calls (Zhang et al, 2019, Nat Commun). Therefore, I would rather say that the proportion of immature (cries) contact calls increases vs the mature (phee, trill, twitters) contact calls in the ACC group. Cries are also "isolation-induced contact calls" to attract the attention of the caregivers.

      The reviewer is correct in that cries are directed towards caregivers, but in our sample, cry behavior was highly variable between the infants. Consequently, in Fig. 3E, social contact calls include phee, twitter and trill calls but do not include cries, which were analyzed separately (see also response to reviewer #1). Many of the calls made during babbling were immature in their spectral pattern (compare phee calls between Fig. 3A and 3B). Cries typically transitioned into phees, twitters or trills before they fully matured. Fig. 3E shows that the controls made more isolation-induced social contact calls at postnatal week 6, which were presumably maturing at this time point. Thus, if anything, there was an increase in the proportion of mature contact calls vs immature contact calls with increasing age.

      Figure 4D. Animal location and head direction within the recording incubator can have significant effects on the perceived amplitude of a call. Were these factors taken into account?

      The reviewer makes an excellent observation. Unfortunately, we did not account for location and head direction because the infants were quite mobile in the incubator. The directional microphone was positioned ~12 cm from the marmoset, hidden from view because the infants were distracted by it, and placed in exactly the same location for every recording. In addition, calls with phantom frequencies were eliminated during visual inspection of spectrograms. Beyond these measures, location and head direction were not taken into account.

      Figure 4E. When a phee call has a higher amplitude, as is the case for the ACC group (Figure 4D), the energy of the signal will be concentrated more strongly at the phee call frequency (~8 kHz). This concentration of energy reduces the variability in the frequency distribution, leading to lower entropy. The interpretation of the results should be reconsidered. A faint call (control group) can exhibit more variability in its frequency content, since the energy is distributed across a wider range of frequencies, contributing to higher entropy. It can still be "fixed, regular, and stereotyped" if the behavior is consistent or predictable with little variation. Also, to define ACC calls as "monotonic", I would rather look for a lack of frequency modulation, amplitude variation, or a narrower bandwidth.

      We very much appreciate this explanation. We were able to identify the maximum frequency that closely matched the pitch of each syllable in a multisyllabic phee. New Fig. 4E shows that the peak frequency of each phee syllable was lower in the ACC-lesioned monkeys, which may directly translate into the low entropy observed in this group. The term “monotonic” was used to relate our data to the classical and long-standing evidence of human ACC lesions causing monotonous intonation of speech. When all factors are taken into account, it is evident that the vocal phee signature of the ACC-lesioned animals was structurally different from that of the controls, implicating a less complex and more stereotyped signal in the ACC group. Further studies are needed to systematically explore the relationship between entropy and the emotional quality of vocalizations.
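      For readers less familiar with spectral entropy, the following minimal Python sketch illustrates the measure discussed in this exchange. It assumes that syllable entropy is computed as the Shannon entropy of the syllable's normalized power spectrum; the manuscript's actual definition is not given here, and the function name, sampling rate and test signals are illustrative assumptions. A pure tone concentrates power in a few frequency bins and yields low entropy, whereas broadband noise spreads power across bins and yields high entropy. Uniformly rescaling a signal's amplitude leaves the normalized spectrum, and therefore the entropy, unchanged. Entropy changes only when the relative distribution of energy across frequencies changes, for example when a tonal component rises further above a fixed background noise floor, which is the situation the reviewer describes.

```python
import numpy as np


def spectral_entropy(signal, normalize=True):
    """Shannon entropy (bits) of a signal's normalized power spectrum.

    Assumed definition for illustration only. With normalize=True the
    result is divided by log2(number of frequency bins), giving a value
    between 0 (all power in one bin) and 1 (power spread evenly).
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    p = power / power.sum()        # treat the spectrum as a probability distribution
    p = p[p > 0]                   # 0 * log(0) is taken as 0
    h = -np.sum(p * np.log2(p))
    if normalize:
        h /= np.log2(len(power))   # maximum possible entropy for this many bins
    return h


fs = 48_000                                              # sampling rate in Hz (illustrative)
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 8_000 * t)                     # idealized tonal syllable near 8 kHz
noise = np.random.default_rng(0).standard_normal(t.size)  # broadband noise

print(spectral_entropy(tone) < spectral_entropy(noise))                 # True
print(np.isclose(spectral_entropy(10 * tone), spectral_entropy(tone)))  # True
```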

      Apart from the changes in the vocal behavior, did the ACC lesions manifest in any other observable cognitive, emotional, or social behavior? ACC plays a role in processing pain and modulating pain perception. Could that be the reason for the observed increase in the proportion of cries in the ACC group and the increase in the phee call amplitude? Did the cries in the ACC group also display a higher amplitude than the cries in the control group?

      It was our intention to acquire as much data as possible from these infants as they matured from a cognitive, social and emotional perspective. Unfortunately, our study was hampered by a variety of factors, including the COVID-19 pandemic, which imposed major restrictions on our ability to continue the experiment in a time-sensitive manner. In addition, the development and construction of the custom apparatus to measure these behaviors was stalled during this period, further preventing us from collecting behavioral data at regular time intervals. As for the cry behavior, the number of cries in the ACC group was very low, especially at postnatal weeks 5 and 6. Consequently, there were very few data points to work with.

      Discussion. Louder calls have the potential to travel longer distances compared to fainter calls, possess higher energy levels, and can propagate through the environment more effectively. If the ACC group produced louder phee syllables, how could the message conveyed over long distances be "deficient, limited, and/or indiscriminate"?

      Thanks for raising this interesting concept. Not all calls emitted by the animals were loud; we specifically examined the long-distance phee call in this regard. The phee syllables emitted by the ACC group were high in amplitude, with low frequencies, short duration and low entropy. Taking these factors into account, it is conceivable that the phee calls produced by the ACC group could not effectively convey their message over long distances despite their propagation through the environment. We refer to this in the discussion, where we focus specifically on the phee calls (page 12).

      Abstract: Do marmosets have syntax? Consider replacing "syntactical" with a more appropriate term (maybe "syntax-like").

      Thanks for this suggestion. We have replaced the term syntactical with ‘sequential’ as per the recommendation of reviewer #1.

      Introduction: "...cries that infants normally issue when separated from its mother". Please replace "its" with "their".

      This has been corrected.

      Results: Is the reference to Fig 1B related to the text?

      We have included and referred to Fig. 1B in the text (results and methods) to show other researchers how they can use this technique as a reliable and safe means of monitoring tidal volume under anesthesia in small infant marmosets without intubation.

      I understand that both "spectrograph" and "spectrogram" are used to analyze the frequency content of a signal. Nevertheless, "spectrogram" refers to the visual representation of the frequency content of a signal over time, and this term is commonly used in audio signal processing and specifically in the vocal communication field. I would recommend replacing "spectrograph" with "spectrogram".

      Thanks for this suggestion. We have corrected this throughout the manuscript.

      (Concerning the previous comment in the public review). Cries are uttered to attract the attention of the caregivers. The increase in the proportion of cries in the ACC group does not match the sentence: "...these infants appeared to make little effort in using vocalizations to solicit social contact when socially isolated".

      We apologize for the confusion. It is not the case that the ACC animals make more cries. Cry calls were highly variable amongst the animals. Consequently, although Fig. 3D gives the impression that the proportion of cries is higher in ACC animals, it did not differ significantly from that of the controls. Due to their high variability, cries were excluded from the measurement of social contact. Accordingly, Fig. 3E does not include cry behavior; it shows that the ACC animals engage less in social contact calls.

      Related to Figure 3. What is the difference between "egg" and "eck" calls? Do you mean "ock"?

      We apologize. This was a typo. It should be ock calls.

      Figure 4B. Is the sample size five animals per group and per week? Overlapping data points seem to be placed next to each other. Why in some groups (e.g. ACC 6 weeks) less than five dots are visible?

      The sample size differed per week because recordings could not be made during the COVID-19 restrictions. In Fig. 4B, we have now separated the overlapping dots. We have also added the sample sizes of the groups to the figure legends.

      Would the authors expect to see stronger differences between the lesioned and the control groups when comparing a later developmental stage? The animals were euthanized at the age of

      This speculation is certainly plausible, and yes, we were hoping to establish this level of detail with testing at later developmental stages. This is an aspect of development we are currently pursuing.

      Could these experiments be conducted?

      I’m afraid these animals are no longer available, but we are currently conducting experiments in other animals with early-life neurochemical manipulations that show behavioral changes into early adulthood.

      ACC lesion: It is reported that the lesions extended past 24b into motor area 6M. Did the animal display any motor control disability?

      Surprisingly, despite the lesion encroaching into 6M, these animals showed no observable motor impairment. We assessed the animals' grip strength and body weight and found normal strength and weight gain in both the controls and the lesioned group. We have added these data as supplementary information (Fig. S3).

    1. eLife Assessment

      This study investigates how the maintenance of a spatial location in working memory affects the representation of visual information in area V4 of monkeys. As such, it is important not only for understanding vision but also for determining how working memory impacts perceptual signals and their underlying circuits. The data provide convincing evidence of a direct communication between prefrontal circuits that store spatial information and V4, which, under the current experimental conditions, manifests mainly as changes in temporal activity patterns (oscillations).

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      - The logic of the experiment is nicely laid out.

      - The presentation is clear and concise.

      - The analyses are thorough, careful, and yield unambiguous results.

      - Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      The weaknesses I noted in the first round of reviews were effectively addressed by the authors. In particular, the expanded discussion on the overlapping effects of attention, working memory, and motor planning does a great job putting the current findings against the wider context concerning the neural mechanisms of visuomotor guidance.

      I think this is a well-designed and well-executed study that helps to better outline the relationship between perception and working memory given their respective neural substrates. A broad range of systems neuroscientists and experimental psychologists will find it illuminating.

    3. Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights on the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, a few conceptual gaps make it harder for the reader to appreciate the mechanisms that lead to the observed results and evaluate whether and how these may apply to other cases of top-down control. The fact that the visual features under study were behaviorally irrelevant makes it difficult to appreciate the relevance of the finding and its relation to top-down spatial attention mechanisms that involve similar/overlapping circuits. In the same vein, the use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      Moreover, encoding of the two visual features that are manipulated in the context of the study (contrast and orientation) seems to be affected differently in certain cases, which leaves a reader wondering about the source of this variability.

      Finally, although the study provides evidence in favor of a role of FEF in influencing phase coding of visual features in V4 in beta frequencies, important analyses that could have revealed the long-range mechanisms of such an effect, including analysis of intra-FEF and interareal (FEF-V4) neuronal interactions, are missing from this paper.

    4. Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to visual cortex that is used to alter neural activity, and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      - Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity
      - Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at the remembered location
      - Convincing modeling efforts

      Comments on revisions:

      I have no further comments for the authors. The revised manuscript appears to have adequately addressed the substantial comments raised in the previous round of review. I especially appreciate the addition of a new supplementary figure analyzing the data when no background stimulus was presented.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus were behaviorally relevant, then any observed neurophysiological changes would be more strongly confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 323-330 in untracked version). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 315-333), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 323-330).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background, the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings using this task. Moreover, one of the major strengths of the study is the use of different approaches, including analysis of electrophysiological data using advanced computational methods, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking, and information carried), and computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and those for the phase code in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1), no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Per the reviewer’s suggestion, we have added several new supplementary figures. We now show the F-statistic for discriminability over time for the LFP timecourse (Fig. S2), and as a function of power in various frequencies (Fig. S4). We have added before/after inactivation comparisons of the LFP and spiking activity, and their respective F-statistics for discrimination between contrasts and orientations in Fig. S9. Lastly, we added a supplementary figure evaluating the impact of FEF inactivation on beta phase coding in the OUT condition, showing no significant change (Fig. S11).

      Moreover, some of the statistical assessments could be carried out differently, including all conditions, to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true, it would provide important information that the authors should address.

      We have updated the STA slope measurement, excluding the low contrast condition, which lacks a clear peak in the STA. Additionally, we equalized the bin widths and aligned the x-axes for better visual comparability. We then performed a two-way ANOVA analyzing the effects of spatial condition (IN vs. OUT) and visual feature level (contrast or orientation, analyzed separately). The results showed a significant main effect of the visual feature for both orientation (F = 3.96, p = 0.046) and contrast (F = 14.26, p < 10⁻³). However, neither the spatial condition nor the spatial-visual interaction had a significant effect for orientation (spatial: F = 0.52, p = 0.473; interaction: F = 1.56, p = 0.212) or contrast (spatial: F = 2.19, p = 0.139; interaction: F = 1.15, p = 0.283).
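      For readers less familiar with the two-way design discussed in this exchange, the sketch below computes the F statistics for two main effects and their interaction from an array of measurements laid out as (factor A levels × factor B levels × replicates). It assumes a balanced, between-subjects layout with equal counts in every cell and returns F values only; the analysis in the manuscript may use a different model (for example, unequal cell sizes or repeated measures per neuron), and `two_way_anova_f` is a hypothetical helper name.

```python
import numpy as np


def two_way_anova_f(y):
    """F statistics for a balanced two-way ANOVA (hypothetical helper).

    y has shape (a, b, n): a levels of factor A (e.g. IN/OUT),
    b levels of factor B (e.g. contrast level), n replicates per cell.
    Returns (F_A, F_B, F_AB). Assumes the same n in every cell.
    """
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = y.mean(axis=(0, 2))   # marginal means of factor B
    cell = y.mean(axis=2)          # cell means, shape (a, b)

    # Sums of squares for main effects, interaction, and within-cell error.
    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_err = np.sum((y - cell[:, :, None]) ** 2)

    # Degrees of freedom and the error mean square shared by all three F tests.
    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_err = a * b * (n - 1)
    ms_err = ss_err / df_err
    return (ss_a / df_a) / ms_err, (ss_b / df_b) / ms_err, (ss_ab / df_ab) / ms_err
```

      Keeping the data as one 3-D array makes the balanced-design assumption explicit. If p-values are needed, they follow from the F distribution with the matching degrees of freedom, for example `scipy.stats.f.sf(F, df1, df2)`.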

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) The authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated, and that motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF, would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not from the layers that contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 185-186).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      While the shift in peak frequency across contrasts is more prominent than that across orientations (Fig. S3A-B), the relationship between orientation and peak frequency is also significant (one-way ANOVA for peak frequency across contrasts, F_Contrast = 10.72, p < 10⁻⁴; across orientations, F_Orientation = 3, p = 0.030; statistics have been added to the Fig. S3 caption). This finding also aligns with previous studies, which reported slight peak frequency shifts (~1–2 Hz) in the context of attention (Fries, 2015). To address whether the frequency-firing rate correlation generalizes to orientation-driven changes, we now examine the relationship between peak frequency and firing rate separately for each contrast level (Fig. S14). The average normalized response as a function of peak frequency, pooled across subsamples of trials from each of 145 V4 neurons (100 subsamples/neuron) for the IN vs. OUT conditions, shows a significant correlation during the delay period for each contrast: low contrast (F_Condition = 0.03, p = 0.867; F_Frequency = 141.86, p < 10⁻¹⁸; F_Interaction = 10.70, p = 0.002, ANCOVA), middle contrast (F_Condition = 7.18, p = 0.009; F_Frequency = 96.76, p < 10⁻¹⁴; F_Interaction = 0.13, p = 0.716, ANCOVA), and high contrast (F_Condition = 12.51, p = 0.001; F_Frequency = 333.74, p < 10⁻²⁹; F_Interaction = 7.91, p = 0.006, ANCOVA).

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention, in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions, it is essential to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether a different measure or analysis (e.g., a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 315-333). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      Quantifying SNR as the ratio of power in the beta band to power in all other bands, we find no significant drop in SNR between conditions (SNR_IN = 4.074 ± 984, SNR_OUT = 4.333 ± 0.834, p = 0.341, Wilcoxon signed-rank test). Therefore, we do not think that the change in phase coding is merely a result of less reliable phase estimates.
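      The SNR measure in this response is described only briefly. As a rough sketch of one way to compute such a ratio, the function below estimates a plain periodogram of a single LFP trace and divides the mean power in the beta band by the mean power at all other frequencies up to a cutoff. The band edges (13-30 Hz), the 100 Hz cutoff, the periodogram estimator, the use of mean rather than summed power, and the name `beta_snr` are all assumptions for illustration, not the authors' analysis parameters.

```python
import numpy as np


def beta_snr(lfp, fs, beta=(13.0, 30.0), fmax=100.0):
    """Mean beta-band power divided by mean power at all other frequencies.

    Hypothetical helper. Assumptions (not from the manuscript): plain
    periodogram, beta band 13-30 Hz, frequencies in (0, fmax] Hz.
    """
    lfp = np.asarray(lfp, dtype=float)
    lfp = lfp - lfp.mean()                        # remove the DC offset
    freqs = np.fft.rfftfreq(lfp.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(lfp)) ** 2         # unnormalized periodogram

    valid = (freqs > 0) & (freqs <= fmax)
    in_beta = valid & (freqs >= beta[0]) & (freqs <= beta[1])
    out_beta = valid & ~in_beta
    return power[in_beta].mean() / power[out_beta].mean()
```

      A trace dominated by a 20 Hz rhythm yields a ratio well above 1, while one dominated by a 60 Hz rhythm yields a ratio below 1. Per-site values for the IN and OUT conditions could then be compared with a paired test such as `scipy.stats.wilcoxon`, the kind of comparison reported above.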

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they have simultaneous recordings from the two areas, they could test this hypothesis on their own data; however, they do not provide any results on this.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how these may result in an increase in firing rate, although the relationship could also run in reverse, with an increase in firing rate leading to an increase in the peak frequency. It is not at all clear how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 291-293). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL when remembering the V4 RF location) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, the two points I raised in the public review merit a bit of development in the Discussion. In addition, the authors should revise some of their conclusions.

      For instance (L217):

      "The finding that WM mainly modulates phase coded information within extrastriate areas fundamentally shifts our understanding of how the top-down influence of prefrontal cortex shapes the neural representation, suggesting that inducing oscillations is the main way WM recruits sensory areas."

      In my opinion, this one is over-the-top on various counts.

      Here is another exaggerated instance (L298):

      "...leading us to conclude that representations based on the average firing rate of neurons are not the primary way that top-down signals enhance sensory processing."

      Again, as noted above, the problem is that one could make the case that the top-down signals are, in fact, highly effective, since they are completely quashing any distracter-related modulation in firing rate across RFs. There is only so much that one can conclude from responses to stimuli that are task-irrelevant, uniform across space, and constant over the course of a trial.

      I think even the title goes too far. What the work shows, by all accounts, is that the sustained activity in FEF has a definitive impact on V4 *even* with respect to a sustained, irrelevant background stimulus. The result is very robust in this sense. However, this is quite different from saying that the *primary* means of functional control for FEF is via phase coding. Establishing that would require ruling out other forms of control (i.e., rate coding) in all or a wide range of experimental conditions. That is far from the restricted set of conditions tested here and is also at variance with many other experiments demonstrating effects of attention or even FEF microstimulation on V4 firing activity.

      To reiterate, in my opinion, the work is carefully executed and the data are interesting and largely unambiguous. I simply take issue with what can be reliably concluded, and how the results fit with the rest of the literature. Revisions along these lines would improve the readability of the paper considerably.

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      Reviewer #3 (Recommendations for the authors):

      (1) My primary comment that came up multiple times as I read the manuscript (and which is summarized above) is that I was never sure why the authors focus on analyzing neural coding of task-irrelevant sensory information during a WM task as a function of WM contents (remembered location). Studies of neural codes supporting WM typically focus on coding the remembered information, not other information. Conceptually, it seems that the brain would want to suppress (or at least not enhance) representations of task-irrelevant information when performing a demanding task, especially when there is no search requirement and no feature correspondence between the remembered and viewed stimuli (the interaction between WM and visual input is more obvious, e.g., in visual search for a remembered target). Why, in theory, would a visual region need to improve its coding of non-remembered information as a function of WM? This isn't meant to detract from the results, which are indeed very interesting and, I think, quite informative. The authors are correct that this is relevant for sensory recruitment models of WM (there is clear evidence for a role of feedback from PFC to extrastriate cortex), but the specific role each region plays in this task is critical to describe clearly, especially given the task-irrelevance of the input. Put another way: what if the animal were remembering an oriented grating? In that case, MI between spike-based measures and orientation would be directly relevant to questions of neural WM representations, as the remembered feature would itself be modeled. Here, however, the focus seems to be on incidental coding.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Whether similar phase coding is also used to represent the content of object WM (for example, if the animal was remembering an oriented grating), or whether phase coding is only observed for WM’s modulation of the representation of incoming sensory signals, is an important question to be addressed in future work.

      (2) Related to the above, the phrasing of the second sentence of the Discussion (lines 291-292) is ambiguous: do the authors mean that the FEF sends signals that carry WM content to V4, or that FEF sends projections to V4, and V4 has the WM content? As presently phrased, either is a reasonable interpretation, yet the two directly oppose one another (the next sentence clarifies, but I imagine the authors want to minimize any confusion).

      We have edited this sentence to read, “Within prefrontal areas, FEF sends direct projections to extrastriate visual areas, and activity in these projections reflects the content of WM.”

      (3) I'm curious about how the authors consider the spatial WM task here different from a cued spatial attention task. Indeed, both require sustained use of a location for further task performance. The section of the Discussion addressing similar results with attention (lines 307-311) presently just summarizes the similarities of results but doesn't offer a theoretical perspective for how/why these different types of tasks would be expected to show similar neural mechanisms.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      (4) As far as I can tell, there is no consideration of behavioral performance on the memory-guided saccade task (RT, precision) across the different stimulus background conditions. This should be reported for completeness, and to determine whether there is an impact of the (likely) task-irrelevant background on task performance. This analysis should also be reported for Figure 3's results characterizing how FEF inactivation disrupts behavior (if background conditions were varied, see point 7 below).

      We have added the effect of inactivation on behavioral RT and % correct across the different stimulus background conditions (Fig. S8). Background contrast and orientation did not impact either RT or % correct.

      (5) Results from Figure 2 (especially Figures 2A-B) concerning phase-locked spiking in V4 should be shown for 0%-contrast trials as well, as these trials better align with 'typical' WM tasks.

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (6) The magnitude of SPL difference in aggregate (Figure 2B) is much, much smaller than that of the example site shown (Figure 2A), such that Figure 2A's neuron doesn't appear to be visible on Figure 2B's scatterplot. Perhaps a more representative sample could be shown? Or, the full range of x/y axes in Figure 2B could be plotted to illustrate the full distribution.

      We have updated Fig. 2A with a more representative sample neuron.

      (7) I'm a bit confused about the FEF inactivation experiments. In the Methods (lines 512-513), the authors mention there was no background stimulus presented during the inactivation experiment, and instead, a typical 8-location MGS task was employed. However, in the results on pg 8 (Lines 201-214), and Figure 3G, the authors quantify a phase code MI. The previous phase code MI analysis was looking at MI between each spike's phase and the background stimulus - but if there's no background, what's used to compute phase code MI? Perhaps what they meant to write was that, in addition to the primary task with a manipulation of background properties, an 8-location MGS task was additionally employed.

      The reviewer is correct that both tasks were used after inactivation (the 8-location task to assess the spread of the behavioral effect of inactivation, and the MGS-background task for measuring MI). We have edited the methods text to clarify.
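      Point (7) describes the phase-code MI as the mutual information between each spike's LFP phase and the background stimulus. A minimal sketch of a generic plug-in estimate of that quantity is below. Equal-width phase bins (eight by default), the absence of any bias correction, and the name `phase_stimulus_mi` are assumptions; the estimator used in the manuscript may differ.

```python
import numpy as np


def phase_stimulus_mi(phases, labels, n_bins=8):
    """Plug-in mutual information (bits) between spike phase and stimulus label.

    Hypothetical helper. phases: LFP phase (radians) at each spike;
    labels: stimulus condition (e.g. background orientation) of the trial
    each spike came from. Equal-width phase bins, no bias correction.
    """
    # Wrap phases into [-pi, pi) and assign each spike to a phase bin.
    phases = np.mod(np.asarray(phases, dtype=float) + np.pi, 2 * np.pi) - np.pi
    phase_bin = np.floor((phases + np.pi) / (2 * np.pi) * n_bins).astype(int)
    phase_bin = np.clip(phase_bin, 0, n_bins - 1)

    # Map arbitrary stimulus labels to column indices.
    stim_values, stim_idx = np.unique(np.asarray(labels), return_inverse=True)

    # Joint histogram of (phase bin, stimulus), normalized to probabilities.
    joint = np.zeros((n_bins, stim_values.size))
    np.add.at(joint, (phase_bin, stim_idx), 1)
    p_joint = joint / joint.sum()
    p_phase = p_joint.sum(axis=1, keepdims=True)   # marginal over stimuli
    p_stim = p_joint.sum(axis=0, keepdims=True)    # marginal over phase bins

    # I = sum p(x, s) * log2[p(x, s) / (p(x) p(s))], skipping empty cells.
    nz = p_joint > 0
    ratio = p_joint[nz] / (p_phase * p_stim)[nz]
    return float(np.sum(p_joint[nz] * np.log2(ratio)))
```

      If two stimuli always evoke spikes at distinct phases, the estimate is 1 bit; if both stimuli share the same phase distribution, it is 0. Plug-in estimates of this kind are biased upward when spike counts are small, so comparisons across conditions generally need matched spike counts or a shuffle-based correction.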

      (8) How is % Correct defined for the MGS task? (what is the error threshold? Especially for the results described in lines 192-193).

      The % correct is defined as the number of correctly completed trials divided by the total number of trials; the target window was a circle with a radius of 2 or 4 dva (depending on cue eccentricity). These details have been added to the Methods.

      (9) The paragraph from lines 183-200 describes a number of behavioral results concerning "scatter" and "RT". The RT shown seems extremely high, and perhaps is normalized; details of this normalization should be included in the Methods. The "scatter" is listed in dva, but it's not clear how scatter is quantified (std dev of the endpoint distribution? mean absolute error?), nor how target eccentricity is incorporated (as scatter is likely higher for greater target eccentricity).

      We have renamed ‘scatter’ to ‘saccade error’ in the text to match the figure, and now provide details in the Methods section. Both RT and saccade error are normalized for each session; details are now provided in the Methods. Since error was normalized for each session before performing population statistics, no other adjustment for eccentricity was made.

    1. eLife Assessment

      This important study investigates how AD(H)D affects attention using neural and physiological measures in a Virtual Reality (VR) environment. Solid evidence is provided that individuals diagnosed with AD(H)D differ from control participants in both the encoding of the target sound and the encoding of acoustic interference. The VR paradigm used here can potentially bridge lab experiments and real-life experiments. The study is of potential interest to researchers working on auditory cognition, education, and ADHD.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study on AD(H)D. The authors combine a variety of neural and physiological metrics to study attention in a VR classroom setting. The manuscript is well written and the results are interesting, ranging from an effect of group (AD(H)D vs. control) on metrics such as envelope tracking, to multivariate regression analyses considering alpha-power, gaze, TRF, ERPs, and behaviour simultaneously. I find the first part of the results clear and strong. The multivariate analyses in Tables 1 and 2 are good ideas, but I think they would benefit from additional clarifications. Overall, I think that the methodological approach is useful in itself. The rest is interesting in that it informs us on which metrics are sensitive to group-effects and correlated with each other. I think this might be one interesting way forward. Indeed, much more work is needed to clarify how these results change with different stimuli and tasks. So, I see this as an interesting first step into more naturalistic measurement of speech attention.

      Strengths:

      I praise the authors for this interesting attempt to tackle a challenging topic with a naturalistic experiment and metrics. I think the results broadly make sense and they contribute to a complex literature that is far from being linear and cohesive.

      Weaknesses:

      The authors have successfully addressed most of my concerns during the review process. Some weaknesses remain in this resubmission, but they do not invalidate the results. For example:

      - The EEG data were filtered twice, which is not recommended because it can introduce additional filtering artifacts. So, while I definitely do not recommend doing that, I do not expect this issue to have an impact on this specific result.

      - The authors did not check whether participants were already familiar with the topics in the speech material. The authors agreed that this point might be worth addressing in future research.

      - The hyperparameter tuning is consistent with previous work from the authors: the optimal lambda value of the regularized regression is selected based on the group average, so a single lambda value is chosen for all participants. In my opinion, that is not the optimal way to run those models, and I do not generally recommend this approach. The reason is that the optimal lambda can change depending on the magnitude of the signals and the SNR, leading to different optimal lambdas for different participants. On the other hand, finding optimal lambda values for individual participants can be difficult depending on the amount of data and the amount of noise, so it is sometimes necessary to apply strategies that ensure an appropriate choice of lambda. Using the group average for hyperparameter tuning produces a more stable metric and lambda selection, which might be preferable (even though this choice should not be taken lightly). In this specific case, I think the authors had a good reason to do so.

      Comments on revisions:

      The authors have done a great job at addressing my comments. I don't have any further concerns. Congratulations!

    3. Reviewer #2 (Public review):

      Summary:

      While selective attention is a crucial human ability, previous studies of selective attention have primarily been conducted in strictly controlled contexts, leaving a notable gap in understanding the complexity and dynamic nature of selective attention in naturalistic contexts. This issue is particularly important for classroom learning in individuals with ADHD, as selecting the target and ignoring distractions is quite difficult for them, yet it is a prerequisite for effective learning. The authors of this study have addressed this challenge with a well-motivated study. I believe the findings of this study will be a nice addition to the fields of both cognitive neuroscience and educational neuroscience.

      Strengths:

      To set up a naturalistic context, the authors based their study on a novel Virtual Reality platform. This is clever, as it is usually difficult to perform such a study in a real classroom. Moreover, various techniques such as brain recordings, eye-tracking and physiological measurements are combined to collect multi-level data. They found that, unlike controls, individuals with ADHD had heightened neural responses to the irrelevant sounds and reduced speech tracking of the teacher. Additionally, the power of alpha-oscillations and the frequency of gaze-shifts away from the teacher are found to be associated with ADHD symptoms. These results provide new insights into the mechanism of selective attention in ADHD populations.

      Weaknesses:

      It is worth noting that some studies have now tried to do this in real classrooms, so the authors should acknowledge the differences between the virtual and real classroom contexts and anticipate potential future changes.

      The approach of combining multi-level data has the advantage of yielding reliable results, but it also makes it considerably harder for readers to understand the main results.

      - An appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      As expected, individuals with ADHD showed anomalous patterns of neural responses and eye movements compared to the controls. However, there are also some similarities between groups, such as the amount of time spent paying attention to the teacher. In general, their conclusions are supported.

      - A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      The findings extend previous efforts to understand selective attention in naturalistic contexts. The findings of this study are particularly helpful for informing teachers' practice and advancing research in educational neuroscience. This study demonstrates, again, that it is important to understand the complexity of cognitive processes in naturalistic contexts.

      Comments on revisions:

      The authors have appropriately responded to my concerns. I do not have other comments. I do hope to see more data and results from the authors in future.

    4. Reviewer #3 (Public review):

      Summary:

      The authors conducted a well-designed experiment, incorporating VR classroom scenes and background sound events, with both control and ADHD participants. They employed multiple neurophysiological measures, such as EEG, eye movements, and skin conductance, to investigate the mechanistic underpinnings of paying attention in class and the disruptive effects of background noise.

      The results revealed that individuals with ADHD exhibited heightened sensory responses to irrelevant sounds and reduced tracking of the teacher's speech. Overall, this manuscript presented an ecologically valid paradigm for assessing neurophysiological responses in both control and ADHD groups. The analyses were comprehensive and clear, making the study potentially valuable for the application of detecting attentional deficits.

      Strengths:

      • The VR learning paradigm is well-designed and ecologically valid.

      • The neurophysiological metrics and analyses are comprehensive, and two physiological markers capable of diagnosing ADHD are identified.

      • The data shared could serve as a benchmark for future studies on attention deficits in ecologically valid scenarios.

      Weaknesses:

      • Several results are null results, i.e., no significant differences were found between ADHD and control populations.

      Comments on revisions:

      The authors have addressed all of my concerns with the original manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Line numbers are missing.

      Added

      (2) VR classroom. Was this a completely custom design based on Unity, or was this developed on top of some pre-existing code? Many aspects of the VR classroom scenario are only introduced (e.g., how was the lip-speech synchronisation done exactly?). Additional detail is required. Also, is or will the experiment code be shared publicly with appropriate documentation? It would also be useful to share brief example video-clips.

      We have added details about the VR classroom programming to the methods section (p. 6-7), and we have now included a video-example as supplementary material.

      “Development and programming of the VR classroom were done primarily in-house, using assets (avatars and environment) sourced from pre-existing databases. The classroom environment was adapted from assets provided by Tirgames on TurboSquid (https://www.turbosquid.com/Search/Artists/Tirgames) and modified to meet the experimental needs. The avatars and their basic animations were sourced from the Mixamo library, which at the time of development supported legacy avatars with facial blendshapes (this functionality is no longer available in current versions of Mixamo). A brief video example of the VR classroom is available at: https://osf.io/rf6t8.”

      “To achieve realistic lip-speech synchronization, the teacher’s lip movements were controlled by the temporal envelope of the speech, adjusting both timing and mouth size dynamically. His body motions were animated using natural talking gestures.”
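      To illustrate what "controlled by the temporal envelope of the speech" can mean in practice, here is a minimal offline sketch in Python. It is not the authors' Unity implementation. The Hilbert-envelope extraction, the 10 Hz low-pass cutoff, the 50 frames-per-second animation rate and the 0 to 1 "mouth openness" scale are all assumptions chosen for illustration, and `mouth_openness` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def mouth_openness(audio, fs, frame_rate=50, cutoff_hz=10.0):
    """Hypothetical mapping from a speech waveform to per-frame mouth openness in [0, 1].

    cutoff_hz and frame_rate are illustrative assumptions, not reported values.
    """
    # Broadband amplitude envelope from the analytic signal.
    env = np.abs(hilbert(audio))
    # Zero-phase low-pass keeps slow (syllable-rate) fluctuations without shifting them in time,
    # so mouth movements stay aligned with the sound.
    sos = butter(4, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)
    # After low-passing, plain decimation to the frame rate is essentially alias-free
    # as long as frame_rate / 2 > cutoff_hz.
    step = int(round(fs / frame_rate))
    frames = np.clip(env[::step], 0.0, None)  # filter ringing can dip slightly below zero
    return frames / frames.max()
```

      In a game engine, each value would then drive a jaw or mouth blendshape weight on the corresponding animation frame; how the original VR classroom scaled "mouth size" is not specified here.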

      While we do intend to make the dataset publicly available to other researchers, at this point we are not making the code for the VR classroom public. However, we are happy to share it on an individual basis with other researchers who might find it useful for their own research in the future.

      (3) "normalized to the same loudness level using the software Audacity". Please specify the Audacity function and parameters.

      We have added these details (p.7)

      “All sound-events were normalized to the same loudness level using the Normalize function in the audio-editing software Audacity (theaudacityteam.org, ver 3.4), with the peak amplitude parameter set to -5 dB, and trimmed to a duration of 300 milliseconds.”
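      As a rough guide to what this step does, peak normalization rescales a clip so that its largest absolute sample sits at the target level, here -5 dB relative to full scale, i.e. a linear amplitude of 10^(-5/20) ≈ 0.562. The Python sketch below assumes floating-point audio in [-1, 1] and applies the two steps in the order stated (normalize, then trim). It does not reproduce Audacity's optional DC-offset removal or any other Audacity internals, and the helper name is hypothetical.

```python
import numpy as np


def peak_normalize_and_trim(x, fs, peak_db=-5.0, dur_s=0.3):
    """Hypothetical helper: scale a float waveform so max |x| equals peak_db (dBFS), then keep dur_s seconds."""
    target = 10.0 ** (peak_db / 20.0)  # dB to linear amplitude: -5 dB -> ~0.562
    y = np.asarray(x, dtype=float) * (target / np.max(np.abs(x)))
    return y[: int(round(dur_s * fs))]  # trimming after scaling, as in the quoted text
```

      One consequence of this order is that if a clip's peak fell after the 300 ms cut, the trimmed clip would peak slightly below the target; normalizing after trimming would avoid that.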

      (4) Did the authors check if the participants were already familiar with some of the content in the mini-lectures?

      This is a good point. Since the mini-lectures spanned many different topics, we did not pre-screen participants for familiarity with the topics, and it is possible that some of the participants had some pre-existing knowledge.

      In hindsight, it would have been good to add some reflective questions regarding participants' prior knowledge, as well as other questions such as their level of interest in the topic and/or how well they understood the content. These are elements that we hope to include in future versions of the VR classroom.

      (5) "Independent Component Analysis (ICA) was then used to further remove components associated with horizontal or vertical eye movements and heartbeats". Please specify how this selection was carried out.

      Selection of ICA components was done manually, based on visual inspection of their time courses and topographical distributions, to identify components characteristic of blinks, horizontal eye movements and heartbeats. Examples of these distinct components are provided in Author response image 1 below. This is now specified in the Methods section.

      Author response image 1.

      (6) "EEG data was further bandpass filtered between 0.8 and 20 Hz". If I understand correctly, the data was filtered a second time. If that's the case, please do not do that, as it will introduce additional and unnecessary filtering artifacts. Instead, the authors should replace the original filter with this one (i.e., filter the data only once). Please see de Cheveigné and Nelken, Neuron, 2019 for an explanation. Also, please explain the rationale for further restricting the cut-off frequencies in the Methods section. Finally, further details on the filters should be included (filter type and order, for example).

      Yes, the data was indeed filtered twice. The first filter is applied as part of the preprocessing procedure, in order to remove extremely high- and low-frequency noise while retaining most activity within the range of “neural” activity. This broad range is mostly important for the ICA procedure, in order to adequately separate ocular and neural contributions to the recorded signal.

      However, since both the speech tracking responses and ERPs are typically less broadband and are comprised mostly of lower frequencies (e.g., those that make up the speech-envelope), a second narrower filter was applied to improve TRF model-fit and make ERPs more interpretable.

      In both cases we used a fourth-order zero-phase Butterworth IIR filter with 1 second of padding, as implemented in the FieldTrip toolbox. We have added these details to the manuscript.

      (7) "(~ 5 minutes of data in total), which is insufficient for deriving reliable TRFs". That is a bit pessimistic and vague. What does "reliable" mean? I would tend to agree when talking about individual-subject TRFs, while 5 min per participant can be enough at the group level. Also, this depends on the specific speech material, on whether the features are univariate or multivariate, etc. Please narrow down and clarify this statement.

      We determined that the data in the Quiet condition (~5 min) were insufficient for reliable TRF analysis by assessing whether their predictive power was significantly better than chance. As shown in Author response image 2 below, the predictive power achieved using these data was not higher than values obtained with permuted data (p = 0.43). Therefore, we did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10).
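      One common way to test whether predictive power exceeds chance is to compare the observed correlation with a null distribution obtained by breaking the temporal alignment between predicted and recorded responses. The sketch below uses random circular shifts of the prediction. This is an assumption made for illustration, since the exact permutation scheme behind the reported p = 0.43 is not described here; many pipelines instead refit the TRF on shuffled stimulus-response pairings. The function name is hypothetical.

```python
import numpy as np


def predictive_power_vs_chance(pred, actual, n_perm=1000, min_shift=1, seed=0):
    """Pearson r between predicted and actual response, with a circular-shift null.

    Returns (r_observed, one-sided p-value). The shift-based null is an
    illustrative choice, not necessarily the authors' permutation scheme.
    """
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    n = pred.size
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(pred, actual)[0, 1]
    # Shifts avoid values near 0 (or n), which would leave the alignment almost intact.
    shifts = rng.integers(min_shift, n - min_shift, size=n_perm)
    null = np.array([np.corrcoef(np.roll(pred, s), actual)[0, 1] for s in shifts])
    # The +1 terms count the observed value as one of the permutations.
    p = (np.sum(null >= r_obs) + 1) / (n_perm + 1)
    return r_obs, p
```

      With 1000 permutations the smallest attainable p-value is 1/1001, so a non-significant result such as the one reported means the observed r sits well inside the null distribution.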

      Author response image 2.

      (8) "Based on previous research in by our group (Kaufman & Zion Golumbic 2023), we chose to use a constant regularization ridge parameter (λ= 100) for all participants and conditions". This is an insufficient explanation. I understand that there is a previous paper involved. However, such an unconventional choice that goes against the original definition and typical use of these methods should be clearly reported in this manuscript.

      We apologize for not clarifying this point sufficiently, and have added an explanation of this methodological choice (p.11):

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one trial are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
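      The procedure quoted above (a ridge-regularized forward model, leave-one-trial-out cross-validation, Pearson r per electrode, and a single λ chosen at the group level) can be sketched in plain NumPy as follows. This is not the mTRF toolbox: it uses a closed-form ridge solution with an identity penalty and no intercept, it assumes the stimulus feature and the EEG have already been z-scored, and the function names and data layout (a list of per-trial arrays for each subject) are hypothetical.

```python
import numpy as np


def lag_matrix(stim, lags):
    """Design matrix of time-lagged copies of a 1-D stimulus (one column per lag, in samples)."""
    n = stim.size
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stim[: n - lag]
        else:
            X[:lag, j] = stim[-lag:]
    return X


def ridge_fit(X, Y, lam):
    """Closed-form ridge regression: W = (X'X + lam * I)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)


def loo_predictive_power(stims, resps, lags, lam):
    """Leave-one-trial-out predictive power (Pearson r) per electrode, averaged over folds.

    stims: list of 1-D stimulus arrays; resps: list of (time, electrodes) arrays.
    """
    Xs = [lag_matrix(s, lags) for s in stims]
    fold_r = []
    for k in range(len(stims)):
        train = [i for i in range(len(stims)) if i != k]
        W = ridge_fit(np.vstack([Xs[i] for i in train]), np.vstack([resps[i] for i in train]), lam)
        pred = Xs[k] @ W
        fold_r.append([np.corrcoef(pred[:, e], resps[k][:, e])[0, 1] for e in range(pred.shape[1])])
    return np.mean(fold_r, axis=0)


def group_level_lambda(subjects, lags, lambdas):
    """One lambda for everyone: the value with the highest mean r across subjects and electrodes."""
    scores = [np.mean([loo_predictive_power(s, r, lags, lam).mean() for s, r in subjects]) for lam in lambdas]
    return lambdas[int(np.argmax(scores))], scores
```

      Selecting λ on the group-mean score, rather than per subject, is what keeps the TRF weights on a common scale across participants, which is the trade-off the reviewer and authors discuss.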

      Assuming that the explanation will be sufficiently convincing, which is not a trivial case to make, the next issue that I will bring up is that the lambda value depends on the magnitude of input and output vectors. While the input features are normalised, I don't see that described for the EEG signals. So I assume they are not normalized. In that case, the lambda would have at least to be adapted between subjects to account for their different magnitude.

      We apologize for omitting this detail – yes, the EEG signals were normalized prior to conducting the TRF analysis. We have updated the methods section to explicitly state this pre-processing step (p.10).

      Another point needing clarification is that this value (i.e., 100) would not be comparable either across subjects or across studies. But maybe the authors have a simple explanation for that choice? (Note that this point is very important, as it could lead others to use TRF methods in an inappropriate way, but I understand that the authors might have specific reasons to do so here.) Note that, if the issue is finding a reliable lambda per subject, a more reasonable choice would be to use a fixed lambda selected on a generic (i.e., group-level) model. However, selecting an arbitrary lambda could be problematic (e.g., would the results replicate with another lambda? Similarly, what if a different EEG system were used, with a different overall magnitude and hence a different impact of the regularisation?).

      We fully agree that selecting an arbitrary lambda is problematic (especially across studies). As clarified above, the group-level lambda chosen here for the encoding model was data-driven, optimized based on group-level predictive power.

      (9) "L2 regularization of the model, to reduce its complexity". Could the authors explain what "reduce its complexity" refers to?

      Our intention here was to state that L2 regularization constrains the model’s weights so that it can better generalize to left-out data. However, for clarity we have now removed this statement.

      (10) The same lambda value was used for the decoding model. From personal experience, that is very unlikely to be the optimal selection. Decoding models typically require a different (usually larger) lambda than forward models, which can be due to different reasons (different SNR of "input" of the model and, crucially, very different dimensionality).

      We agree with the reviewer that treatment of regularization parameters might not be identical for encoding and decoding models. Our initial search of lambda parameters was limited to λ = 0.01 to 100, with λ = 100 showing the best reconstruction correlations. However, following the reviewer’s suggestion, we have now broadened the range and found that, in fact, reconstruction correlations improve further and the best lambda is λ = 1000 (see Author response image 3 below, left panel). Importantly, the difference in decoding reconstruction correlations between the groups is maintained regardless of the choice of lambda (although the effect size varies; see Author response image 3, right panel). We have now updated the text to report results of the model with λ = 1000.
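      For the backward (decoding) model, the same ridge machinery runs in the other direction: time-lagged EEG from all channels predicts the speech envelope. This is why the number of features (channels × lags), and hence the appropriate λ, can differ from the forward model. The sketch below sweeps λ over a log-spaced grid from 0.01 to 1000, the range mentioned above, and scores each value by leave-one-trial-out reconstruction correlation. It is an illustrative NumPy sketch with an identity penalty and no intercept, not the mTRF toolbox, and the function names are hypothetical.

```python
import numpy as np


def lagged_eeg(eeg, max_lag):
    """Stack EEG (time x channels) at lags 0..max_lag samples after each stimulus sample."""
    n, c = eeg.shape
    X = np.zeros((n, c * (max_lag + 1)))
    for lag in range(max_lag + 1):
        # The neural response follows the stimulus, so envelope[t] is decoded from eeg[t + lag].
        X[: n - lag, lag * c:(lag + 1) * c] = eeg[lag:]
    return X


def decoder_lambda_sweep(envs, eegs, max_lag, lambdas):
    """Mean leave-one-trial-out reconstruction correlation for each lambda."""
    Xs = [lagged_eeg(e, max_lag) for e in eegs]
    scores = []
    for lam in lambdas:
        rs = []
        for k in range(len(envs)):
            X = np.vstack([Xs[i] for i in range(len(envs)) if i != k])
            y = np.concatenate([envs[i] for i in range(len(envs)) if i != k])
            w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
            rs.append(np.corrcoef(Xs[k] @ w, envs[k])[0, 1])
        scores.append(np.mean(rs))
    return np.array(scores)
```

      Usage: `decoder_lambda_sweep(envs, eegs, max_lag, 10.0 ** np.arange(-2, 4))` returns one score per λ; the best value is `argmax` of that array, and widening the grid (as the authors did) is how one checks that the optimum is not at the edge of the range.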

      Author response image 3.

      (11) Skin conductance analysis. Additional details are required. For example, how was the linear interpolation done exactly? The raw data was downsampled, sure. But was an anti-aliasing filter applied? What filter exactly? What implementation for the CDA was run exactly?

      We have added the following details to the methods section (p. 14):

      “The Skin Conductance (SC) signal was analyzed using the Ledalab MATLAB toolbox (version 3.4.9; Benedek and Kaernbach, 2010; http://www.ledalab.de/) and custom-written scripts. The raw data was downsampled to 16Hz using FieldTrip's ft_resampledata function, which applies a built-in anti-aliasing low-pass filter to prevent aliasing artifacts. Data were inspected manually for any noticeable artifacts (large ‘jumps’), and if present were corrected using linear interpolation in Ledalab. A continuous decomposition analysis (CDA) was employed to separate the tonic and phasic SC responses for each participant. The CDA was conducted using the 'sdeco' mode (signal decomposition), which iteratively optimizes the separation of tonic and phasic components using the default regularization settings.”
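      For readers working outside MATLAB, the two preprocessing steps that precede the decomposition (anti-aliased downsampling to 16 Hz and linear interpolation over artifact samples) might look like this in Python. This is an assumption-laden sketch. `scipy.signal.resample_poly`, a polyphase resampler with a built-in FIR anti-aliasing filter, stands in for FieldTrip's `ft_resampledata`. The boolean artifact mask is hypothetical, since the authors identified artifacts by manual inspection. The continuous decomposition analysis itself is left to Ledalab.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def downsample_sc(sc, fs_in, fs_out=16):
    """Resample skin conductance to fs_out Hz; resample_poly low-pass filters before decimating."""
    g = gcd(int(fs_in), int(fs_out))
    return resample_poly(sc, up=int(fs_out) // g, down=int(fs_in) // g)


def interpolate_artifacts(sc, bad):
    """Hypothetical helper: replace samples flagged in boolean mask `bad` by linear interpolation."""
    sc = np.asarray(sc, dtype=float)
    idx = np.arange(sc.size)
    out = sc.copy()
    out[bad] = np.interp(idx[bad], idx[~bad], sc[~bad])
    return out
```

      Note that `resample_poly` zero-pads by default, so the first and last few output samples are tapered; trimming them, or resampling a slightly longer segment, avoids edge artifacts.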

      (12) "N1- and P2 peaks of the speech tracking response". Have the authors considered using the N1-P2 complex rather than the two peaks separately? Just a thought.

      This is an interesting suggestion, and we know that this has been used sometimes in more traditional ERP literature. In this case, since neither peak was modulated across groups, we did not think this would yield different results. However, it is a good point to keep in mind for future work.

      (13) Figure 4B. The ticks are missing. From what I can see (but it's hard without the ticks), the N1 seems later than in other speech-EEG tracking experiments (where is closer to ~80ms). Could the authors comment on that? Or maybe this looks similar to some of the authors' previous work?

      We apologize for this and have added ticks to the figure.

      In terms of time-course, a N1 peak at around 100ms is compatible with many of our previous studies, as well as those from other groups.

      (14) Figure 4C. Strange thin vertical grey bar to remove.

      Fixed.

      (15) Figure 4B: What about the topographies for the TRF weights? Could the authors show that for the main components?

      Yes. The topographies of the main TRF components are similar to those of the predictive power and are compatible with auditory responses. We have added them to Figure 4B.

      (16) Figure 4B: I just noticed that this is a grand average TRF. That is ok (but not ideal) only because the referencing is to the mastoids. The more appropriate way of doing this is to look at the GFP, instead, which estimates the presence of dipoles. And then look at topographies of the components. Averaging across channels makes the plotted TRF weaker and noisier. I suggest adding the GFP to the plot. Also, the colour scale in Figure 4A is deceiving, as blue is usually used for +/- in plots of the weights. While that is a heatmap, where using a single colour or even yellow to red would be less deceiving at first look. Only cosmetics, indeed. The result is interesting nonetheless!

      We apologize for this, and agree with the reviewer that it is better not to average across EEG channels. In the revised Figure, we now show the TRFs based on the average of electrodes FC1, FC2, and FCz, which exhibited the strongest activity for the two main components.

      Following the previous comment, we have also included the topographical representation of the TRF main components, to give readers a whole-head perspective of the TRF.

      We have also fixed the color-scales.

      We are glad that the reviewer finds this result interesting!
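      For reference, global field power (GFP), the measure the reviewer suggests, is usually computed as the spatial standard deviation of the potential across all electrodes at each time point. Unlike a channel average, it does not cancel when a component has opposite polarity at different sites. Because subtracting a common value from every channel leaves a standard deviation unchanged, it also does not depend on the choice of reference. A minimal NumPy sketch, assuming a channels × time array of TRF weights or ERP amplitudes (function names are illustrative):

```python
import numpy as np


def global_field_power(data):
    """GFP per time point: population standard deviation across channels.

    data: array of shape (n_channels, n_times).
    """
    return np.asarray(data, dtype=float).std(axis=0)


def channel_average(data):
    """Plain average across channels, which can cancel opposite-polarity activity."""
    return np.asarray(data, dtype=float).mean(axis=0)
```

      For a dipolar pattern with equal positive and negative halves, the channel average is flat at zero while GFP is not; plotting GFP alongside topographies at its peaks is a common way to pick component latencies.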

      (17) Figure 4C. This looks like a missed opportunity. That metric shows a significant difference overall. But is that underpinned by a generally lower envelope reconstruction correlation, or by larger variability in those correlations (i.e., the metric is similar to that of the controls at some moments, but drops more frequently due to distractibility)?

      We understand the reviewer’s point here, and ideally would like to be able to address this in a more fine-grained analysis, for example on a trial-by-trial basis. However, the design of the current experiment was not optimized for this, in terms of (for example) number of trials, the distribution of sound-events and behavioral outcomes. We hope to be able to address this issue in our future research.
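      To make the reviewer's distinction concrete: a lower overall reconstruction correlation could arise either from uniformly weaker tracking or from tracking that is comparable most of the time but drops intermittently. One way to separate the two, assuming continuous reconstructions are available, is to compute the correlation in short consecutive windows and examine both the mean and the spread across windows. The window length is a hypothetical choice, and short windows give noisy estimates, which echoes the authors' concern about what the current dataset can support.

```python
import numpy as np


def windowed_correlation(recon, env, win, step):
    """Pearson r between reconstructed and actual envelope in sliding windows of `win` samples."""
    recon = np.asarray(recon, dtype=float)
    env = np.asarray(env, dtype=float)
    starts = range(0, env.size - win + 1, step)
    return np.array([np.corrcoef(recon[s:s + win], env[s:s + win])[0, 1] for s in starts])
```

      Two listeners with the same mean of these window values can differ sharply in their standard deviation; the latter pattern is what intermittent distraction would be expected to produce.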

      (18) I am not a fan of the term "accuracy" for indicating envelope reconstruction correlations. Accuracy is a term typically associated with classification. Regression models are typically measured through errors, loss, and sometimes correlations. 'Accuracy' is inaccurate (no joke intended).

      We accept this comment and now use the term “reconstruction correlation”.

      (19) Discussion. "The most robust finding in". I suggest using more precise terminology. For example, "largest effect-size".

      We agree and have changed the terminology (p. 31).

      (20) "individuals who exhibited higher alpha-power [...]". I probably missed this. But could the authors clarify this result? From what I can see, alpha did not show a group effect. Is this referring to Table 2? Could the authors elaborate on that? How does that reconcile with the non-significant group effect? In that same sentence, do you mean "and were more likely"? If that's the case, and they were more likely to report attentional difficulties, how is it that there is no group effect when studying alpha?

      Yes, this sentence refers to the linear regression models described in Figure 10 and in Table 2. As the reviewer correctly points out, this is one place where there is a discrepancy between the results of the between-group analysis (ADHD diagnosis yes/no) and the regression analysis, which treats ADHD symptoms as a continuum across both groups. The same is true for the gaze-shift data, which also did not show a significant between-group effect but was identified in the regression analysis as contributing to explaining the variance in ADHD symptoms.

      We discuss this point on pages 30-31, noting that “although the two groups are clearly separable from each other, they are far from uniform in the severity of symptoms experienced”, which motivated the inclusion of both analyses in this paper.

      At the bottom of p. 31 we specifically address the similarities and differences between the between-group and regression-based results. In our opinion, this pattern emphasizes that while neither approach is ‘conclusive’, looking at the data through both lenses contributes to an overall better understanding of the contributing factors, as well as highlighting that “no single neurophysiological measure alone is sufficient for explaining differences between the individuals – whether through the lens of clinical diagnosis or through report of symptoms”.

      (21) "why in the latter case the neural speech-decoding accuracy did not contribute to explaining ASRS scores [...]". My previous point 1 on separating overall envelope decoding from its deviation could help there. The envelope decoding correlation might go up and down due to SNR, while you might be more interested in the dynamics over time (i.e., looking at the reconstructions over time).

      Again, we appreciate this comment, but believe that this additional analysis is outside the scope of what would be reliably feasible with the current dataset. However, since the data will be made publicly available, perhaps other researchers will have better ideas as to how to do this.

      (22) Data and code sharing should be discussed. Also, specific links/names and version numbers should be included for the various libraries used.

      We are currently working on organizing the data to make it publicly available on the Open Science Framework (OSF).

      We have updated links and version numbers for the various toolboxes/software used, throughout the manuscript.

      Reviewer #2:

      (1) While studying selective attention in a naturalistic context is highly appreciated, readers would expect to see whether there are any similarities or differences in the cognitive and neural mechanisms between contexts. Would the classic findings about selective attention be challenged, rebutted, or confirmed? Should we expect any novel findings in such a novel context? Moreover, since there are some studies on selective attention in naturalistic contexts, though not in the classroom, it would be better to formulate specific hypotheses based on previous findings from both strictly controlled and naturalistic contexts.

      Yes, we fully agree that comparing results across different contexts would be extremely beneficial and important.

      The current paper serves as an important first proof of concept, demonstrating the plausibility and scientific potential of combining EEG, VR and eye-tracking to study neurophysiological aspects of attention and distractibility, and it also provides the basis for formulating specific hypotheses that will be tested in follow-up studies.

      In fact, a follow-up study is already ongoing in our lab, in which we are looking into this point by testing users in different VR scenarios (e.g., classroom, café, office), and assessing whether similar neurophysiological patterns are observed across contexts and to what degree they are replicable within and across individuals. We hope to share these data with the community in the near future.

      (2) Previous studies suggest that handedness and hemispheric dominance might affect the processing of information in each hemisphere. Have these issues been taken into consideration and appropriately addressed?

      This is an interesting point. In this study we did not specifically control for handedness/hemispheric dominance, since most of the neurophysiological measures used here are sensory/auditory in nature, and therefore potentially invariant to handedness. Moreover, the EEG signal is typically not very sensitive to hemispheric dominance, at least for the measures used here. However, this might be something to consider more explicitly in future studies. Nonetheless, we have added handedness information to the Methods section (p. 5): “46 right-handed, 3 left-handed”

      (3) It would be interesting to know how students felt about the Virtual Classroom context, whether it is indeed close to the real classroom or to some extent different.

      Yes, we agree. Obviously, the VR classroom differs in many ways from a real classroom, in terms of the perceptual experience, social aspects and interactive possibilities. We did ask participants about their VR experience after the experiment, and most reported feeling highly immersed in the VR environment and engaged in the task, with a strong sense of presence in the virtual-classroom.

      We note that, in parallel to the VR studies in our lab, we are also conducting experiments in real classrooms, and we hope that the cross-study comparison will be able to shed more light on these similarities/differences.

      (4) One intriguing issue is whether neural tracking of the teacher's speech can index students' attention, as the tracking of speech may be relevant to various factors such as sound processing without semantic access.

      Another excellent point. While separating the ‘acoustic’ and ‘semantic’ contributions to the speech tracking response is non-trivial, we are currently working on methodological approaches to do this (again, in future studies) following, for example, the hierarchical TRF approach used by Brodbeck et al. and others.

      (5) There are many results associated with various metrics, and many results did not show a significant difference between the ADHD group and the control group. It is difficult to find the crucial information that supports the conclusion. I suggest the authors reorganize the Results section, report the significant results first, and indicate to which comparison(s) readers should pay attention.

      We apologize if the organization of the results section was difficult to follow. This is indeed a challenge when collecting so many different neurophysiological metrics.

      To address this, we have now added a paragraph at the beginning of the Results section clarifying its structure (p. 16):

      The current dataset is extremely rich, consisting of many different behavioral, neural and physiological responses. In reporting these results, we have distinguished between metrics associated with paying attention to the teacher (behavioral performance, neural tracking of the teacher’s speech, and looking at the teacher); those capturing responses to the irrelevant sound-events (ERPs and event-related changes in SC and gaze); and more global neurophysiological measures that may be associated with the listeners’ overall ‘state’ of attention or arousal (alpha- and beta-power and tonic SC).

      Moreover, within each section we have ordered the analyses so that those with significant effects come first. We hope that this contributes to the clarity of the Results section.

      (6) The difference between artificial and non-verbal human sounds should be introduced earlier in the Introduction, letting readers know what to expect and why.

      We have added this to the Introduction (p. 4).

      (7) It would be better to discuss the results against a theoretical background rather than majorly focusing on technical aspects.

      We appreciate this comment. In our opinion, the discussion does contain a substantial theoretical component, both regarding theories of attention and attention deficits, and regarding their potential neural correlates. However, we agree that there is always room for more in-depth discussion.

      Reviewer #3:

      Major:

      (1) While the study introduced a well-designed experiment with comprehensive physiological measures and thorough analyses, the key insights derived from the experiment are unclear. For example, does the high ecological validity provide a more sensitive biomarker or a new physiological measure of attention deficit compared to previous studies? Or does the study shed light on new mechanisms of attention deficit, such as the simultaneous presence of inattention and distraction (as mentioned in the Conclusion)? The authors should clearly articulate their contributions.

      Thanks for this comment.

      We would not say that this paper provides a ‘more sensitive biomarker’ or a ‘new physiological measure of attention’; to make those types of grand statements we would need much more converging evidence from multiple studies, using both replication and generalization approaches.

      Rather, from our perspective, the key contribution of this work is in broadening the scope of research regarding the neurophysiological mechanisms involved in attention and distraction.

      Specifically, this work:

      (1) Offers a significant methodological advance for the field, demonstrating the feasibility and scientific potential of combining EEG, VR and eye-tracking to study neurophysiological aspects of attention and distractibility in contexts that ‘mimic’ real-life situations (rather than highly controlled computerized tasks).

      (2) Provides a solid basis for formulating specific mechanistic hypotheses regarding the neurophysiological metrics associated with attention and distraction, the interplay between them, and their potential relation to ADHD symptoms. Rather than an end-point, we see these results as a starting point for future studies that emphasize ecological validity and generalizability across contexts, which will hopefully lead to improved mechanistic understanding and potential biomarkers of real-life attentional capabilities (see also response to Rev #2 comment #1 above).

      (3) Highlights differences and similarities between the current results and those obtained in traditional ‘highly controlled’ studies of attention (e.g., in the way ERPs to sound-events differ between ADHD and controls; variability in gaze and alpha-power; and, more broadly, whether ADHD symptoms do or don’t map onto specific neurophysiological metrics). Again, we do not claim to give a definitive ‘answer’ to these issues, but rather to provide a new type of data that can expand the conversation and address the ecological validity gap in attention research.

      (2) Based on the multivariate analyses, ASRS scores correlate better with the physiological measures rather than the binary deficit category. It may be worthwhile to report the correlation between physiological measures and ASRS scores for the univariate analyses. Additionally, the correlation between physiological measures and behavioral accuracy might also be interesting.

      Thanks for this. The beta values reported for the regression analysis (p. 30) reflect the relationship between each physiological measure and the ASRS scores. From a statistical perspective, it is preferable to report these values rather than the univariate correlation coefficients, since they capture the ‘unique’ relationship with each factor after controlling for all the others.

      The univariate correlations between the physiological measures themselves, as well as with behavioral accuracy, are reported in Figure 10.
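The point about ‘unique’ relationships can be illustrated with synthetic data. This is a minimal sketch; the variables are hypothetical and unrelated to the study's measures. When two predictors are correlated, a predictor with no direct effect on the outcome can still show a sizeable univariate correlation, whereas its standardized multiple-regression beta, which holds the other predictor fixed, stays near zero.

```python
import numpy as np

def zscore(a):
    """Standardize to zero mean and unit variance along the first axis."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def standardized_betas(X, y):
    """Least-squares betas after z-scoring predictors and outcome (no intercept needed)."""
    betas, *_ = np.linalg.lstsq(zscore(X), zscore(y), rcond=None)
    return betas

rng = np.random.default_rng(0)
n = 500
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)  # hypothetical predictor correlated with x1
y = x1 + 0.5 * rng.standard_normal(n)         # outcome depends on x1 only
X = np.column_stack([x1, x2])

univariate_r = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
betas = standardized_betas(X, y)
print("univariate r:", np.round(univariate_r, 2))
print("standardized betas:", np.round(betas, 2))
```

Here x2 correlates clearly with y on its own, but its beta is close to zero once x1 is in the model, which is the sense in which betas express each factor's unique contribution.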

      (3) For the TRF and decoding analysis, the authors used a constant regularization parameter per a previous study. However, the optimal regularization parameter is data-dependent and may differ between encoding and decoding analyses. Furthermore, the authors did not conduct TRF analysis for the quiet condition due to the limited ~5 minutes of data. However, such a data duration is generally sufficient to derive a stable TRF with significant predictive power (Mesik and Wojtczak, 2023).

      The reviewer raises two important points, also raised by Rev #1 (see above).

      Regarding the choice of regularization parameters, we have now clarified that although we used a common lambda value for all participants, it was selected in a data-driven manner, so as to achieve an optimal predictive power at the group-level.

      See revised methods section:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration all trials but one are used to train the model, which is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
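The λ-selection procedure quoted above can be illustrated with a minimal NumPy sketch. The study itself used the MATLAB mTRF toolbox; this is not that toolbox's code but a simplified re-implementation under stated assumptions: a single speech-envelope feature, non-negative lags only, a plain ridge penalty (identity matrix), synthetic data, and a summary that averages predictive power over electrodes as well as folds. All function names are hypothetical.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """Time-lagged design matrix (samples x lags); column k is the stimulus delayed by k samples."""
    n = len(stim)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stim[:n - lag]
    return X

def fit_ridge(X, Y, lam):
    """Closed-form ridge solution W = (X'X + lam*I)^-1 X'Y."""
    XtX = X.T @ X
    return np.linalg.solve(XtX + lam * np.eye(XtX.shape[0]), X.T @ Y)

def pearson_per_column(A, B):
    """Pearson correlation between matching columns (electrodes) of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))

def loo_predictive_power(stims, eegs, n_lags, lam):
    """Leave-one-trial-out CV: train on all trials but one, predict the held-out trial,
    correlate prediction and EEG per electrode; average over electrodes and folds."""
    Xs = [lagged_design(s, n_lags) for s in stims]
    fold_scores = []
    for k in range(len(stims)):
        X_train = np.vstack([X for i, X in enumerate(Xs) if i != k])
        Y_train = np.vstack([Y for i, Y in enumerate(eegs) if i != k])
        W = fit_ridge(X_train, Y_train, lam)
        fold_scores.append(pearson_per_column(Xs[k] @ W, eegs[k]))
    return float(np.mean(fold_scores))

def select_group_lambda(participants, n_lags, lambdas):
    """One lambda for everyone: maximize predictive power averaged across participants."""
    group_scores = [np.mean([loo_predictive_power(s, e, n_lags, lam) for s, e in participants])
                    for lam in lambdas]
    return lambdas[int(np.argmax(group_scores))], group_scores

# Synthetic example: 3 participants, 6 trials each, 4 electrodes.
rng = np.random.default_rng(1)
n_lags = 10
kernel = np.exp(-np.arange(n_lags) / 3.0)
true_trf = kernel[:, None] * np.array([1.0, 0.8, -0.6, 0.5])[None, :]  # lags x electrodes

def make_participant(n_trials=6, n_samples=400):
    stims, eegs = [], []
    for _ in range(n_trials):
        s = rng.standard_normal(n_samples)
        eeg = lagged_design(s, n_lags) @ true_trf + 2.0 * rng.standard_normal((n_samples, 4))
        stims.append(s)
        eegs.append(eeg)
    return stims, eegs

participants = [make_participant() for _ in range(3)]
lambdas = [0.1, 10.0, 1e3, 1e5]
best_lam, scores = select_group_lambda(participants, n_lags, lambdas)
print("selected lambda:", best_lam)
```

Because Pearson's r is insensitive to the overall scale of the prediction, differences between λ values can be small on clean synthetic data. The design choice illustrated is simply that one λ, chosen on the group-averaged score, is shared by all participants so that their TRFs remain comparable.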

      Regarding whether the data in the Quiet condition were sufficient for TRF analysis: we are aware of the important work by Mesik & Wojtczak, and had initially used this estimate when designing our study. However, when assessing the predictive power of the TRF model trained on data from the Quiet condition, we found that it was not significantly better than chance (see Author response image 2, ‘real’ predictive power vs. permuted data). Therefore, we ultimately did not feel it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10).

      (4) As shown in Figure 4, for ADHD participants, decoding accuracy appears to be lower than the predictive power of TRF. This result is surprising because more data (i.e., data from all electrodes) is used in the decoding analysis.

      This is an interesting point; however, in our experience decoding accuracy (i.e., the correlation between the reconstructed and the actual stimulus) is not necessarily higher than encoding predictive power. While both metrics use Pearson’s correlations, they quantify the similarity between two different types of signals: the encoding model compares the predicted and actual EEG, whereas the decoder compares the reconstructed and actual speech envelope. Although the decoding procedure does use data from all electrodes, many of them do not actually contain meaningful information about the stimulus, and thus could just as well hinder the overall performance of the decoder.

      (5) Beyond the current analyses, the authors may consider analyzing inter-subject correlation, especially for the gaze signal analysis. Given that the area of interest during the lesson changes dynamically, the teacher might not always be the focal point. Therefore, the correlation of gaze locations between subjects might be better than the percentage of gaze duration on the teacher.

      Thanks for this suggestion. We tried to look into this; however, working with eye gaze in 3-D space is extremely complex, and we were not able to calculate reliable correlations between participants.

      (6) Some preprocessing steps relied on visual and subjective inspection. For instance, " Visual inspection was performed to identify and remove gross artifacts (excluding eye movements) " (P9); " The raw data was downsampled to 16Hz and inspected for any noticeable artifacts " (P13). Please consider using objective processes or provide standards for subjective inspections.

      We are aware of the possible differences between objective methods of artifact rejection and manual visual inspection; however, we still prefer the manual (subjective) approach. As noted, in this case only very large artifacts were removed, exceeding ~4 SD of the amplitude variability, so as to preserve as many full-length trials as possible.
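For readers who prefer an objective criterion, the threshold described above can be expressed as a simple rule. The sketch below is a hypothetical helper, not the pipeline used in the study: it assumes a channels-by-samples array, flags any sample where at least one channel deviates from its own mean by more than 4 SD, and runs on synthetic data.

```python
import numpy as np

def flag_large_artifacts(data, n_sd=4.0):
    """Boolean mask over samples: True where any channel deviates from its own mean
    by more than n_sd standard deviations. data has shape (n_channels, n_samples)."""
    mu = data.mean(axis=1, keepdims=True)
    sd = data.std(axis=1, keepdims=True)
    return (np.abs(data - mu) > n_sd * sd).any(axis=0)

# Synthetic example: 8 channels of noise with one large transient on channel 3.
rng = np.random.default_rng(5)
data = rng.standard_normal((8, 10000))
data[3, 5000] += 50.0
mask = flag_large_artifacts(data)
print("flagged samples:", np.flatnonzero(mask).size)
```

Because large artifacts inflate the standard deviation they are compared against, a robust scale estimate (e.g., the median absolute deviation) is a common alternative when artifacts are frequent.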

      (7) Numerous significance testing methods were employed in the manuscript. While I appreciate the detailed information provided, describing these methods in a separate section within the Methods would be more general and clearer. Additionally, the authors may consider using a linear mixed-effects model, which is more widely adopted in current neuroscience studies and can account for random subject effects.

      Indeed, there are many statistical tests in the paper, given the diverse types of neurophysiological data collected here. We actually thought that describing the statistics per method rather than in a separate “general” section would be easier to follow, but we understand that readers might diverge in their preferences.

      Regarding the use of mixed-effect models – this is indeed a great approach. However, it requires deriving reliable metrics on a per-trial basis, and while this might be plausible for some of our metrics, the EEG and GSR metrics are less reliable at this level. This is why we ultimately chose to aggregate across trials and use a regular regression model rather than mixed-effects.

      (8) Some participant information is missing, such as their academic majors. Given that only two lesson topics were used, the participants' majors may be a relevant factor.

      To clarify, the mini-lectures presented here actually covered a large variety of topics, broadly falling within the domains of history, science, social science and technology. Participants’ academic majors were also relatively diverse, as can be seen in Author response table 1 and Author response image 4.

      Author response table 1.

      Author response image 4.

      (9) Did the multiple regression model include cross-validation? Please provide details regarding this.

      Yes, we used a leave-one-out cross-validation procedure. We have now clarified this in the Methods section, which now reads:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration all trials but one are used to train the model, which is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
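The authors state that the multiple regression was cross-validated with a leave-one-out procedure. A minimal sketch of leave-one-participant-out cross-validation for an ordinary least-squares regression follows. The data are synthetic, the three predictors are hypothetical stand-ins for per-participant physiological measures, and using out-of-sample Pearson r as the summary score is an assumption rather than the study's reported metric.

```python
import numpy as np

def loo_cv_regression(X, y):
    """Leave-one-participant-out CV for OLS multiple regression with an intercept.
    Returns the out-of-sample predictions and their Pearson r with the observed y."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # intercept column + predictors
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i          # every participant except i
        coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        preds[i] = X1[i] @ coef            # predict the held-out participant
    return preds, float(np.corrcoef(preds, y)[0, 1])

# Synthetic data: 49 hypothetical participants, 3 standardized predictors.
rng = np.random.default_rng(2)
X = rng.standard_normal((49, 3))
y = X @ np.array([0.8, -0.5, 0.0]) + 0.7 * rng.standard_normal(49)
preds, r_cv = loo_cv_regression(X, y)
print(f"cross-validated r = {r_cv:.2f}")
```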

      Minor:

      (10) Typographical errors: P5, "forty-nine 49 participants"; P21, "$ref"; P26, "Table X"; P4, please provide the full name for "SC" when first mentioned.

      Thanks! Corrected.

    1. eLife Assessment

      In this useful study, the authors perform voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors conclude that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, namely theta and ripples. However, evidence for the claims in the paper remains incomplete, due to caveats of the experimental approach and claims that are based on a relatively sparse data set collected with a cutting-edge but still largely untested method.

    2. Reviewer #1 (Public review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, the authors examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings.

      Comments on revisions: I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs but instead occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell-resolution voltage imaging in animals that, while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      Comments on revisions:

      I have no further major requests and thank the authors for the additional data and analyses.

    4. Reviewer #3 (Public review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected from the other side of the brain.

      Strengths:

      The authors use a cutting-edge technique.

      Weaknesses:

      Although the authors have toned down their claims, the statement in the title ("Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Theta but not Ripple Oscillations During Novel Exploration") is still unsupported.

      One could write the same title while voltage imaging one mouse and recording LFP from another mouse.

      To properly convey the results, the title should be modified to read "Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Contralateral Theta but not with Contralateral Ripple Oscillations During Novel Exploration"

      Without making this change, the title - and therefore the entire work - is misleading at best.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, the authors examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings.

      We thank the reviewer for a thorough review of our manuscript and for recognizing the strength of our study.

      Reviewer #2 (Public review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs but instead occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell-resolution voltage imaging in animals that, while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for a thorough review of our manuscript and for recognizing the strength of our study.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with fewer than 20 neurons simultaneously recorded) acquired with exciting but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during exploration of novel environments. Those data sets often have 10x as many simultaneously recorded neurons as the present data, so the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior, again not agreeing with much of the previous electrophysiological recordings.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text and in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the absence of this synchrony in a familiar environment. If these data are available, including them would strengthen the proposed link to novelty.

      (3) In the discussion the authors begin by speculating that the theta present during these synchronous events may be the slower type II or attentional theta. This could be supported by demonstrating a frequency shift in the theta recorded during these events/immobility versus the theta recorded during movement.

      (4) The authors mention in the discussion that they imaged deep-layer PCs in CA1; however, this is not mentioned in the text or methods. They should include data, such as imaging of a brain slice post-recording with immunohistochemistry for a layer-specific gene, to support this.
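The frequency-shift comparison suggested in point (3) could be sketched as follows: estimate the power spectrum of the LFP separately for immobility/event periods and for running periods, then compare the peak frequency within a theta band. This is a hedged illustration only. The 4-12 Hz band, the 4-s non-overlapping Hann-windowed segments (a simplified Welch estimate), and the synthetic 6 Hz and 8 Hz signals are arbitrary assumptions, not values or predictions from the study.

```python
import numpy as np

def welch_psd(x, fs, seg_len):
    """Simplified Welch-style PSD: mean of Hann-windowed periodograms of
    non-overlapping segments (arbitrary units; only the peak location is used)."""
    n_seg = len(x) // seg_len
    segs = x[:n_seg * seg_len].reshape(n_seg, seg_len)
    segs = segs - segs.mean(axis=1, keepdims=True)
    spec = np.abs(np.fft.rfft(segs * np.hanning(seg_len), axis=1)) ** 2
    return np.fft.rfftfreq(seg_len, d=1.0 / fs), spec.mean(axis=0)

def theta_peak_frequency(x, fs, band=(4.0, 12.0), seg_sec=4.0):
    """Frequency of maximal power within the theta band (resolution 1/seg_sec Hz)."""
    freqs, psd = welch_psd(x, fs, int(seg_sec * fs))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[in_band][np.argmax(psd[in_band])])

# Synthetic LFPs: 60 s each, arbitrary 6 Hz (immobility) and 8 Hz (running) rhythms in noise.
rng = np.random.default_rng(3)
fs = 1000.0
t = np.arange(60000) / fs
lfp_immobile = np.sin(2 * np.pi * 6.0 * t) + rng.standard_normal(t.size)
lfp_running = np.sin(2 * np.pi * 8.0 * t) + rng.standard_normal(t.size)
f_imm = theta_peak_frequency(lfp_immobile, fs)
f_run = theta_peak_frequency(lfp_running, fs)
print(f"peak theta: immobile {f_imm:.2f} Hz, running {f_run:.2f} Hz")
```

On real recordings, the aperiodic (1/f) background can bias the peak estimate, so removing it before locating the peak, and comparing peak frequencies across events or animals statistically, would be sensible refinements.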

      Comments on revisions:

      I have no further major requests and thank the authors for the additional data and analyses.

      We thank the reviewer for recognizing our efforts in revising the manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected from the other side of the brain.

      Strengths:

      The authors use a cutting-edge technique.

      We thank the reviewer for a thoughtful review of our manuscript and for pointing out the technical strength of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples. The main problem with the work is that the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both rhythms exhibit profound differences as a function of location.

      Theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. Because the LFP was recorded using a single-contact tungsten electrode, there is no way to know whether the electrode was exactly in the CA1 pyramidal cell layer, or in the CA1 oriens, CA1 radiatum, or perhaps even CA3 - which exhibits ripples and theta which are weakly correlated and in anti-phase with the CA1 rhythms, respectively. Thus, there is no way to know whether the theta phase used in the analysis is the phase of the local CA1 theta.

      Although the occurrence of CA1 ripples is often correlated across parts of the hippocampus, ripples are inherently a locally-generated rhythm. Independent ripples occur within a fraction of a millimeter within the same hemisphere. Ripples are also very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident. Thus, even if the LFP was recorded from the center of the CA1 pyramidal layer in the contralateral hemisphere, it would not suffice for the claim made in the title.

      We thank the reviewer for pointing out the issue regarding the claim made in the title. We have revised the manuscript to clarify that the theta and ripple oscillations referenced in the title refer to specific frequency bands of intracellular and contralaterally recorded field potentials rather than field potentials recorded at the same site as the neuronal activity.

      Abstract (line19):

      “… Notably, these synchronous ensembles were not associated with contralateral ripple oscillations but were instead phase-locked to theta waves recorded in the contralateral CA1 region. Moreover, the subthreshold membrane potentials of neurons exhibited coherent intracellular theta oscillations with a depolarizing peak at the moment of synchrony.”

      Introduction (line68):

      “… Surprisingly, these synchronous ensembles occurred outside of contralateral ripples and were phase-locked to intracellular theta oscillations as well as extracellular theta oscillations recorded from the contralateral CA1 region.”
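Theta phase-locking of the kind described in the revised Introduction can be quantified, for example, by extracting the instantaneous theta phase of the contralateral LFP at the times of synchronous events and computing the mean resultant length. The response does not state the exact filter or statistic used, so the sketch below rests on assumptions: an FFT brick-wall band-pass (5-10 Hz), phase from the analytic signal (0 = peak, ±π = trough for a cosine), a Rayleigh Z summary, and synthetic data with hypothetical events placed near theta troughs.

```python
import numpy as np

def theta_phase(lfp, fs, band=(5.0, 10.0)):
    """Instantaneous theta phase (radians, -pi..pi) from an FFT band-pass plus analytic signal.
    Keeping only positive in-band frequencies (x2) yields the analytic signal of the
    band-passed LFP. Brick-wall filtering is a simplification."""
    spectrum = np.fft.fft(lfp)
    freqs = np.fft.fftfreq(lfp.size, d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(2.0 * spectrum * keep)
    return np.angle(analytic)

def phase_locking(phases):
    """Mean resultant length R, mean phase, and Rayleigh Z = n * R**2 (p is roughly exp(-Z) for large n)."""
    vec = np.exp(1j * phases).mean()
    R = float(np.abs(vec))
    return R, float(np.angle(vec)), phases.size * R ** 2

# Synthetic contralateral LFP: 8 Hz cosine plus white noise, 20 s at 1 kHz.
rng = np.random.default_rng(6)
fs = 1000.0
t = np.arange(20000) / fs
lfp = np.cos(2 * np.pi * 8.0 * t) + rng.standard_normal(t.size)

# Hypothetical synchronous events near cosine troughs, with ~10 ms jitter.
cycles = rng.choice(159, size=100, replace=False)
event_times = (cycles + 0.5) / 8.0 + rng.normal(0.0, 0.01, size=100)
event_idx = np.clip(np.round(event_times * fs).astype(int), 0, t.size - 1)

phase = theta_phase(lfp, fs)
R, mean_phase, rayleigh_z = phase_locking(phase[event_idx])
print(f"R = {R:.2f}, mean phase = {mean_phase:.2f} rad, Rayleigh Z = {rayleigh_z:.1f}")
```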

      To address concerns about electrode placement, we have now included post hoc histological verification of electrode locations, confirming that they were positioned in the contralateral CA1 pyramidal layer (Author response image 1).

      Author response image 1.

      Post-hoc histological section showing the location of a DiI-coated electrode in the contralateral CA1 pyramidal layer. Scale bar: 300 μm.

      While we appreciate that theta and ripple oscillations exhibit regional variations in phase and amplitude, previous studies have demonstrated strong co-occurrence and synchrony of these oscillations across the two hippocampi (1-3). Given that our primary objective was to examine how neuronal ensembles relate to large-scale hippocampal oscillation states rather than local microcircuit-level fluctuations, we recorded theta and ripple oscillations from the contralateral CA1 region.
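Whether synchronous events fall outside contralateral ripples depends on how ripples are detected, which the response does not specify. A common generic approach is to band-pass the LFP in the ripple band, take the amplitude envelope, and keep supra-threshold runs longer than a minimum duration. The sketch below follows that approach with assumed parameters (150-250 Hz, 3 SD above the mean envelope, 15 ms minimum), an FFT brick-wall filter in place of a conventional filter, and synthetic data; the helper names and event times are hypothetical.

```python
import numpy as np

def ripple_envelope(lfp, fs, band=(150.0, 250.0)):
    """Amplitude envelope of the ripple-band LFP (FFT brick-wall band-pass + analytic signal)."""
    spectrum = np.fft.fft(lfp)
    freqs = np.fft.fftfreq(lfp.size, d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    return np.abs(np.fft.ifft(2.0 * spectrum * keep))

def detect_ripples(lfp, fs, z_thresh=3.0, min_dur=0.015):
    """Return (start, stop) sample indices (stop exclusive) of runs where the z-scored
    envelope exceeds z_thresh for at least min_dur seconds."""
    env = ripple_envelope(lfp, fs)
    z = (env - env.mean()) / env.std()
    above = np.concatenate([[0], (z > z_thresh).astype(int), [0]])
    edges = np.diff(above)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    long_enough = (stops - starts) >= int(min_dur * fs)
    return list(zip(starts[long_enough], stops[long_enough]))

def fraction_in_ripples(event_idx, ripples):
    """Fraction of event samples that fall inside any detected ripple window."""
    inside = [any(s <= i < e for s, e in ripples) for i in event_idx]
    return float(np.mean(inside))

# Synthetic LFP: white noise plus three 200 Hz bursts with Gaussian envelopes.
rng = np.random.default_rng(7)
fs = 2000.0
t = np.arange(20000) / fs
lfp = rng.standard_normal(t.size)
for c in (2.0, 5.0, 8.0):
    lfp += 5.0 * np.exp(-((t - c) ** 2) / (2 * 0.015 ** 2)) * np.sin(2 * np.pi * 200.0 * t)

ripples = detect_ripples(lfp, fs)
events = [int(x * fs) for x in (2.0, 3.5, 5.0, 6.5, 8.0)]  # hypothetical event samples
frac = fraction_in_ripples(events, ripples)
print(len(ripples), "ripples detected; fraction of events inside ripples:", frac)
```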

      However, we acknowledge that contralateral recordings do not capture all ipsilateral-specific dynamics. Theta phases vary with depth and precise location, and local ripple events may be independently generated across small spatial scales. To reflect this, we have now explicitly acknowledged these considerations in the discussion. 

      Discussion (line527):

      While contralateral LFP recordings reliably capture large-scale hippocampal theta and ripple oscillations, they may not fully account for ipsilateral-specific dynamics, such as variations in theta phase alignment or locally generated ripple events. Although contralateral recordings serve as a well-established proxy for large-scale hippocampal oscillatory states, incorporating simultaneous ipsilateral field potential recordings in future studies could refine our understanding of local-global network interactions. Despite these considerations, our findings provide robust evidence for the existence of synchronous neuronal ensembles and their role in coordinating newly formed place cells. These results advance our understanding of how synchronous neuronal ensembles contribute to spatial memory acquisition and hippocampal network coordination.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have provided sufficient experimental and analytical data addressing my comments, particularly regarding consistency with past electrophysiological data and the exclusion of potential imaging artifacts.

      We thank the reviewer for recognizing our efforts in revising the manuscript.

      Minor comment: In Figure 2C and Figure 5-figure supplement 1, 'paired Student's t-test' is not entirely appropriate. More precisely, either 'paired t-test' or 'Student's t-test' would better indicate the correct statistical method. Please verify whether these data comparisons are within-group or between-group.

      Thank you for the comment. We have revised the manuscript as suggested.
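The reviewer's distinction can be made concrete: a paired t-test analyses within-unit differences (df = n - 1), whereas Student's t-test compares two independent groups using a pooled variance (df = n_a + n_b - 2). The sketch below computes both statistics by hand on synthetic data in which each unit is measured twice. In practice, scipy.stats.ttest_rel and scipy.stats.ttest_ind (whose default equal_var=True corresponds to Student's test) return the same statistics together with p-values.

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic on within-unit differences; df = n - 1."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(n)), n - 1

def student_t(a, b):
    """Student's two-sample t statistic with pooled variance; df = n_a + n_b - 2."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / na + 1.0 / nb)), na + nb - 2

# Synthetic within-group data: 20 units measured in two conditions.
rng = np.random.default_rng(4)
baseline = rng.normal(10.0, 3.0, size=20)                   # large between-unit spread
condition = baseline + 0.5 + rng.normal(0.0, 0.3, size=20)  # small, consistent within-unit shift

t_p, df_p = paired_t(condition, baseline)
t_s, df_s = student_t(condition, baseline)
print(f"paired: t({df_p}) = {t_p:.2f}; Student's (treating groups as independent): t({df_s}) = {t_s:.2f}")
```

With strong between-unit variability and a consistent within-unit change, the paired statistic is much larger, which is why the test named in a figure legend must match whether the comparison is within- or between-group.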

      Reviewer #2 (Recommendations for the authors):

      I have no further major requests and thank the authors for the additional data and analyses.

      We thank the reviewer for recognizing our efforts in revising the manuscript.

      Minor point: line 169, typo; correct “grant” to “grand”.

      Thank you for pointing it out. The typo has been corrected.

      (1) Buzsaki, G. et al. Hippocampal network patterns of activity in the mouse. Neuroscience 116, 201-211 (2003). https://doi.org/10.1016/s0306-4522(02)00669-3

      (2) Szabo, G. G. et al. Ripple-selective GABAergic projection cells in the hippocampus. Neuron 110, 1959-1977 e1959 (2022). https://doi.org/10.1016/j.neuron.2022.04.002

      (3) Huang, Y. C. et al. Dynamic assemblies of parvalbumin interneurons in brain oscillations. Neuron 112, 2600-2613 e2605 (2024). https://doi.org/10.1016/j.neuron.2024.05.015

    1. eLife Assessment

      The authors present a biologically plausible framework for action selection and learning in the striatum that is a fundamental advance in our understanding of possible neural implementations of reinforcement learning in the basal ganglia. They provide compelling evidence that their model can reconcile realistic neural plasticity rules with the distinct functional roles of the direct and indirect spiny projection neurons of the striatum, recapitulating experimental findings regarding the activity profiles of these distinct neural populations and explaining a key aspect of striatal function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors propose a new model of biologically realistic reinforcement learning in the direct and indirect pathway spiny projection neurons in the striatum. These pathways are widely considered to provide a neural substrate for reinforcement learning in the brain. However, we do not yet have a full understanding of the mechanistic learning rules that would allow successful reinforcement-learning-like computations in these circuits. The authors outline some key limitations of current models and propose an interesting solution by leveraging learning with efferent inputs of selected actions. They show that the model simulations are able to recapitulate experimental findings about the activity profile in these populations in mice during spontaneous behavior. They also show how their model is able to implement off-policy reinforcement learning.

      Strengths:

      The manuscript is very clearly written and the results are presented in a readily digestible manner. The limitations of existing models that motivate the presented work are clearly presented, and the proposed solution seems very interesting. The novel contribution of the proposed model is the idea that different patterns of activity drive current action selection and learning. Not only does this allow the model to implement reinforcement learning computations well, but it may also have interesting implications regarding why some processes selectively affect ongoing behavior and others affect learning. The model is able to recapitulate some interesting experimental findings about various activity characteristics of dSPN and iSPN neuronal populations in spontaneously behaving mice. The authors also show that their proposed model can implement off-policy reinforcement learning algorithms with biologically realistic learning rules. This is interesting since off-policy learning provides some unique computational benefits, and it is very likely that learning in neural circuits may, at least to some extent, implement such computations.

      Weaknesses:

      A weakness in this work is that it isn't clear how a key component in the model - an efferent copy of selected actions - would be accessible to these striatal populations. The authors propose several plausible candidates, but future work may clarify the feasibility of this proposal.

    3. Reviewer #2 (Public review):

      Summary:

      The basal ganglia are often understood within a reinforcement learning (RL) framework, where dopamine neurons convey a reward prediction error which modulates cortico-striatal connections onto spiny projection neurons (SPNs) in the striatum. However, current models of plasticity rules are inconsistent with learning in a reinforcement learning framework.

      This paper proposes a new model that describes how distinct learning rules in direct and indirect pathway striatal neurons allow them to implement reinforcement learning models. It proposes that two distinct components of striatal activity drive action selection and learning, respectively. They show that the proposed implementation allows learning in simple tasks and is consistent with calcium imaging data from direct and indirect SPNs in freely moving mice.

      Strengths:

      Despite the success of reward prediction errors at characterizing the responses of dopamine neurons as the temporal difference error within an RL framework, the implementation of RL algorithms in the rest of the basal ganglia has been unclear. A key gap has been the lack of an RL implementation that is consistent with the distinction between direct- and indirect-pathway SPNs. This paper proposes a new model that is able to learn successfully in simple RL tasks and explains recent experimental results.

      The authors show that, unlike previous implementations, their proposed model performs well in RL tasks. The new model allows them to make experimental predictions. They test some of these predictions and show that the dynamics of dSPNs and iSPNs correspond to model predictions.

      More generally, this new model can be used to understand striatal dynamics across direct and indirect SPNs in future experiments.

      Weaknesses:

      The authors could better characterize the reliability of their experimental predictions and more fully describe the parameters of some of the simulations.

      The authors propose some ideas about how the specificity of the striatal efferent inputs could arise, but should better highlight that this is a key feature of the model whose anatomical implementation has yet to be resolved.

      Comments on revisions:

      I thank the authors for their response to the public and private reviews and for the clarifications and changes to the manuscript, which have strengthened it. I understand that some of the proposed additional simulations could not be implemented because the authors have left academia, and I understand the request for a version of record.

    4. Reviewer #3 (Public review):

      Summary:

      This paper points out an inconsistency between the role of the striatal spiny neurons projecting to the indirect pathway (iSPNs) and the synaptic plasticity rule of these neurons, which express dopamine D2 receptors, and proposes a novel, intriguing mechanism in which iSPNs are activated by the efference copy of the chosen action that they are supposed to inhibit.

      The proposed model was supported by simulations and analysis of the neural recording data during spontaneous behaviors.

      Strengths:

      Previous models suggested that striatal neurons learn action-value functions, but how information about the chosen action is fed back to the striatum for learning was not clear. The authors pointed out that this is a fundamental problem for iSPNs, which are supposed to inhibit specific actions and whose synaptic inputs are potentiated by dopamine dips.

      The authors propose a novel hypothesis that iSPNs are activated by an efference copy of the selected action, which they are supposed to inhibit during action selection. Even though intriguing and seemingly unnatural, the authors demonstrated that the model based on the hypothesis can circumvent the problem of iSPNs learning to disinhibit the actions associated with negative reward errors. They further showed, by analyzing the cell-type-specific neural recording data of Markowitz et al. (2018), that iSPN activities tend to be anti-correlated before and after action selection.

      Weaknesses:

      (1) It is not correct to call action-value learning using the externally selected action "off-policy." Both the off-policy algorithm Q-learning and the on-policy algorithm SARSA update the action value of the chosen action, which can differ from the greedy action implied by the present action values. In standard reinforcement learning terminology, on-policy versus off-policy concerns the action in the subsequent state: whether the update uses the next action value of the (to-be-)chosen action or that of the greedy choice, as in equation (7).

      It is worth noting that this paper suggested that dopamine neurons encode on-policy TD errors: Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9, 1057-63. https://doi.org/10.1038/nn1743
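      To make the terminology in point (1) concrete, here is a minimal tabular sketch of the two standard TD errors. It uses a hypothetical two-state Q-table and is not the authors' model. Both rules update Q(s, a) for the action actually taken. They differ only in which next-state value serves as the bootstrap target: SARSA uses the next chosen action, whereas Q-learning uses the greedy one. The sketch also illustrates the reviewer's later point that, with an immediate reward and no successor state to bootstrap from, the two errors are identical.

```python
def sarsa_error(Q, s, a, r, s_next, a_next, gamma, terminal=False):
    """On-policy (SARSA) TD error: bootstrap from the action actually chosen next."""
    target = r if terminal else r + gamma * Q[s_next][a_next]
    return target - Q[s][a]


def q_learning_error(Q, s, a, r, s_next, gamma, terminal=False):
    """Off-policy (Q-learning) TD error: bootstrap from the greedy next action."""
    target = r if terminal else r + gamma * max(Q[s_next])
    return target - Q[s][a]


# Hypothetical 2-state, 2-action table; rows are states, columns are actions.
Q = [[0.0, 0.5],
     [1.0, 0.2]]

# Multi-step case: from state 0, action 1 leads to state 1, where the agent
# then chooses the non-greedy action 1, so the two errors differ.
err_sarsa = sarsa_error(Q, 0, 1, 0.0, 1, 1, gamma=0.9)   # 0.9 * 0.2 - 0.5
err_q = q_learning_error(Q, 0, 1, 0.0, 1, gamma=0.9)     # 0.9 * 1.0 - 0.5

# Immediate-reward case: no bootstrapping, so both reduce to r - Q(s, a).
imm_sarsa = sarsa_error(Q, 0, 1, 1.0, 1, 1, gamma=0.9, terminal=True)
imm_q = q_learning_error(Q, 0, 1, 1.0, 1, gamma=0.9, terminal=True)
```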

      (2) It is also confusing to contrast TD learning with Q-learning, as the latter is considered one type of TD learning. The TD error signal based on the state-value function (6) depends implicitly on the chosen action a_{t-1} through r_t and s_t, via the reward and state transition functions.

      (3) It is not clear how interference between the activities for action selection and for learning can be avoided, especially when actions are taken at short intervals or even overlap in time. How can the efference copy activation for the previous action be dissociated from the sensory-cued activation for the next action selection?

      (4) Although it may be difficult to single out the neural pathway that carries the efference copy signal to the striatum, it would be desirable to consider its requirements and the different possibilities. A major issue is that the time delay from actions to reward feedback can be highly variable.

      An interesting candidate is the long-latency neurons in the CM thalamus projecting to striatal cholinergic interneurons, which are activated following low-reward actions: Minamimoto T, Hori Y, Kimura M (2005). Complementary process to response bias in the centromedian nucleus of the thalamus. Science, 308, 1798-801. https://doi.org/10.1126/science.1109154

      (5) In the paragraph before Eq. (3), Eq (1) should be Eq. (2) for the iSPN.

      Here are my comments on the authors' replies and the revised version:

      (1) I do not agree with the use of inaccurate technical terms. On-policy does not require that the policy be greedy with respect to the action values, as the authors seem to assume here.

      In fact, the policy (10) is just a standard soft-max action selection based on the action values by the difference of dSPN and iSPN outputs.
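      The reviewer's reading of policy (10) can be illustrated with a short sketch. The argument names `d` and `i` and the inverse temperature `beta` are hypothetical placeholders, and the exact form of Eq. (10) is not reproduced here. The sketch only assumes, as stated above, that action probabilities are a softmax over the per-action difference between dSPN and iSPN outputs.

```python
import math


def softmax_policy(d, i, beta=1.0):
    """Softmax over per-action dSPN-minus-iSPN differences (hypothetical sketch)."""
    prefs = [beta * (dk - ik) for dk, ik in zip(d, i)]
    m = max(prefs)                      # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


# Action 0 has net dSPN drive; actions 1 and 2 have balanced dSPN/iSPN output.
probs = softmax_policy(d=[1.0, 0.5, 0.2], i=[0.2, 0.5, 0.2], beta=2.0)
```

      Because only the difference enters, actions whose dSPN and iSPN outputs are balanced receive equal probability, whatever the absolute activity level.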

      Furthermore, in the immediate reward setting tested in this paper, action values are independent of the policy, so there is no distinction between on-policy vs. off-policy. This is also apparent from the "TD" errors in (19) and (21), where there is no TD.

      (2) To really compare the different forms of TD, multi-step RL tasks should be used.

      (3) This fundamental limitation should be explicitly documented in the manuscript. This is not just the same as in any RL algorithm. Having two action representations within each action step makes temporal credit assignment more difficult.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors propose a new model of biologically realistic reinforcement learning in the direct and indirect pathway spiny projection neurons in the striatum. These pathways are widely considered to provide a neural substrate for reinforcement learning in the brain. However, we do not yet have a full understanding of the mechanistic learning rules that would allow successful reinforcement-learning-like computations in these circuits. The authors outline some key limitations of current models and propose an interesting solution by leveraging learning with efferent inputs of selected actions. They show that the model simulations are able to recapitulate experimental findings about the activity profile in these populations of mice during spontaneous behavior. They also show how their model is able to implement off-policy reinforcement learning.

      Strengths:

      The manuscript has been very clearly written and the results have been presented in a readily digestible manner. The limitations of existing models that motivate the presented work have been clearly presented, and the proposed solution seems very interesting. The novel contribution of the proposed model is the idea that different patterns of activity drive current action selection and learning. Not only does this allow the model to implement reinforcement learning computations well, but this suggestion may also have interesting implications regarding why some processes selectively affect ongoing behavior and others affect learning. The model is able to recapitulate some interesting experimental findings about various activity characteristics of dSPN and iSPN pathway neuronal populations in spontaneously behaving mice. The authors also show that their proposed model can implement off-policy reinforcement learning algorithms with biologically realistic learning rules. This is interesting since off-policy learning provides some unique computational benefits and it is very likely that learning in neural circuits may, at least to some extent, implement such computations.

      We thank the reviewer for the positive comments.

      Weaknesses:

      A weakness in this work is that it isn’t clear how a key component in the model - an efferent copy of selected actions - would be accessible to these striatal populations. The authors propose several plausible candidates, but future work may clarify the feasibility of this proposal.

      We agree that the biological substrate of the efference copy remains a key open question. We discuss potential pathways in the Discussion section of our manuscript and hope that future experimental studies clarify the question.

      Reviewer #2:

      Summary:

      The basal ganglia are often understood within a reinforcement learning (RL) framework, where dopamine neurons convey a reward prediction error that modulates cortico-striatal connections onto spiny projection neurons (SPNs) in the striatum. However, current models of plasticity rules are inconsistent with learning in a reinforcement learning framework.

      This paper proposes a new model that describes how distinct learning rules in direct and indirect pathway striatal neurons allow them to implement reinforcement learning models. It proposes that two distinct components of striatal activity drive action selection and learning, respectively. They show that the proposed implementation allows learning in simple tasks and is consistent with calcium imaging data from direct and indirect SPNs in freely moving mice.

      Strengths:

      Despite the success of reward prediction errors at characterizing the responses of dopamine neurons as the temporal difference error within an RL framework, the implementation of RL algorithms in the rest of the basal ganglia has been unclear. A key gap has been the lack of an RL implementation that is consistent with the distinction between direct- and indirect-pathway SPNs. This paper proposes a new model that is able to learn successfully in simple RL tasks and explains recent experimental results.

      The authors show that, unlike previous implementations, their proposed model performs well in RL tasks. The new model allows them to make experimental predictions. They test some of these predictions and show that the dynamics of dSPNs and iSPNs correspond to model predictions.

      More generally, this new model can be used to understand striatal dynamics across direct and indirect SPNs in future experiments.

      We thank the reviewer for the positive comments.

      Weaknesses:

      The authors could better characterize the reliability of their experimental predictions and more fully describe the parameters of some of the simulations.

      In addition to the descriptions in the Methods, we have provided code implementing the key features of our simulations, which should contribute to the reproducibility of our results.

      The authors propose some ideas about how the specificity of the striatal efferent inputs could arise, but should better highlight that this is a key feature of the model whose anatomical implementation has yet to be resolved.

      We have clarified in the Discussion section “Biological substrates of striatal efferent inputs” that these represent assumptions or predictions that have not yet been demonstrated experimentally.

      Reviewer #3:

      Summary:

      This paper points out an inconsistency between the role of the striatal spiny neurons projecting to the indirect pathway (iSPNs) and the synaptic plasticity rule of these neurons, which express dopamine D2 receptors, and proposes a novel, intriguing mechanism in which iSPNs are activated by the efference copy of the chosen action that they are supposed to inhibit.

      The proposed model was supported by simulations and analysis of the neural recording data during spontaneous behaviors.

      Strengths:

      Previous models suggested that striatal neurons learn action-value functions, but how information about the chosen action is fed back to the striatum for learning was not clear. The authors pointed out that this is a fundamental problem for iSPNs, which are supposed to inhibit specific actions and whose synaptic inputs are potentiated by dopamine dips.

      The authors propose a novel hypothesis that iSPNs are activated by an efference copy of the selected action, which they are supposed to inhibit during action selection. Even though intriguing and seemingly unnatural, the authors demonstrated that the model based on the hypothesis can circumvent the problem of iSPNs learning to disinhibit the actions associated with negative reward errors. They further showed, by analyzing the cell-type-specific neural recording data of Markowitz et al. (2018), that iSPN activities tend to be anti-correlated before and after action selection.

      We thank the reviewer for the positive comments.

      Weaknesses:

      It is not correct to call action-value learning using the externally selected action "off-policy." Both the off-policy algorithm Q-learning and the on-policy algorithm SARSA update the action value of the chosen action, which can differ from the greedy action implied by the present action values. In standard reinforcement learning terminology, on-policy versus off-policy concerns the action in the subsequent state: whether the update uses the next action value of the (to-be-)chosen action or that of the greedy choice, as in equation (7).

      It is worth noting that this paper suggested that dopamine neurons encode on-policy TD errors: Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9, 1057-63. https://doi.org/10.1038/nn1743.

      We regret that we do not completely follow the reviewer’s comment. We use “off-policy” to refer to the fact that, considered in isolation, the basal ganglia reinforcement learning system that we model learns a target policy that may be distinct from the behavioral policy of the organism as a whole.

      It is also confusing to contrast TD learning with Q-learning, as the latter is considered one type of TD learning. The TD error signal based on the state-value function (6) depends implicitly on the chosen action a_{t-1} through r_t and s_t, via the reward and state transition functions.

      We agree that this was confusing. We have therefore changed the places in our paper where we intended to refer to “TD learning of a value function V(s)” to specifically mention V(s), rather than just “TD learning.”

      It is not clear how interference between the activities for action selection and for learning can be avoided, especially when actions are taken at short intervals or even overlap in time. How can the efference copy activation for the previous action be dissociated from the sensory-cued activation for the next action selection?

      The non-interference arises from the orthogonality of the difference (action selection) and sum (efference copy) modes, as described in Figure 3. However, we agree with the reviewer that the problem of temporal credit assignment, when many actions are taken before reward feedback is obtained, is present in our model, as in any standard RL model.
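      The orthogonality argument above can be sketched for a single action with one dSPN and one iSPN rate. This is a toy illustration under a stated assumption, not the model in Figure 3. The difference mode (1, -1)/√2 is the readout for action selection, and the sum mode (1, 1)/√2 carries the efference copy. The sketch assumes, hypothetically, that the efference copy drives both populations equally. Such an input then changes only the sum-mode component and leaves the difference mode, and hence the action-selection readout, untouched. The rates and drive values are illustrative.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)
DIFF_MODE = (INV_SQRT2, -INV_SQRT2)  # dSPN up, iSPN down: action-selection readout
SUM_MODE = (INV_SQRT2, INV_SQRT2)    # both populations up together: efference-copy mode


def project(activity, mode):
    """Project one action's (dSPN, iSPN) activity pair onto a mode vector."""
    return activity[0] * mode[0] + activity[1] * mode[1]


# The two modes are orthogonal, so input along one leaves the other unchanged.
mode_overlap = project(DIFF_MODE, SUM_MODE)

before = (0.9, 0.3)   # hypothetical (dSPN, iSPN) rates for one action
efference = 0.5       # hypothetical efference-copy drive, equal to both populations
after = (before[0] + efference, before[1] + efference)

diff_change = project(after, DIFF_MODE) - project(before, DIFF_MODE)
sum_change = project(after, SUM_MODE) - project(before, SUM_MODE)
```

      This captures only the geometric part of the argument. As the reviewer notes, it does not resolve interference in time when consecutive actions overlap.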

      Although it may be difficult to single out the neural pathway that carries the efference copy signal to the striatum, it would be desirable to consider its requirements and the different possibilities. A major issue is that the time delay from actions to reward feedback can be highly variable.

      An interesting candidate is the long-latency neurons in the CM thalamus projecting to striatal cholinergic interneurons, which are activated following low-reward actions: Minamimoto T, Hori Y, Kimura M (2005). Complementary process to response bias in the centromedian nucleus of the thalamus. Science, 308, 1798-801. https://doi.org/10.1126/science.1109154.

      We are grateful for the interesting suggestion and reference, which we have added to the manuscript. However, we note that the issue of delayed reward feedback may also be partially addressed by using a sufficiently long eligibility trace.
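      A minimal sketch follows of how an eligibility trace can bridge a delay between action-related activity and reward feedback. It is a generic, exponentially decaying trace on a single hypothetical synapse, not the authors' implementation. The decay factor `lam` and learning rate `lr` are assumed values. Activity tags the synapse, the tag decays each step, and an error signal that arrives later still changes the weight in proportion to the remaining tag.

```python
def run_trace(activity, errors, lam=0.9, lr=0.1, w0=0.0):
    """Update one weight using a decaying eligibility trace (hypothetical sketch).

    activity[t]: coincident pre/post activity that tags the synapse at step t
    errors[t]:   dopamine-like error signal arriving at step t
    """
    w, e = w0, 0.0
    for x, delta in zip(activity, errors):
        e = lam * e + x        # trace decays, then is refreshed by current activity
        w += lr * delta * e    # a delayed error still credits the tagged synapse
    return w


# Activity at step 0, reward feedback three steps later.
w_with_trace = run_trace([1, 0, 0, 0], [0, 0, 0, 1], lam=0.9)
w_no_trace = run_trace([1, 0, 0, 0], [0, 0, 0, 1], lam=0.0)
```

      With `lam=0.9`, the weight change is 0.1 × 0.9³. With no trace memory (`lam=0.0`), the delayed error has no effect. Longer or more variable delays therefore require slower trace decay.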

      In the paragraph before Eq. (3), Eq. (1) should be Eq. (2) for the iSPN.

      Corrected.

    1. eLife Assessment

      The submission by Gopal and colleagues reports important findings describing the structure of genetic and colour variation in its invasive range in India for the globally invasive weed Lantana camara. Whilst the importance of the research question and the scale of the sampling are appreciated, the analysis, which is currently incomplete, requires further tests to support the claims made by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in the horticultural trade, and that the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained by artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim in the title that "the population structure... is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, and it is also misleading, because artificial selection is a much more likely cause of the observed patterns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).
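      The link between excess homozygosity and selfing can be made quantitative with the standard equilibrium relation for partial selfing. Under a constant selfing rate s, the inbreeding coefficient settles at F = s / (2 - s), which inverts to s = 2F / (1 + F), and expected heterozygosity is reduced by a factor (1 - F). The sketch below assumes a single biallelic locus at inbreeding equilibrium with no other evolutionary forces acting. The function names and example values are illustrative, not estimates from this study.

```python
def equilibrium_F(s):
    """Equilibrium inbreeding coefficient under a constant selfing rate s."""
    return s / (2.0 - s)


def selfing_from_F(F):
    """Invert F = s / (2 - s) to estimate the selfing rate from an observed F."""
    return 2.0 * F / (1.0 + F)


def expected_het(p, F):
    """Heterozygosity at a biallelic locus with allele frequency p, reduced by F."""
    return 2.0 * p * (1.0 - p) * (1.0 - F)


# Illustrative values: 80% selfing gives F = 2/3 and cuts heterozygosity to one third.
F_high = equilibrium_F(0.8)
het_high = expected_het(0.5, F_high)
```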

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the output of the modelling analysis is largely qualitative and cannot be directly compared with the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-established results, and such findings are not novel in themselves, although they would become more interesting if quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNP data would be more interesting. For example, with ABC-based methods, the authors could infer the split time between the subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result would support the scenario that multiple introductions contributed to the population structure of this species. In the current form of the manuscript, multiple introductions are implied but not formally tested.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, this initial screening of the SNPs could be strongly biased, which may have led to spurious outputs in a series of downstream analyses. Second, in the genetic simulation, it is not clear how parameters such as the mutation rate, recombination rate, and growth rate were determined and whether they are appropriate. Importantly, while the authors assume a selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation. Third, while the authors argue for an association between flower color and population structure, this association was not formally tested statistically. Also, it is not mentioned how flower color polymorphisms are defined. Is it possible to objectively distinguish the many flower color morphs shown in Figure 1b? I am concerned particularly because the authors also mentioned that flower color may change over time and that a single inflorescence can have flowers of different colors (L160).
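      The first concern can be illustrated directly. At a locus with inbreeding coefficient F, the genotype frequencies are p² + Fpq, 2pq(1 - F) and q² + Fpq. A Hardy-Weinberg test against F = 0 therefore flags the locus even when the genotype calls are correct. The sketch below uses expected (not simulated) counts for 359 individuals, the sample size reported above, with an illustrative F = 0.5. For a biallelic locus the Pearson statistic then equals nF², far above the conventional cutoff of about 3.84 (α = 0.05, one degree of freedom). The F value and cutoff are assumptions, not the authors' filtering settings.

```python
def genotype_freqs(p, F):
    """Genotype frequencies (AA, Aa, aa) at a biallelic locus with inbreeding F."""
    q = 1.0 - p
    return (p * p + F * p * q, 2.0 * p * q * (1.0 - F), q * q + F * p * q)


def hwe_chi2(counts):
    """Pearson chi-square statistic for departure from HWE (1 degree of freedom)."""
    n_AA, n_Aa, n_aa = counts
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2.0 * n)   # allele frequency estimated from the sample
    q = 1.0 - p
    expected = (n * p * p, 2.0 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))


n = 359
inbred_counts = [n * f for f in genotype_freqs(p=0.5, F=0.5)]
chi2_inbred = hwe_chi2(inbred_counts)   # equals n * F**2 here

outbred_counts = [n * f for f in genotype_freqs(p=0.3, F=0.0)]
chi2_outbred = hwe_chi2(outbred_counts)  # essentially zero
```

      Under high selfing, an HWE filter tuned for outbreeding populations would therefore discard many genuine loci. It would also do so non-randomly, removing those with the strongest inbreeding signal.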

    1. eLife assessment

      Using an elegant and thorough experimental design, Thomazeau et al. show that, in the developing mouse visual cortex, presynaptic NMDA receptors at layer 5 neocortical synapses mediate spike-timing-dependent LTD via non-ionotropic, JNK2-dependent signaling. These fundamental findings shed light on how NMDA receptors can tune synaptic function without acting as coincidence detectors. The experiments are supported by compelling evidence, gathered through optogenetics and quadruple patch clamp recordings from cortical slices.

    2. Reviewer #1 (Public review):

      Summary:

      The results offer compelling evidence that L5-L5 tLTD depends on presynaptic NMDARs, a concept that has previously been somewhat controversial.

      It documents the novel finding that presynaptic NMDARs facilitate tLTD through their metabotropic signaling mechanism.

      Strengths:

      The experimental design is clever and clean.

      The approach of comparing results in cell pairs where NMDARs are deleted either presynaptically or postsynaptically is technically insightful and yields decisive data.

      The MK801 experiments are also compelling.

      Weaknesses:

      No major weaknesses were noted by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      The study characterized the dependence of spike-timing-dependent long-term depression (tLTD) on presynaptic NMDA receptors, and the intracellular cascade downstream of NMDAR activation that is possibly involved in the observed decrease in glutamate release probability at L5-L5 synapses of the visual cortex in mouse brain slices.

      Strengths:

      The genetic and electrophysiological experiments are thorough. The experiments are well-reported and mainly support the conclusions. This study confirms and extends current knowledge by elucidating additional plasticity mechanisms at cortical synapses, complementing existing literature.

      Weaknesses:

      While one of the main conclusions (preNMDARs mediating presynaptic LTD) is supported by a very convincing genetic approach, the second main conclusion of the manuscript (non-ionotropic preNMDARs) relies on the use of high concentrations of extracellular blockers (MK801, 2 mM; 7-chlorokynurenic acid, 100 microM), but no controls for the specific actions of these compounds are shown. In addition, ion flux through preNMDARs has not been tested directly.

      It is not known whether the results can be extrapolated to the adult brain, as the data were obtained from slices of 11- to 18-day-old mice, a period during which synapses are still maturing and the cortex is highly plastic.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, "Neocortical Layer-5 tLTD Relies on Non-Ionotropic Presynaptic NMDA Receptor Signaling", Thomazeau et al. seek to determine the role of presynaptic NMDA receptors and the mechanism by which they mediate expression of frequency-independent timing-dependent long-term depression (tLTD) between layer-5 (L5) pyramidal cells (PCs) in the developing mouse visual cortex. By utilizing sophisticated methods, including sparse Cre-dependent deletion of GluN1 subunit via neonatal iCre-encoding viral injection, in vitro quadruple patch clamp recordings, and pharmacological interventions, the authors elegantly show that L5 PC->PC tLTD is (1) dependent on presynaptic NMDA receptors, (2) mediated by non-ionotropic NMDA receptor signaling, and (3) is reliant on JNK2/Syntaxin-1a (STX1a) interaction (but not RIM1αβ) in the presynaptic neuron. The study elegantly and pointedly addresses a long-standing conundrum regarding the lack of frequency dependence of tLTD.

      Strengths:

      The authors did a commendable job presenting a very polished piece of work with high-quality data that this Reviewer feels enthusiastic about. The manuscript has several notable strengths. Firstly, the methodological approach used in the study is highly sophisticated and technically challenging and successfully produced high-quality data that are easily accessible to a broader audience. Secondly, the pharmacological interventions used in the study targeted specific players and their mechanistic roles, unveiling the mechanism in question step by step. Lastly, the manuscript is written in a well-organized manner that is easy to follow. Overall, the study provides several compelling lines of evidence that lead to a clear mechanistic understanding.

      I have a couple of small items below, which the authors can address in a minor revision if they so wish.

      Minor comments:

      (1) For the broad readership, a brief description of JNK2-mediated signaling cascade underlying tLTD, including its intersection with CB1 receptor signaling may be desired.

      (2) The authors used juvenile mice, P11 to P18 of age. This is a typical age range for plasticity experiments, but it also spans eye opening in mice (~P13) and ends a few days before the onset of the classical critical period for ocular dominance plasticity in the visual cortex. Given the mechanistic novelty reported in the study, can the authors comment on whether this signaling pathway may be age-dependent?

    1. eLife Assessment

      This valuable work provides a robust yet simple protocol to isolate small extracellular vesicles from small volumes of plasma. The evidence supporting the conclusions is convincing, although a more thorough statistical comparison of the different techniques and technique combinations explored in the study would have been appreciated. The work will be of broad interest to cell biologists and biochemists.

    1. eLife Assessment

      This work provides a simple, rapid and valuable protocol for the isolation of small extracellular vesicles from small volumes of plasma, using two well-known methodologies in tandem: size exclusion chromatography (SEC) and density gradient ultracentrifugation (DGUC). The authors exhaustively test these methodologies separately and in combination, showing superior results for SEC-DGUC in terms of purity and yield. The results obtained in this work are convincing, and the authors characterize the isolates with multiple state-of-the-art methodologies that support their conclusions.

    1. eLife Assessment

      This important study advances our understanding of genome annotations for chiton genomes. It provides a solid estimation of syntenic relationships for the chromosomes of the four new genomes, plus an analysis linking these to other available chiton genomes and an update on how these relate to molluscan genomes.

    1. eLife Assessment

      This fundamental study explores how genotypic changes relate to phenotypic stasis or variation within chitons, a molluscan group. Chitons are significant because their ancient body plan has remained largely unchanged for millions of years, yet the paper reveals rapid and large-scale genomic changes. This convincing study is a splendid advance in approximately doubling the number of sequenced chiton genomes, providing what appears to be among the best genome annotations for chiton genomes available to date. The study's key focus is on the genomic rearrangements across five reference-quality genomes of chitons and their implications for understanding evolutionary mechanisms, particularly in comparison to other molluscan clades.

    1. eLife Assessment

      This manuscript offers important insights into how polyphosphate (polyP) influences protein phase separation differently from DNA. The authors present compelling evidence that polyP distinguishes between protein conformational states, leading to diverse condensate behaviors. However, differences in charge density between polyP and DNA complicate direct comparisons, and the extent to which polyP-driven phase transitions reveal initial protein states remains unclear. Addressing these concerns would strengthen the manuscript's impact for researchers interested in biomolecular condensates, protein dynamics, and stress response mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in the phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negatively charged polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time, while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the proteins. Specifically, they observe the formation of polyproline II-like structures within the two tested protein variants in the presence of polyP. Together, their results provide unique insight into the physical and structural mechanism by which two distinct negatively charged polymers can induce distinct phase transitions with the same protein. This study will be a welcome addition to the condensate field and will provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Strengths:

      Throughout the article, the authors used the correct techniques to probe physical changes within proteins that can be directly linked to phase transition behaviors. Their rigorous experiments create a clear picture of what occurs at the molecular level when CytR and FruR are exposed to either DNA or polyP, which are unique, highly negatively charged biopolymers found within bacteria. This work provides a new view of mechanisms by which bacteria can regulate cytoplasmic organization upon the induction of stress. Furthermore, these findings are likely applicable to mammalian and plant cells and to numerous proteins that undergo condensation with nucleic acids and other charged biopolymers.

      Weaknesses:

      The biggest weakness of this study is that it compares the phase behavior of enzymes driven by negatively charged polymers that differ intrinsically in net charge and charge density. Because these properties are extremely important for controlling phase separation, such differences alone may account for the distinct phase transitions driven by DNA and polyP. The authors should perform an additional experiment to control for these differences as best they can. The results from these experiments will provide additional insight into the importance of charge-based properties for controlling phase transitions.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Goyal et al. demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate between initial protein ensembles that differ in conformational heterogeneity, structure, and compactness. The authors further show that the native protein ensemble is vital in determining whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      Strengths:

      This is a thoroughly conducted study that provides an alternative route for inducing phase separation that is more informative about the initial protein ensemble involved. This is particularly useful as a complementary means to investigate the role played by protein dynamics and plasticity in phase transitions. The authors use an appropriate set of techniques to investigate unique phase transitions within proteins induced by polyphosphates. An alternative protein system is used to corroborate their finding that the unique assemblies induced by polyphosphates, when compared to DNA, are not restricted to a single system. The work here is well-documented, easy to interpret, and of relevance for the condensate community.

      Weaknesses:

      The major weakness of this manuscript is that it is unclear whether the initial protein conformational ensemble can be determined solely from the assembly and maturation behavior and the discrimination abilities of polyphosphates. In both systems studied (CytR and FruR), polyphosphate discriminates between initial protein ensembles, resulting in distinct assemblies and maturation behaviors. However, the assembly and maturation behavior do not seem to follow directly from the degree of conformational dynamics and plasticity in the initial protein. In the case of CytR, the fully folded system forms condensates that resolubilize, while the highly disordered state immediately aggregates. In contrast, in the case of FruR, the folded state undergoes spontaneous aggregation, and the more dynamic, molten globular system forms short-lived condensates. These results suggest that polyphosphate's ability to discriminate between initial protein ensembles may not reveal what that initial ensemble is unless it is already known.

    4. Author response:

      eLife Assessment

      This manuscript offers important insights into how polyphosphate (polyP) influences protein phase separation differently from DNA. The authors present compelling evidence that polyP distinguishes between protein conformational states, leading to diverse condensate behaviors. However, differences in charge density between polyP and DNA complicate direct comparisons, and the extent to which polyP-driven phase transitions reveal initial protein states remains unclear. Addressing these concerns would strengthen the manuscript's impact for researchers interested in biomolecular condensates, protein dynamics, and stress response mechanisms.

      We thank the editorial team for the favorable assessment. We do, however, contest the specific point on the difference in charge density. We have already performed experiments in which a higher concentration of DNA is used to match the overall ‘concentration of charges’ in the experiments with polyP (see Figure S6), and we do not observe any differences in the maturation behavior with DNA, i.e., we see only dissolution at both higher and lower concentrations of DNA. Charge density (i.e., the number of charges per unit volume of the polymer), on the other hand, is an intrinsic feature of the polymer, which naturally differs between DNA and polyP. In fact, the primary result of our work is the observation that polyP can discern the starting ensembles more efficiently, likely by actively engaging and interacting with the ensemble, while DNA appears to be a passive player.

      Reviewer #1 (Public review):

      Summary:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in the phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negative polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time, while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the enzymes. Specifically, they observe the formation of polyproline II-like structures within two tested enzyme variants in the presence of polyP. Together, their results provide a unique insight into the physical and structural mechanism by which two distinct negatively charged polymers can induce different phase transitions with the same protein. This study will be a welcome addition to the condensate field and provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Strengths:

      Throughout the article, the authors used the correct techniques to probe physical changes within proteins that can be directly linked to phase transition behaviors. Their rigorous experiments create a clear picture of what occurs at the molecular level when CytR and FruR are exposed to either DNA or polyP, which are unique, highly negatively charged biopolymers found within bacteria. This work provides a new view of mechanisms by which bacteria can regulate cytoplasmic organization upon the induction of stress. Furthermore, these findings are likely applicable to mammalian and plant cells and to numerous proteins that undergo condensation with nucleic acids and other charged biopolymers.

      Weaknesses:

      The biggest weakness of this study is that it compares the phase behavior of enzymes driven by negatively charged polymers that differ intrinsically in net charge and charge density. Because these properties are extremely important for controlling phase separation, such differences alone may account for the distinct phase transitions driven by DNA and polyP. The authors should perform an additional experiment to control for these differences as best they can. The results from these experiments will provide additional insight into the importance of charge-based properties for controlling phase transitions.

      We thank the reviewer for providing a positive review of our work. Regarding the final paragraph, we note that we have already conducted an experiment with a higher DNA concentration (11.24 µM) to explore whether the concentration of charges plays any significant role. The results of this experiment are presented in Figure S6. We observe that even at a higher DNA concentration, the condensates dissolve over time. Therefore, the difference in the maturation behavior of condensates with varying initial protein ensembles is due to the nature of polyP (likely its enhanced flexibility).

      Reviewer #2 (Public review):

      Summary:

      In this study, Goyal et al. demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate between initial protein ensembles that differ in conformational heterogeneity, structure, and compactness. The authors further show that the native protein ensemble is vital in determining whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      Strengths:

      This is a thoroughly conducted study that provides an alternative route for inducing phase separation that is more informative about the initial protein ensemble involved. This is particularly useful as a complementary means to investigate the role played by protein dynamics and plasticity in phase transitions. The authors use an appropriate set of techniques to investigate unique phase transitions within proteins induced by polyphosphates. An alternative protein system is used to corroborate their finding that the unique assemblies induced by polyphosphates, when compared to DNA, are not restricted to a single system. The work here is well-documented, easy to interpret, and of relevance for the condensate community.

      Weaknesses:

      The major weakness of this manuscript is that it is unclear whether the initial protein conformational ensemble can be determined solely from the assembly and maturation behavior and the discrimination abilities of polyphosphates. In both systems studied (CytR and FruR), polyphosphate discriminates between initial protein ensembles, resulting in distinct assemblies and maturation behaviors. However, the assembly and maturation behavior do not seem to follow directly from the degree of conformational dynamics and plasticity in the initial protein. In the case of CytR, the fully folded system forms condensates that resolubilize, while the highly disordered state immediately aggregates. In contrast, in the case of FruR, the folded state undergoes spontaneous aggregation, and the more dynamic, molten globular system forms short-lived condensates. These results suggest that polyphosphate's ability to discriminate between initial protein ensembles may not reveal what that initial ensemble is unless it is already known.

      We thank the reviewer for providing constructive comments on our work. Regarding the final paragraph: we agree that the outcome does not provide information on the nature of the starting ensemble. At present, our experimental results are primarily observations of the maturation outcomes when protein ensembles of varying structure, compactness, and stability interact with polyP. If there are differences in the native ensemble due to mutations (which at times cannot be revealed by ensemble probes), polyP appears to discern them more efficiently than DNA.

    1. eLife Assessment

      This valuable study reveals the pro-locomotor effects of activating a deep brain region containing a diverse range of neurons in both healthy and Parkinson's disease mouse models. While the findings are solid, mechanistic insights remain limited due to the small sample size. This research is relevant to motor control researchers and offers clinical perspectives.

    2. Reviewer #1 (Public review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and in a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc), while the dopaminergic cells in the A13 region were largely preserved. Then, they optically stimulated the A13 region after using a viral vector to express channelrhodopsin under the CaMKII promoter. In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and the anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion: the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      Weaknesses:

      (1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of results that doesn't pinpoint precise brain areas or subareas but instead references large brain portions like the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well-written with a smooth and straightforward style, but the connectome section presents challenges in readability and comprehension. The presentation of results, particularly the correlation matrices and correlation strength, doesn't facilitate biological understanding. It would be beneficial to explore specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions like the Pedunculopontine nucleus, Cuneiformis nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla. Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease, offering a new perspective through the preservation of dopaminergic cells in A13 despite SNc degeneration and through the plastic changes in input/output matrices. For inspiration for a more straightforward reanalysis and discussion of the results, I recommend that the authors refer to the paper titled "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon" from the David Kleinfeld laboratory. This could guide the authors in investigating motor pathways across different brain regions.

      (2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, total distance, and even kinematic aspects during stimulation could reveal how A13 is influencing locomotion, particularly after 6-OHDA lesions. The Whelan laboratory has deep knowledge of locomotion and the neural circuits driving it, so these features may be instructive for inferring the neural circuits driving movement. Along the same lines, examining features such as the frequency or power of stimulation in relation to walking patterns may help elucidate whether A13 engages the Mesencephalic Locomotor Region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      (3) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: this quantification can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in healthy conditions, and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells/TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      (4) Regarding Figure 3. In the main text (page 5), the wrong panels are cited when describing the 6-OHDA animals: the text refers to Figure 2A-E, but this should be Figure 3A-E. Please correct this.

      Summary of the Study after revision

      The revised manuscript reflects significant efforts to improve clarity, organization, and data interpretation. The refinements in anatomical descriptions, behavioral analyses, and contextual framing have strengthened the manuscript considerably. However, the study still lacks direct causal evidence linking anatomical remodeling to behavioral improvements, and the small sample size in the anatomical analyses remains a concern. The authors have addressed many points raised in the initial review, but further acknowledgement of the exploratory nature of these findings would enhance the scientific rigor of the work.

      Key Improvements in the Revision

      The revised manuscript demonstrates considerable progress in clarifying data presentation, refining behavioral analyses, and improving the contextualization of anatomical findings. The restructuring of the anatomical section now provides greater precision in describing motor-related pathways, integrating terminology from the Allen Brain Atlas. The addition of new figures (Figures 4 and 5) strengthens the accessibility of these findings by illustrating key connectivity patterns more effectively. Furthermore, the correlation matrices have been adjusted to improve interpretability, ensuring that the presented data contribute meaningfully to the overall narrative of the study.

      The authors have also made significant improvements in their behavioral analyses, particularly in the organization and presentation of locomotor data. Figure 3 has been revised to distinctly separate results from 6-OHDA and sham animals, providing a clearer comparison of locomotor outcomes. Additional metrics, such as reaction time, locomotion bouts, and movement speed, further enhance the granularity of the analysis, making the results more informative.

      The discussion surrounding anatomical connectivity has also been strengthened. The revised manuscript now places greater emphasis on motor-related pathways and refines its analysis of A13 efferents and afferents. A newly introduced figure provides a concise summary of these connections, improving the contextualization of the anatomical data within the study's broader scope. Moreover, the authors have addressed the translational relevance of their findings by acknowledging the differences between optogenetic stimulation and deep brain stimulation (DBS). Their discussion now better situates the findings within existing literature on PD-related motor circuits, providing a more balanced perspective on the potential implications of A13 stimulation.

      Remaining Concerns

      Despite these substantial improvements, a number of critical concerns remain. The anatomical findings, though insightful, remain largely correlative and do not establish a causal link between structural remodeling and locomotor recovery. While the authors argue that these data will serve as a reference for future investigations, their necessity for the core conclusions of the study is not entirely clear. Additionally, while the anatomical data offer an interesting perspective on A13 connectivity, their direct relevance to the study's primary goal (demonstrating the role of A13 in locomotor recovery) remains uncertain. The authors emphasize that these data will be valuable for future research, yet their integration into the study's main narrative feels somewhat supplementary. This argument makes another key limitation even more relevant: the small sample size used for the connectivity analyses. With only two sham and three 6-OHDA animals included, the statistical confidence in the findings is inherently limited. The absence of direct statistical comparisons between ipsilesional and contralesional projections further weakens the conclusions drawn from these anatomical studies. The authors have acknowledged that obtaining the necessary samples, acquiring the data, and analyzing them is a prolonged and resource-intensive process. While this may be a valid practical limitation, it does not justify the lack of a robust statistical approach. A more rigorous statistical framework should be employed to reinforce the findings, or alternative techniques should be considered to provide additional validation. Given these constraints, it remains unclear why the authors have not opted for standard immunohistochemistry, which could provide a complementary and more statistically accessible approach to validate the anatomical findings. Employing such an approach would not only increase the robustness of the results but also strengthen the study's impact by providing an independent confirmation of the observed structural changes.

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. In wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection, suggesting a remodeling of the A13 connectome. Whether this remodeling contributes to the pro-locomotor effects of photostimulating the A13 region remains unknown, as causality was not addressed.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients. The study also provides a description of the A13 region connectome pertaining to motor behaviors and how it changes after a dopaminergic lesion. Although there is no causal link between anatomical and behavioral data, it raises interesting questions for further studies.

      Weaknesses:

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, some uncertainty remains regarding the phenotype of neurons underlying recovery of akinesia and improvement of bradykinesia.

      Figure 4 is improved, but the results from the correlation analyses remain difficult to interpret, as they may reflect changes in various impaired brain regions independently of the A13 region. While the analysis offers a snapshot of correlated changes within the connectome, it does not identify which specific cell or axonal populations are actually increasing or decreasing. Although functional MRI connectome analyses are well-established, anatomical data seem less suitable for this purpose. How can one interpret correlated changes in anatomical inputs or outputs between two distinct regions?

      Figure 5 is also improved, but there is room for further enhancement. As currently presented, it is difficult to distinguish the differences between the sham and 6-OHDA groups. The first column could compare afferents, while the second column could compare efferents. Given the small sample size, it would be more appropriate to present individual data rather than the mean and standard deviation.

      Appraisal and impact

      Although the behavioral experiments are convincing, the low number of animals in the anatomical studies is insufficient to support any meaningful statistical conclusions because of the extremely low statistical power.

    4. Reviewer #3 (Public review):

      Kim, Lognon et al. present an important finding on the pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling of its afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They make use of an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but several general issues, and a number of specific, more minor ones, reduce the value of the paper in its current form. Some suggestions that may improve the paper over its current form also come to mind.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      The authors provide an extensive interrogation of large-scale changes in the organization of the A13 region's afferent and efferent distributions. It remains unclear how many animals were included to produce Figs 4 and 5. Fig S5 suggests that only 3 animals were used; is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for each experiment. Were the same animals used for several experiments?

      While the authors provide evidence that photoactivation of the A13 is sufficient to drive locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find an apparent difference between sham and 6-OHDA animals in the effects of A13 photoactivation. Because of these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than a disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular, the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review from Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS has so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell-specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al., Science 2021)? Please comment.

      In a recent study, Jeon et al. (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence for the topographically graded organization of STN afferents, and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ZI as a whole?

      In conclusion, this is an interesting study that can be improved taking into consideration the points mentioned above.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and in a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc), while the dopaminergic cells in the A13 region were largely preserved. Then, they optically stimulated the A13 region after using a viral vector to express channelrhodopsin under the CaMKII promoter. In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and the anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion - the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      We thank the reviewer for their positive comments.

      Weaknesses:

      (1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of the results that does not pinpoint precise brain areas or subareas but instead refers to large brain divisions such as the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well written in a smooth and straightforward style, but the connectome section is difficult to read and comprehend. The presentation of the results, particularly the correlation matrices and correlation strength, does not facilitate biological understanding. It would be beneficial to explore the specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions such as the pedunculopontine nucleus, cuneiform nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla.

      We initially considered two approaches. The first was to look at specific projections to the motor regions, focusing on the MLR. The second was to utilize a whole-brain analysis, which is presented here. Given what we know about the zona incerta, especially its integrative role, we felt that examining the full connectome was a reasonable starting point.

      The value of the whole-brain approach is that it provides a high-level overview of the afferents and efferents to the region. The changes in the brain that occur following Parkinson-like lesions, such as those in the nigrostriatal pathway, are complex and can affect neighbouring regions such as the A13. Therefore, we wished to highlight the A13, which we considered a therapeutic target, and examine changes in connectivity that could occur following acute lesions affecting the SNc. We acknowledge that this study does not provide a causal link, but it presents the fundamental background information for subsequent hypothesis-driven, focused, region-specific analysis.

      The terms provided were taken from the Allen Brain Atlas terminology and presented as abbreviations. We have added two new figures focusing on motor regions to make the information more comprehensible (new Figures 4 and 5) and have rewritten the connectomics section to make it easier to understand.

      Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      This is a great point to help simplify the whole-brain results. We have presented the motor-related inputs and outputs as part of a new figure in the main paper (Figure 5) and added accompanying text in the results section. We have also updated the correlation matrices to concentrate on motor regions (Figure 4). This highlights possible therapeutic pathways. We have also enhanced our discussion of these motor-related pathways. We have retained the entire dataset and added it to our data repository for those interested.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease, offers a new perspective on the preservation of dopaminergic cells in A13 despite SNc degeneration, and reveals plastic changes to input/output matrices. For inspiration toward a more straightforward reanalysis and discussion of the results, I recommend that the authors refer to the paper "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon" from the David Kleinfeld laboratory. This could guide the authors in investigating motor pathways across different brain regions.

      Thank you for the advice. As pointed out, Kleinfeld’s group presented their data in a nice, focused way. For the connectomic piece, we have added Figure 5, which provides a better representation than our previous submission.

      (2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, and total distance could reveal how A13 influences locomotion, particularly after 6-OHDA lesions. The Whelan laboratory has deep knowledge of locomotion and the neural circuits driving it, so these features may be instructive for inferring the neural circuits that drive movement. Along the same lines, examining features such as the frequency or power of stimulation in relation to walking patterns may help elucidate whether A13 engages the mesencephalic locomotor region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      Thank you for these suggestions. We have reorganized Figure 3 to highlight the metrics by separating the 6-OHDA from the Sham experiments (3F-J, which highlights distance travelled, average speed and duration). We have also added additional text to highlight these metrics better in the text. We have relabelled Supplementary Figure S3, which presents reaction time as latency to initiate locomotion and updated the main text to address the reviewers' points.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. In wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection. The study suggests that, although remodeling of the A13 region connectome does not promote recovery following chronic dopaminergic depletion, photostimulation of the A13 region restores locomotor functions.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients.

      Weaknesses:

      Electrical stimulation of the medial Zona Incerta, in which the A13 region is located, has previously been reported to promote locomotion (Grossman et al., 1958). Recent mouse studies have shown that, whereas optogenetic or chemogenetic stimulation of GABAergic neurons of the Zona Incerta promotes and restores locomotor functions after 6-OHDA injection (Chen et al., 2023), stimulation of glutamatergic ZI neurons worsens motor symptoms after 6-OHDA (Lie et al., 2022).

      Thank you; we have added this reference. It is helpful because Grossman stimulated the zona incerta in the cat and elicited locomotion, suggesting that stimulation of the area in normal mice has external validity. Grossman’s results prompted a later clinical examination of the zona incerta, but it concentrated on the zona incerta regions close to the subthalamic regions (Ossowska 2019), further caudal to the area we focused on. Chen et al. (2023) targeted the lateral aspect of the central/medial zona incerta, formed by the dorsal and ventral zona incerta, which may account for the differing results. Our data were robust for stimulation of the medial aspect of the rostromedial zona incerta. The thigmotactic behaviour that we observed when targeting CamKII neurons has not been observed with chemogenetic or optogenetic activation, or with photoinhibition, of GABAergic central/medial ZI (Chen et al. 2023).

      GABAergic activation of mZI to Cuneiform projections (Sharma et al. 2024) also did not produce thigmotactic behavior. We have added these points to the discussion.

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, the behavioral results of this study raise questions about the neuronal population targeted in the vicinity of the A13 region. Moreover, although some YFP and ChR2-YFP neurons within the A13 region are dopaminergic (TH+) (Fig. 2), there is also a large population of transduced neurons within and outside the A13 region that are not, suggesting the recruitment of other neuronal cell types that could be GABAergic or glutamatergic.

      We found that CamKII transfection of the A13 region was extremely effective in promoting locomotor activity, which was critical for exploring its possible therapeutic potential. We have since quantified cell numbers and found that the number of c-Fos+ cells increased following ChR2 activation. There is evidence of TH+ cell activation, but the data suggest that other cell types also contribute. c-Fos alone is a blunt tool for assessing specificity; it is better suited to showing overall photostimulation efficacy, which we have demonstrated. Moreover, there is evidence that A13 cell types are not purely dopaminergic, with GABA co-localized (Negishi et al. 2020). We acknowledge that viral approaches specifically targeting the GABAergic, glutamatergic, and dopaminergic circuits would be very useful. The range of tools for targeting A13 dopaminergic circuits is more limited than for the SNc, for example, because the A13 region lacks DAT, and TH-IRES-Cre approaches, while helpful, are less specific than DAT-Cre mouse models. Intersectional approaches targeting multiple transmitters (glutamate and dopamine, for example) may be one solution, as we do not expect a single transmitter-specific pathway to work as well as broad targeting of the A13 region. Our recent work suggests that GABAergic neuron activation may have more general effects on behaviour rather than controlling ongoing locomotor parameters (Sharma et al. 2024). Recent work shows a positive valence effect of dopaminergic A13 activation on motivated food-seeking behavior, which differs from the consummatory behavior observed with GABAergic modulation (Ye, Nunez, and Zhang 2023). Chemogenetic inactivation and ablation of dopaminergic A13 neurons revealed that they contribute to grip strength and prehensile movements, uncoupling food-seeking grasping behavior from motivational factors (Garau et al. 2023). Overall, this suggests differing effects of GABAergic compared with dopaminergic and/or glutamatergic cell types, consistent with the effects we observed when stimulating CamKII neurons. The discussion has been updated.

      Regarding the analysis of interregional connectivity of the A13 region, there is a lack of specificity (the viral approach did not specifically target the A13 region), the number of mice is low for such correlation analyses (2 sham and 3 6-OHDA mice), and there are no statistics comparing 6-OHDA versus sham (Fig. 4) or contra- versus ipsilesional sides (Fig. 5). Moreover, the data are too processed, and the color matrices (Fig. 4) are too packed in the current format to enable proper visualization of the data. The A13 afferents/efferents analysis is based on normalized relative values; absolute values should also be presented to support the claim about their upregulation or downregulation.

      Papers using tissue-clearing imaging approaches generally have low sample sizes because of their technical complexity. The technical challenges of obtaining these data were substantial in both collection and analysis. Dual injections (A13 and MFB coordinates) and accurate targeting add further complexity; the A13 region is difficult to target, as it spans only around 300 µm in the anterior-posterior axis. Clearing a brain takes weeks, and light-sheet imaging also takes time; analyzing the tissue with whole-brain quantification is additionally labor-intensive, especially given the lack of a standardized analysis pipeline for atlas registration, signal segmentation, and quantification. The field is still relatively new, and pipelines require further refinement.

      Correlation matrices are often used to analyze connectivity patterns on a brain-wide scale, as they can reveal patterns within large datasets. We used correlation matrices to display estimated correlation coefficients between the afferent and efferent proportions of pairs of brain subregions across 251 brain regions in total (descriptively, not for hypothesis testing). We provided descriptive statistics (means and error bars) in the original Figure 5C and G. As mentioned in our response to Reviewer 1, the revised Figures 4 and 5 now focus specifically on motor-related pathways to provide information on possible pathways. This has simplified the correlation matrices and highlighted the differences in the 6-OHDA efferent data in particular. As suggested, raw values are shared in a supplemental file in our data repository.

      In the absence of changes in the number of dopaminergic A13 neurons after 6-OHDA injection, results from this correlation analysis are difficult to interpret as they might reflect changes from various impaired brain regions independently of the A13 region.

      We acknowledge that models of Parkinson’s disease, particularly those using 6-OHDA, induce plasticity in various regions, which may subsequently affect A13 connectivity. We aim to emphasize the residual, intact A13 pathways that could serve as therapeutic targets in future investigations. This emphasis is pertinent to potential clinical applications, as the overall input to and output from the region fundamentally determine the significance of the A13 region in models with nigrostriatal lesions. We agree with the reviewer that the changes may well be independent of A13; however, the significant change in the connectome after 6-OHDA injection and nigrostriatal degeneration is in itself important to document. We have added a sentence acknowledging this limitation to the discussion.

      There is no causal link between anatomical and behavioral data, which raises questions about the relevance of the anatomical data.

      This point was also addressed earlier in response to a comment from Reviewer 1. Focusing on specific motor pathways is one avenue to explore. However, given that the zona incerta acts as an integrative hub, we believed it prudent to first examine both afferent and efferent pathways using a brain-wide approach. For instance, without this methodology, the potential significance of cortical interconnectivity with the A13 region might not have been fully appreciated. As mentioned previously, we will place additional emphasis on motor-related regions in our revised paper, thereby enhancing the relevance of the anatomical data presented. With these modifications, we anticipate that our data will highlight specific motor-related targets for future exploration, using optogenetic targeting to assess necessity and sufficiency.

      Overall, the study does not take advantage of genetic tools accessible in the mouse to address the direct or indirect behavioral and anatomical contributions of the A13 region to motor control and recovery after 6-OHDA injection.

      Our study has not specifically targeted neurons that express dopaminergic, glutamatergic, or GABAergic properties (refer to earlier comment for more detail). However, like others, we find that targeting one neuronal population often does not result in a pure transmitter phenotype. For instance, evidence suggests co-localization of dopamine neurons with a subpopulation of GABA neurons in the A13/medial zona incerta (Negishi et al. 2020). In the hypothalamus, research by Deisseroth and colleagues (Romanov et al. 2017) indicates the presence of multiple classes of dopamine cells, each containing different ratios of co-localized peptides and/or fast neurotransmitters. Consequently, we believe our work lays the foundation for the investigations suggested by the reviewer. Furthermore, if one considers this work in the context of a preclinical study to determine whether the A13 might be a target in human Parkinson's disease, the existing technology that could be utilized is deep brain stimulation (DBS) or electrical modulation, which would also affect different neuronal populations in a non-specific manner.

      While optogenetic stimulation therapy is a longer-term prospect, using CamKII combined with the DJ hybrid AAV could be a translatable strategy for targeting A13 neuronal populations in non-human primates (Watakabe et al. 2015; Watanabe et al. 2020). We have added this to the discussion.

      Reviewer #3 (Public Review):

      Kim, Lognon et al. present an important finding on the pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling of afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They make use of an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but several general issues reduce the value of the paper in its current form, along with a number of specific, more minor ones. Some suggestions that may improve the paper relative to its current form also come to mind.

      Thank you for the suggestions and careful consideration of our work - it is appreciated.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      As mentioned in our response to Reviewer 1, we have performed additional analysis, presented in Figure 5. We have also revised Figure 4 to focus on motor regions. Our work will provide a roadmap for future studies to disentangle divergent or convergent A13 pathways involved in individual or all PD-related motor symptoms. Because we could not measure behavioural change in the same animals used for the anatomical study (essentially because the optrode would have significantly disrupted the connectome we were measuring), we cannot directly compare behaviour to structure.

      The authors provide extensive interrogation of large-scale changes in the organization of the A13 region afferent and efferent distributions. It remains unclear how many animals were included to produce Fig 4 and 5. Fig S5 suggests that only 3 animals were used, is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for which experiment. Were the same animals used for several experiments?

      The behavioral and anatomical sets were necessarily distinct. In the anatomical experiments, we employed both anterograde and retrograde viral approaches to label the efferent and afferent A13 populations with fluorescent proteins. For the behavioral approach, a single ChR2 opsin was used to photostimulate the A13 region; hence, combining the two populations was not feasible. We were also concerned that the optrode itself would interfere with the connectomic analysis. Fewer animals were used for the whole-brain work because of the technical limitations described earlier. We now provide additional information regarding animal numbers in all figures and in the text. Using Spearman’s correlation analysis, we found afferent and efferent proportions to be consistent across animals, with an average correlation of 0.91, which is reported in Figure S6.
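      The consistency check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it is assumed (hypothetically) that each animal is summarized as a vector of per-region afferent or efferent proportions in the same region order, and that the reported average is the mean Spearman's rho over all pairs of animals. Function names and example values are illustrative only.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must cover the same brain regions")
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(animals):
    """Average Spearman's rho over every pair of animals.

    Each element of `animals` is one animal's vector of per-region proportions,
    listed in the same region order for every animal (an assumption).
    """
    return mean(spearman(a, b) for a, b in combinations(animals, 2))


# Hypothetical proportions for three animals over five regions (not study data).
animals = [
    [0.30, 0.25, 0.20, 0.15, 0.10],
    [0.28, 0.27, 0.18, 0.17, 0.10],
    [0.35, 0.20, 0.22, 0.13, 0.10],
]
print(round(mean_pairwise_spearman(animals), 3))
```

      With these hypothetical inputs the three pairwise values are 1.0, 0.9, and 0.9, so the script prints 0.933. Spearman's rho depends only on rank order, so a few regions with very large proportions cannot dominate it as they can a Pearson correlation; how the original analysis handled ties or regions with zero labeling is not stated here.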

      While the authors provide evidence that photoactivation of the A13 is sufficient to drive locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find a difference between sham and 6-OHDA mice in the effects of A13 photoactivation. Given these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than a disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      Optogenetic activation of A13 may represent a gain of function in both healthy and 6-OHDA mice, highlighting a parallel descending motor pathway that remains intact. 6-OHDA lesions have multiple effects on motor and cognitive function. This makes a single pathway unlikely to rescue all deficits observed in 6-OHDA models. The lack of locomotion observed in 6-OHDA models can be reversed by A13 region photostimulation. Therefore, this is a reversal of a loss of function, in this case. However, the increase in turning represents a gain of function. We have highlighted this as suggested in the discussion.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review by Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS has so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al., Science 2021)? Please comment.

      Thank you for the valuable comments. They have been incorporated into the discussion.

      Our study highlights a parallel motor pathway, provided by the A13 region, that remains intact in 6-OHDA mice and can be driven sufficiently to rescue the hypolocomotor pathology observed in the OFT and to overcome bradykinesia and akinesia. Photoactivation of the ipsilesional A13 also has an overall additive effect on ipsiversive circling, representing a gain of function on the intact side that contributes to the magnitude of the overall motor asymmetry relative to the lesioned side. The effects of DBS are complex, spanning micro-, meso-, and macro-scales and involving activation, inhibition, informational lesioning, and network interactions. This complexity could contribute to the mixed clinical effects observed with ZI-DBS, in addition to differences in targeting and DBS programming among studies (see the review by Ossowska 2019). Moreover, DBS studies of the ZI have never targeted the rostromedial ZI, which extends towards the hypothalamus and contains the A13. Furthermore, DBS and electrical stimulation of neural tissue in general are always limited by current spread and by the lower activation thresholds of axons (e.g., axons of passage), both of which can reduce specificity for the true therapeutic target. Optogenetic studies have provided mechanistic insights that could be leveraged to overcome some of the targeting limitations of conventional DBS approaches. Spix et al. (2021) provided an interesting approach highlighting these advancements: they devised burst stimulation to achieve population-specific neuromodulation within the external globus pallidus, and they found a complementary role for optogenetics in exploring the pathway-specific activation of neurons recruited by DBS. To ascertain whether A13 DBS may be a viable therapy for PD gait, many more preclinical experiments will be necessary, and tuning of DBS parameters could be facilitated by optogenetic stimulation in these murine models. We have added this to the discussion.

      In a recent study, Jeon et al. (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence for a topographically graded organization of STN afferents, and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ZI as a whole?

      The ZI can be subdivided into four subregions along the antero-posterior axis: rostral (ZIr), dorsal (ZId), ventral (ZIv), and caudal (ZIc). The dorsal and ventral ZI are also referred to together as the central/medial/intermediate ZI. There are topographical gradients in cell types and connectivity across these subregions (see reviews: Mitrofanis 2005; Monosov et al. 2022; Ossowska 2019). Recent work by Yang and colleagues (2022) demonstrated a topographical organization of the inputs and outputs of GABAergic (VGAT) populations across the four ZI subregions. Given that the A13 region encompasses a small portion (the medial aspect) of both the rostral and medial/central ZI (three of the four ZI subregions) and coexpresses VGAT, the A13 region likely falls within the rostral and intermediate medial ZI dataset of Yang et al. (2022). Our data would not capture the breadth of topographical organization shown by Yang et al. (2022).

      In conclusion, this is an interesting study that can be improved by taking into consideration the points mentioned above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: it can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in the healthy condition and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells /TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      We have completed this analysis. The data are presented in Figure 2F, where we show increased c-Fos intensity with photoactivation. We observed an increase in the number of activated cells in the A13 region. However, we did not definitively see increases in TH+ cells, suggesting that a heterogeneous set of neurons, possibly glutamatergic, is responsible for the effects.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      We have added text and a new graph.

      (2) Refer to Figure 3. In the main text (page 5), the description of the 6-OHDA animals cites the wrong panels: Figure 2A-E should be replaced with Figure 3A-E.

      Please do that.

      Done. We have also updated the figure to improve readability by separating the 6-OHDA findings from sham in all graphs.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Page 1: Inhibitory or lesion studies will be necessary to support the claim that the global remodeling of afferent and efferent projections of the A13 region highlights the Zona Incerta's role as a crucial hub for the rapid selection of motor function.

      Overall, there is considerable evidence that the zona incerta is a hub for afferents and efferents.

      Mitrofanis (2005) and, more recently, Wang et al. (2020) summarize some of the evidence. Yang (2022) illustrates that the zona incerta shows multiple inputs to GABAergic neurons and outputs to diverse regions. Recent work suggests that the zona incerta contributes to various motor functions such as hunting, exploratory locomotion, and integrating multiple modalities (Zhao et al. 2019; Wang et al. 2019; Monosov et al. 2022; Chometton et al. 2017). The introduction has been updated.

      Introduction

      Page 2, paragraph 2: "However, little attention has been placed on the medial zona incerta (mZI), particularly the A13, the only dopamine-containing region of the rostral ZI" Is the A13 region located in the rostral or medial ZI or both?

      It should have read “rostromedial” ZI. The A13 is located in the medial aspect of the rostromedial ZI. The introduction has been updated.

      Page 2, para 3: Li et al (2021) used a mini-endoscope to record the GCaMP6 signal. Masini and Kiehn, 2022 transiently blocked the dopaminergic transmission; they never used 6-OHDA.

      Please correct through the text.

      Corrected.

      Page 2, para 4: the A13 connectome encompasses the cerebral cortex,... MLR. The MLR is a functional region; replace it with the CNF and PPN.

      Corrected.

      Page 3, the last paragraph of the introduction could be clarified by presenting the behavioral data first, followed by the anatomy.

      This has been corrected.

      Figure 1 is nice and clear, and well summarizes the experimental design.

      Thank you.

      Figure 2 shows an example of the extent of the ChR2-YFP expression and the position of an optical fiber tip above the dopaminergic A13 region from a mouse. Without any quantification, these images could be included in Figure 1. Despite a very small volume (36.8nL) of AAV, the extent of ChR2-YFP expression is quite large and includes dopaminergic and unidentified neurons within the A13 region but also a large population of unidentified neurons outside of it, thus raising questions about the volume and the types of neurons recruited.

      This is an important consideration. The issue of viral spread is complex and depends on factors including tissue type, serotype, and the promoter of the virus. Li et al. (2021), for example, used different virus serotypes and promoters and injected 150 nL, whereas we used AAV-DJ and injected 36.8 nL. AAV-DJ is a hybrid capsid derived from multiple serotypes. It has a high transduction efficiency, which leads to greater gene delivery than single-serotype AAV constructs (Mao et al. 2016). A secondary consideration regarding translation was that AAV-DJ can effectively transduce non-human primate neurons (Watanabe et al. 2020). We have addressed the question of which neurons were recruited above, provided c-Fos quantification, and provided a new supplementary figure showing viral spread (Figure S1).

      Anatomical reconstruction of the extent of the ChR2-YFP expression and the location of the tip of the optical fiber will be necessary to confirm that ChR2-YFP expression was restricted to the A13 region.

      We have provided additional information regarding viral spread, ferrule tip placement, and c-Fos cell counts in Figure 2, and we present a new Figure S1 in which we quantify the viral spread.

      Page 5, 1st para: Double-check the references, as not all of them are 6-OHDA injections in the MLF.

      Corrected; the Kiehn reference has been removed.

      Page 5, 1st para, 4th line: Replace ferrule with optical cannula or fiber.

      Done

      Page 5, 1st para, 9th line: Replace Figure 2 with Figure 3.

      Done

      Page 5, 2nd para: About the refractory decrease in traveled distance by sham-ChR2 mice: is this significant?

      It was not significant (Figure S1C; one-way RM ANOVA: F(5,25) = 0.486, P = 0.783). This has been updated in the text.

      Figure 3 showing behavioral assessments is nice, but the stats are not always clear. In Fig 3A, are each of the off and on boxes 1 minute long? The figure legend states the test lasts 1 min, but isn't it 4 minutes? In Figure 3B-E and 3J-M, what are the differences? Do the stats identify a significant difference only during the stimulation phase? Fig. 3F-I are nice and could have been presented as primary examples prior to data analysis in Fig. 3B-E. Group labels above the graph would help.

      Yes, the off and on boxes are each 1 minute long, and the error in the legend has been corrected. Great suggestion for F-I: these panels have been moved ahead of the summary figures. We have also updated Fig. 3F, I, J, L, and M to make the differences between the 6-OHDA and sham graphs easier to visualize. The statistics do indicate a significant difference during the stimulation phase. We have added group labels and reorganized the figure, which is now much easier to read.

      Fig. 3L-M: what do PreSur, Post, and Ferrule mean? I assume that Ferrule refers to mice tested with the optical fiber without stimulation, whereas Stim. refers to the stimulation. It would be helpful to standardize the format of stats in Fig. 3B-E and 3J-M. What are time points a, b, and c referring to?

      We have renamed the figure labels to be more intuitive, standardized the presentation of statistics in the figure, and eliminated the a, b, c nomenclature. We have also updated the caption to describe the tests in Fig. 3L-M.

      Figure S2A: the higher variability in 6-OHDA-YFP mice in comparison to 6-OHDA-ChR2 mice prior to stimulation suggests that 6-OHDA-YFP mice were less impaired. Why use boxplots only for these data? Would a pairwise comparison be more appropriate?

      We have removed these plots from Figure S2. We now present the Baseline to Pre values across the experimental timespan to illustrate the fact that distance travelled returned to baseline values for all trials conducted.

      Fig. S2B: add the statistical marker.

      We have removed this from Figure S2.

      Page 7, para 1, line 8: add "in comparison to 6-OHDA-YFP and YFP mice" to "during photostimulation... (Figure 3E)."

      Done

      Page 7, para 3, line 5: about larger improvement, replace "sham ChR2" with "6-OHDA."

      Done

      Page 8, para 1, line 4: Perier et al., 2000 reported that 6-OHDA injection increased the firing frequency of the ZI over a month.

      Added the timeframe to this sentence.

      Page 8, para 2, line 1: Since the results were expected, add some references.

      Done.

      Page 8, para 3, line 4. Double-check the reference.

      Corrected.

      Page 8: About large-scale changes in the A13 region, the relevance of correlation matrices is difficult to grasp. Analysis of local connectivity would have been more informative in the context of GABAergic and glutamatergic neurons of the ZI in the vicinity of the A13 region.

      We have updated the connectivity figures throughout the manuscript, including new Figures 4 and 5 in the main text and a revised Supplementary Figure 8. Unfortunately, we were not able to perform the local connectivity analysis. In light of our new work (Sharma et al. 2024), it is clear that this will be critical going forward.

      Page 8, para 3, line: given Fig. 2, there is concern about the claim that only the A13 region was targeted. The time of the analysis after 6-OHDA should be mentioned. Some sections of the paragraph could be moved to methods.

      We have provided more information about the viral spread in the text and in Supplementary Figure 1. The functional and anatomical experiments were separate, which we realize caused confusion. The time of analysis after 6-OHDA is now stated in the text.

      Fig. 4: The color code helps the reader visualize distribution differences. However, statistical analyses comparing 6-OHDA versus sham should be included. Quantification per region would greatly help readers visualize the data and support the conclusion. The relationship between the type of correlation (positive or negative) and absolute change (increase or decrease) is unknown in the current format, which limits the interpretation of the data. Moreover, examples of raw images of axons and cells should be presented for several brain regions. The experimental design with a timeline, as in Fig. 1, would be helpful. The legend for Fig. 4 is a bit long. Some sections are very descriptive, whereas others are more interpretive.

      We have provided a new Figure 5 that presents quantification per region, and the correlation matrices have been updated in Figure 4. As mentioned earlier, we have also focused on motor regions. Examples of raw images from several regions are provided in Supplementary Figure 8, and raw values are shared on our data repository.

      Page 10, para 1, line 1: add "afferent" so that it reads "changes in afferent and projection patterns."

      Done

      Page 10, para 1, line 9: remove the 2nd "compared to sham" in the sentence.

      Done

      Page 10, para 1, line 10: remove "coordinated" in "several regions showed a coordinated reduction in afferent density." We cannot say anything about the timing of events, as there is only info at 1 month.

      Done

      Page 10, para 2: the section should be written in the past tense.

      Done

      Page 13, para 2, the last sentence is overstated. Please remove "cells" and refer to the A13 region instead.

      Done

      About differential remodelling of the A13 region connectome: Figure 5C and 5G: The proportion of total afferents ipsi- and contralateral to 6-OHDA injection argues that the A13 region primarily receives inputs from the cortical plate and the striatum. Unfortunately, there are no statistics.

      Due to the small sample size, we provided descriptive statistics (means and error bars) in Figure 5A. As mentioned in our responses to Reviewers 1 and 2, we have revised Figure 5 to focus on motor-related pathways for clarity. In addition, absolute values are shared on our data repository.

      Figure 5 D and 5H: Changes in the proportion of total afferents/projections are relatively modest (less than 10% of the whole population for the highest changes). There is no standard deviation for these data and no statistics. Do they reflect real changes or variability from the injection site?

      The changes are relatively modest (less than 10%) since a small brain region usually provides a small proportion of total input (McElvain et al. 2021; Yang et al. 2022). The changes in the proportions reflect real differences between average proportions observed in sham and 6-OHDA mice. The variability in the total labelling of neurons and fibers was minimized by normalizing individual regional counts against total counts found in each animal. This figure has been updated as reviewers requested.
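      The normalization described in the response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each regional count is divided by that animal's total count, that the resulting proportions are averaged within each group (sham or 6-OHDA), and that the reported change is the difference in mean proportion in percentage points. The function names, region labels, and counts are hypothetical.

```python
from typing import Dict, List

Counts = Dict[str, float]


def normalize_counts(counts: Counts) -> Counts:
    """Express one animal's regional counts as proportions of that animal's total (assumed scheme)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no labelled cells or fibers in this animal")
    return {region: n / total for region, n in counts.items()}


def mean_proportions(animals: List[Counts]) -> Counts:
    """Average per-animal proportions within one group (e.g. sham or 6-OHDA)."""
    normalized = [normalize_counts(a) for a in animals]
    regions = sorted({r for p in normalized for r in p})
    # A region missing from an animal's counts contributes a proportion of 0.
    return {r: sum(p.get(r, 0.0) for p in normalized) / len(normalized) for r in regions}


def proportion_change(sham: List[Counts], lesion: List[Counts]) -> Counts:
    """Change in mean proportion (6-OHDA minus sham), in percentage points."""
    s, l = mean_proportions(sham), mean_proportions(lesion)
    return {r: 100.0 * (l.get(r, 0.0) - s.get(r, 0.0)) for r in sorted(set(s) | set(l))}


# Hypothetical counts: the two sham animals differ two-fold in total labelling
# but have identical regional proportions once normalized.
sham = [{"striatum": 200, "cortex": 600, "CNF": 200},
        {"striatum": 100, "cortex": 300, "CNF": 100}]
lesion = [{"striatum": 90, "cortex": 310, "CNF": 100}]
print(proportion_change(sham, lesion))
```

      With these made-up numbers the striatum changes by about -2 percentage points, the cortex by about +2, and the CNF by 0, even though total labelling differs two-fold between the sham animals. This shows how dividing by each animal's total removes differences in overall labelling before groups are compared.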

      Fig 5F and H: The example in F shows a huge decrease in the striatum, but H indicates only a 2% change, which makes the example not very representative. Absolute values would be helpful.

      While a 2% change may seem small, it represents a relatively large change in the A13 efferent connectome. For further clarity, we now provide absolute values, as suggested, in a new supplementary table.

      Figure 6 is inaccurate and unnecessary.

      Figure 6 has been removed.

      Discussion

      Although interesting, the discussion is too long.

      The discussion has been reduced by about three quarters of a page.

      Methods

      Page 17, para 1: include the stereotaxic coordinates of the optical cannula above the A13 region.

      Added.

      References

      Chen, Fenghua, Junliang Qian, Zhongkai Cao, Ang Li, Juntao Cui, Limin Shi, and Junxia Xie. 2023. “Chemogenetic and Optogenetic Stimulation of Zona Incerta GABAergic Neurons Ameliorates Motor Impairment in Parkinson’s Disease.” iScience 26 (7). https://doi.org/10.1016/j.isci.2023.107149.

      Chometton, S., K. Charrière, L. Bayer, C. Houdayer, G. Franchi, F. Poncet, D. Fellmann, and P. Y. Risold. 2017. “The Rostromedial Zona Incerta Is Involved in Attentional Processes While Adjacent LHA Responds to Arousal: C-Fos and Anatomical Evidence.” Brain Structure & Function 222 (6): 2507–25.

      Garau, Celia, Jessica Hayes, Giulia Chiacchierini, James E. McCutcheon, and John Apergis-Schoute. 2023. “Involvement of A13 Dopaminergic Neurons in Prehensile Movements but Not Reward in the Rat.” Current Biology: CB, October. https://doi.org/10.1016/j.cub.2023.09.044.

      Li, Zhuoliang, Giorgio Rizzi, and Kelly R. Tan. 2021. “Zona Incerta Subpopulations Differentially Encode and Modulate Anxiety.” Science Advances 7 (37): eabf6709.

      Mao, Yingying, Xuejun Wang, Renhe Yan, Wei Hu, Andrew Li, Shengqi Wang, and Hongwei Li. 2016. “Single Point Mutation in Adeno-Associated Viral Vectors-DJ Capsid Leads to Improvement for Gene Delivery in Vivo.” BMC Biotechnology 16 (January):1.

      McElvain, Lauren E., Yuncong Chen, Jeffrey D. Moore, G. Stefano Brigidi, Brenda L. Bloodgood, Byung Kook Lim, Rui M. Costa, and David Kleinfeld. 2021. “Specific Populations of Basal Ganglia Output Neurons Target Distinct Brain Stem Areas While Collateralizing throughout the Diencephalon.” Neuron 109 (10): 1721–38.e4.

      Mitrofanis, J. 2005. “Some Certainty for the ‘Zone of Uncertainty’? Exploring the Function of the Zona Incerta.” Neuroscience 130 (1): 1–15.

      Monosov, Ilya E., Takaya Ogasawara, Suzanne N. Haber, J. Alexander Heimel, and Mehran Ahmadlou. 2022. “The Zona Incerta in Control of Novelty Seeking and Investigation across Species.” Current Opinion in Neurobiology 77 (December):102650.

      Negishi, Kenichiro, Mikayla A. Payant, Kayla S. Schumacker, Gabor Wittmann, Rebecca M. Butler, Ronald M. Lechan, Harry W. M. Steinbusch, Arshad M. Khan, and Melissa J. Chee. 2020. “Distributions of Hypothalamic Neuron Populations Coexpressing Tyrosine Hydroxylase and the Vesicular GABA Transporter in the Mouse.” The Journal of Comparative Neurology 528 (11): 1833–55.

      Ossowska, Krystyna. 2019. “Zona Incerta as a Therapeutic Target in Parkinson’s Disease.” Journal of Neurology. https://doi.org/10.1007/s00415-019-09486-8.

      Romanov, Roman A., Amit Zeisel, Joanne Bakker, Fatima Girach, Arash Hellysaz, Raju Tomer, Alán Alpár, et al. 2017. “Molecular Interrogation of Hypothalamic Organization Reveals Distinct Dopamine Neuronal Subtypes.” Nature Neuroscience 20 (2): 176–88.

      Sharma, Sandeep, Cecilia A. Badenhorst, Donovan M. Ashby, Stephanie A. Di Vito, Michelle A. Tran, Zahra Ghavasieh, Gurleen K. Grewal, Cole R. Belway, Alexander McGirr, and Patrick J. Whelan. 2024. “Inhibitory Medial Zona Incerta Pathway Drives Exploratory Behavior by Inhibiting Glutamatergic Cuneiform Neurons.” Nature Communications 15 (1): 1160.

      Spix, Teresa A., Shruti Nanivadekar, Noelle Toong, Irene M. Kaplow, Brian R. Isett, Yazel Goksen, Andreas R. Pfenning, and Aryn H. Gittis. 2021. “Population-Specific Neuromodulation Prolongs Therapeutic Benefits of Deep Brain Stimulation.” Science 374 (6564): 201–6.

      Wang, Xiyue, Xiaolin Chou, Bo Peng, Li Shen, Junxiang J. Huang, Li I. Zhang, and Huizhong W. Tao. 2019. “A Cross-Modality Enhancement of Defensive Flight via Parvalbumin Neurons in Zona Incerta.” eLife 8 (April). https://doi.org/10.7554/eLife.42728.

      Wang, Xiyue, Xiao-Lin Chou, Li I. Zhang, and Huizhong Whit Tao. 2020. “Zona Incerta: An Integrative Node for Global Behavioral Modulation.” Trends in Neurosciences 43 (2): 82–87.

      Watakabe, Akiya, Masanari Ohtsuka, Masaharu Kinoshita, Masafumi Takaji, Kaoru Isa, Hiroaki Mizukami, Keiya Ozawa, Tadashi Isa, and Tetsuo Yamamori. 2015. “Comparative Analyses of Adeno-Associated Viral Vector Serotypes 1, 2, 5, 8 and 9 in Marmoset, Mouse and Macaque Cerebral Cortex.” Neuroscience Research 93 (April):144–57.

      Watanabe, Hidenori, Hiromi Sano, Satomi Chiken, Kenta Kobayashi, Yuko Fukata, Masaki Fukata, Hajime Mushiake, and Atsushi Nambu. 2020. “Forelimb Movements Evoked by Optogenetic Stimulation of the Macaque Motor Cortex.” Nature Communications 11 (1): 3253.

      Yang, Yang, Tao Jiang, Xueyan Jia, Jing Yuan, Xiangning Li, and Hui Gong. 2022. “Whole-Brain Connectome of GABAergic Neurons in the Mouse Zona Incerta.” Neuroscience Bulletin 38 (11): 1315–29.

      Ye, Qiying, Jeremiah Nunez, and Xiaobing Zhang. 2023. “Zona Incerta Dopamine Neurons Encode Motivational Vigor in Food Seeking.” bioRxiv: The Preprint Server for Biology, June. https://doi.org/10.1101/2023.06.29.547060.

      Zhao, Zheng-Dong, Zongming Chen, Xinkuan Xiang, Mengna Hu, Hengchang Xie, Xiaoning Jia, Fang Cai, et al. 2019. “Zona Incerta GABAergic Neurons Integrate Prey-Related Sensory Signals and Induce an Appetitive Drive to Promote Hunting.” Nature Neuroscience 22 (6): 921–32.

    1. eLife Assessment

      This important study uses extensive comparative analysis to examine the relationship between plasma glucose levels, albumin glycation levels, and diet and life history, within the framework of the "pace of life syndrome" hypothesis. The evidence that glucose is positively correlated with glycation levels and lifespan is convincing and, although there are some limitations related to data collection, they likely make the statistically significant findings more conservative. As the first extensive comparative analysis of glycation rates, life history, and glucose levels in birds, the study has the potential to be of interest to evolutionary ecologists and the aging research community more broadly.

    2. Reviewer #2 (Public review):

      Summary:

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contrast with findings of some previous studies (positive relationships between blood glucose and lifespan, or absent relationships between blood glucose and clutch mass or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life history traits in birds that is based on data collected in a single study, with blood glucose and glycation measured using unified analytical methods (except for blood glucose data for 13 species collected from a database).

      Strengths:

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel and important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, except for 13 species, for which data were collected from a database of veterinary and animal care records of zoo animals (ZIMS). This novel data set itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a ZIMS database (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represents the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are substantiated. Overall, this is an important study representing a substantial contribution to the emerging field of evolutionary physiology focused on the ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses:

      Unfortunately, the authors did not record handling time (i.e., time elapsed between capture and blood sampling), which may be an important source of noise because handling-stress-induced increase in blood glucose has previously been reported. Moreover, the authors themselves demonstrate that handling stress increases variance in blood glucose levels. Both effects (elevated mean and variance) are evident in Figure ESM1.2. However, this likely makes their significant findings regarding glucose levels and their associations with lifespan or glycation rate more conservative, as highlighted by the authors.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contrast with findings of some previous studies (positive relationships between blood glucose and lifespan, or absent relationships between blood glucose and clutch mass or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life history traits in birds that is based on data collected in a single study, with blood glucose and glycation measured using unified analytical methods (except for blood glucose data for 13 species collected from a database).

      Strengths

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel and important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, except for 13 species, for which data were collected from a database of veterinary and animal care records of zoo animals (ZIMS). This novel data set itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a ZIMS database (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represents the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are substantiated. Overall, this is an important study representing a substantial contribution to the emerging field of evolutionary physiology focused on the ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses

      Unfortunately, the authors did not record handling time (i.e., time elapsed between capture and blood sampling), which may be an important source of noise because handling-stress-induced increase in blood glucose has previously been reported. Moreover, the authors themselves demonstrate that handling stress increases variance in blood glucose levels. Both effects (elevated mean and variance) are evident in Figure ESM1.2. However, this likely makes their significant findings regarding glucose levels and their associations with lifespan or glycation rate more conservative, as highlighted by the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I understand that your main objective regarding glycation rate and lifespan was to analyse the species' resistance to glycation with respect to lifespan, while factoring out the species-specific variation in blood glucose level. However, I still believe that the absolute glycation level (i.e., not controlled for blood glucose level) may also be important for the evolution of lifespan. Given that blood glucose is positively related to both glycation and lifespan (although with a plateau in the latter case), lifespan could possibly be positively correlated with absolute glycation levels. If significant, that would be an interesting and counterintuitive finding, which would call for an explanation, thereby potentially stimulating further research. If not significant, it would show that long-lived species do not have higher glycation levels, despite having higher blood glucose levels, thereby strengthening your argument about higher resistance of long-lived species to glycation. So, in my opinion, the inclusion of an additional model of glycation level on life-history traits, without controlling for blood glucose, is worth considering.

      We now include this model as supplementary material and refer to it in several parts of the text, including a discussion of some of the issues raised here.

      Lines 230-231: Please, provide a citation for these GVIF thresholds

      We now include it.

      Figure 3: I think that showing both glucose and glycation rate on the linear scale, rather than log scale, would better illustrate your conclusion - the slowing rise of glycation rate with increasing glucose levels.

      That is a good point, although it may also be confusing for readers to see a graph that represents the data differently from the models. Perhaps showing both graphs (as panels 3A and 3B) would resolve this?

      Figure 4. I recommend stating in the caption that the whiskers do not represent interquartile ranges (a standard option in box plots) but credible intervals as mentioned in the current version of the public author response.

      Apologies; this was missed and is now included. Note that the boxes still represent the interquartile ranges of the posterior distributions, while the whiskers represent the credible intervals.
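      The distinction between boxes and whiskers described above can be illustrated with a short sketch that summarises posterior draws. This is a hypothetical illustration: the credible-interval level is not stated here, so a 95% equal-tailed interval is assumed. The function names and the simulated draws are likewise assumptions, not the authors' code.

```python
import math
import random


def percentile(sorted_draws, p):
    """Linear-interpolation percentile (0 <= p <= 1) of already-sorted draws."""
    pos = p * (len(sorted_draws) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (pos - lo) * (sorted_draws[hi] - sorted_draws[lo])


def posterior_box_summary(draws, ci=0.95):
    """Box = interquartile range of the posterior; whiskers = equal-tailed
    credible interval (95% is an assumed level, not taken from the paper)."""
    xs = sorted(draws)
    tail = (1.0 - ci) / 2.0
    return {
        "whisker_low": percentile(xs, tail),
        "q1": percentile(xs, 0.25),
        "median": percentile(xs, 0.50),
        "q3": percentile(xs, 0.75),
        "whisker_high": percentile(xs, 1.0 - tail),
    }


# Hypothetical posterior draws for a single model coefficient.
random.seed(1)
draws = [random.gauss(0.3, 0.1) for _ in range(4000)]
print(posterior_box_summary(draws))
```

      In a conventional Tukey box plot the whiskers extend to the most extreme data points within 1.5 times the IQR of the box. Here they mark the credible-interval bounds instead, which is why the caption needs to say so explicitly.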

    1. eLife Assessment

      This valuable study by Guo and colleagues reports the inhibitory activity of caffeic acid phenethyl ester (CAPE) against TcdB, a key toxin produced by Clostridioides difficile. C. difficile infections are a major public health concern, and this manuscript provides interesting data on toxin inhibition by CAPE, a potentially promising therapeutic alternative for this disease. The strength of the evidence to support the conclusions is solid, with some concerns about the moderate effects on the mouse infection model and direct binding assays of CAPE to the toxin.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthen the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail, and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      The authors have made some changes in the revised version. However, many of the changes were superficial, and some concerns still need to be addressed. Important details are still missing from the description of some experiments. Authors should carefully revise the manuscript to ascertain that all details that could affect interpretation of their results are presented clearly. For instance, authors still need to include details of how the metabolomics analyses were performed. Just stating that samples were "frozen for metabolomics analyses" is not enough. Was this mass spectrometry- or NMR-based metabolomics? Assuming it was mass spectrometry, what kind? How was metabolite identity assigned? These are important details, which need to be included. Even in cases where additional information was included, the authors did not discuss how the specific way in which certain experiments were performed could affect interpretation of their results. One example is the potential for compound carryover in their experiments. Another important one is the fact that CAPE affects bacterial growth and sporulation. Therefore, it is critical that authors acknowledge that they cannot rule out the possibility that factors other than compound interactions with the toxin are involved in their phenotypes. As stated previously, authors should also be careful when drawing conclusions from the analysis of microbiota composition data, and changes to the manuscript should be made to reflect this. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Again, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

    3. Reviewer #2 (Public review):

      I appreciate the authors' responses to my original review. This is a comprehensive analysis of the effects of CAPE on C. difficile activity. It seems like this compound affects all aspects of C. difficile, which could make it effective during infection but also make it difficult to understand the mechanism. Even considering the authors' responses, I think it is critical for the authors to work on the conclusions regarding the infection model. There is some protection from disease by CAPE, but some parameters are not substantially changed. For instance, weight loss is not significantly different in the C. difficile-only group versus the C. difficile + CAPE group. Histology analysis still shows a substantial amount of pathology in the C. difficile + CAPE group. This should be discussed more thoroughly using precise language.

    4. Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It points to a field that can be explored for the treatment of CDI.

      Strengths:

      The results are really good, and CAPE appears to be a promising alternative for treating CDI.

      Weaknesses:

      Some references are too old or missing.

      Comments on revisions:

      I have read your study after comments made by all referees, and I noticed that all questions and suggestions addressed to the authors were answered and well explained. Some of the minor and major issues related to the article were also solved. I am satisfied with all the effort given by the authors to improve their manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthened the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well-written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      This is really a manuscript about CAPE, not caffeic acid, and the title should reflect that. Also, a few details are missing from the description of the experiments. The authors should carefully revise the manuscript to ascertain that all details that could affect the interpretation of their results are presented clearly. Just as an example, the authors state in the results section that TcdB was incubated with compounds and then added to cells. Was there a wash step in between? Could compound carryover affect how the cells reacted independently from TcdB? This is just an example of how the authors should be careful with descriptions of their experimental procedures. Lastly, authors should be careful when drawing conclusions from the analysis of microbiota composition data. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Therefore, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript, including the title and the Results and Methods sections.

      Reviewer #2 (Public review):

      Summary:

      This work contributes to the development of a non-antibiotic treatment for C. difficile. The authors screened a chemical library for activity against the C. difficile toxin TcdB, and found a group of compounds with antitoxin activity. Caffeic acid derivatives were highly represented within this group of antitoxin compounds, and the remaining portion of this work involves defining the mechanism of action of caffeic acid phenethyl ester (CAPE) and testing CAPE in a mouse C. difficile infection model. The authors conclude that CAPE attenuates C. difficile disease by limiting toxin activity and increasing microbial diversity during C. difficile infection.

      Strengths/Weaknesses:

      The strategy employed by the authors is sound, although not necessarily novel. A compound that can target multiple steps in the pathogenesis of C. difficile would be an exciting finding. However, the data presented do not convincingly demonstrate that CAPE attenuates C. difficile disease, and the mechanism of action of CAPE is not convincingly defined. The following points explain the rationale for my evaluation.

      (1) The toxin exposure in tissue culture seems brief (Figure 1). Do longer incubation times between the toxin and cells still show CAPE prevents toxin activity?

      Thanks for your comments. The cytotoxicity assay was used to directly assess the capacity of CAPE to protect against TcdB-induced cell death. Our observations at 1 and 12 h after TcdB exposure showed that CAPE effectively mitigated the toxic effects of TcdB at both time points, demonstrating its protective role. Please see Figure S1.

      (2) The conclusion that CAPE has antitoxin activity during infection would be strengthened if the mouse was pretreated with CAPE before toxin injections (Figure 1D).

      Thanks for your constructive comments. According to your suggestion, we administered TcdB 2 h after pretreatment with CAPE. The outcomes demonstrated that CAPE pretreatment significantly enhanced the survival rate of the intoxicated mice, confirming that CAPE retains its antitoxin efficacy during the infection process. Please see Figure S2.

      (3) CAPE does not bind to TcdB with high affinity, as shown by SPR (Figure 4). A higher affinity may be necessary to inhibit TcdB during infection. The GTD binds with millimolar affinity and does not show saturable binding. Is the GTD the binding site for CAPE? Autoprocessing is also affected by CAPE, indicating that CAPE binds non-GTD sites on TcdB.

      Thanks for your comments. Our findings indicate that the GTD is a critical binding site for CAPE. CAPE exerts its protective effects at multiple stages of TcdB-mediated cell death, including inhibiting TcdB autoprocessing and blocking GTD activity, thereby preventing TcdB-mediated glucosylation of Rac1.

      (4) In the infection model, CAPE does not statistically significantly attenuate weight loss during C. difficile infection (Figure 6). I recognize that weight loss is an indirect measure of C. difficile disease but histopathology also does not show substantial disease alleviation (see below).

      Thanks for your comments. Our comparative analysis revealed a notable difference in the body weight of mice on day 3 post-infection (Figure 6B). The dry/wet stool ratio showed a similar pattern, suggesting that CAPE treatment significantly ameliorated C. difficile-induced diarrhea (Figure 6C).

      (5) In the infection model (Figure 6), the histopathology analysis shows substantial improvement in edema but limited improvement in cellular infiltration and epithelial damage. Histopathology is probably the most critical parameter in this model and a compound with disease-modifying effects should provide substantial improvements.

      Thanks for your comments. Edema, inflammatory cell infiltration, and epithelial damage served as the key evaluation metrics. Statistical analysis showed that the pathological scores of CAPE-treated mice were markedly lower than those of the model group (Figure 6F).

      (6) The reduction in C. difficile colonization is interesting. It is unclear whether this is due to antitoxin activity and/or to CAPE modifying the gut microbiota and metabolites (Figure 6). To interpret these data, a control is needed with CAPE treatment without C. difficile infection, or with infection by an atoxigenic strain.

      The observed reduction in fecal C. difficile colonization following treatment may be attributable to CAPE's antitoxin properties, to its capacity to modify the intestinal microbiota and metabolites, or to both; these two mechanisms likely act together against CDI. CDI is primarily driven by toxins A (TcdA) and B (TcdB) secreted by the bacterium. Some therapies, including monoclonal antibodies such as bezlotoxumab (which binds TcdB), target CDI by neutralizing toxin, thereby mitigating gut damage and subsequent C. difficile colonization (1,2). The establishment of C. difficile in the gut is closely linked to the balance of the intestinal microbiota. Although antibiotic treatments can inhibit C. difficile growth, they may also disrupt this balance, potentially facilitating the overgrowth of other pathogens. Interventions such as fecal microbiota transplantation (FMT) are therefore designed to restore gut flora balance and thereby decrease C. difficile colonization (3,4). Moreover, probiotics and prebiotics are thought to reduce C. difficile colonization by modifying the gut environment (5,6).

      (7) Similar to the CAPE data, the melatonin data does not display potent antitoxin activity and the mouse model experiment shows marginal improvement in the histopathological analysis (Figure 9). Using 100 µg/ml of melatonin (~ 400 micromolar) to inactivate TcdB in cell culture seems high. Can that level be achieved in the gut?

      The absorption and distribution of melatonin vary with the administered dose. For instance, the bioavailability of melatonin was 53.5% in rats, whereas in dogs it was nearly complete (100%) at 10 mg/kg but decreased to 16.9% at 1 mg/kg (7). These data indicate that melatonin absorption differs across animal species and depends on the dose, and that melatonin can reach high bioavailability, implying that a dose of 200 mg/kg should be adequate to achieve the desired concentration after administration.
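      The reviewer's conversion of 100 µg/ml melatonin to roughly 400 µM can be checked from the molar mass of melatonin (C13H16N2O2, about 232.28 g/mol). This worked conversion only restates the in vitro concentration; it says nothing about the concentration actually reached in the gut:

```latex
c = \frac{\rho}{M}
  = \frac{100\ \mu\text{g mL}^{-1}}{232.28\ \text{g mol}^{-1}}
  = \frac{0.1\ \text{g L}^{-1}}{232.28\ \text{g mol}^{-1}}
  \approx 4.3 \times 10^{-4}\ \text{mol L}^{-1}
  \approx 430\ \mu\text{M}
```

      The result is consistent with the reviewer's estimate of roughly 400 µM.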

      (8) The following parameters should be considered and would aid in the interpretation of this work. Does CAPE directly affect the growth of C. difficile? Does CAPE affect the secretion of TcdB from C. difficile? Does CAPE alter the sporulation and germination of C. difficile?

      We tested CAPE in an MIC assay against C. difficile and also assessed its effects on C. difficile sporulation and TcdB production. CAPE markedly repressed tcdB transcription at 16 μg/mL and effectively suppressed the growth and sporulation of C. difficile BAA-1870 at 32 μg/mL. Please see Figure S3.

      References:

      (1) Skinner AM, et al. Efficacy of bezlotoxumab to prevent recurrent Clostridioides difficile infection (CDI) in patients with multiple prior recurrent CDI. Anaerobe. 2023 Dec;84:102788.

      (2) Wilcox MH, et al. Bezlotoxumab for Prevention of Recurrent Clostridium difficile Infection. N Engl J Med. 2017 Jan 26;376(4):305-317.

      (3) Khoruts A, Sadowsky MJ. Understanding the mechanisms of faecal microbiota transplantation. Nat Rev Gastroenterol Hepatol. 2016 Sep;13(9):508-16.

      (4) Khoruts A, Staley C, Sadowsky MJ. Faecal microbiota transplantation for Clostridioides difficile: mechanisms and pharmacology. Nat Rev Gastroenterol Hepatol. 2021 Jan;18(1):67-80.

      (5) Mills JP, Rao K, Young VB. Probiotics for prevention of Clostridium difficile infection. Curr Opin Gastroenterol. 2018 Jan;34(1):3-10.

      (6) Lau CS, Chamberlain RS. Probiotics are effective at preventing Clostridium difficile-associated diarrhea: a systematic review and meta-analysis. Int J Gen Med. 2016 Feb 22;9:27-37.

      (7) Yeleswaram K, et al. Pharmacokinetics and oral bioavailability of exogenous melatonin in preclinical animal models and clinical implications. J Pineal Res. 1997 Jan;22(1):45-51.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      The results are very good, and CAPE appears to be a promising alternative for treating CDI. The methodology and results are well presented, with tables and figures that support them. It is solid and very promising work.

      Weaknesses:

      Some references are too old or missing.

      Thanks for your constructive suggestion. We have included and refreshed several references to enhance the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the manuscript convincingly demonstrates that CAPE affects the TcdB toxin and reduces its toxicity in vitro, it would be beneficial to include data on the effect of CAPE on the growth of C. difficile. This would help ensure that the observed in vivo effects are not merely due to reduced bacterial growth but rather due to the specific action of CAPE on the toxin.

      Thanks for your constructive suggestion. We have augmented our findings with the impact of CAPE on the bacteria themselves, revealing that CAPE not only hampers the growth of the bacterial cells but also suppresses their capacity to produce spores. Please see Figure S3.

      (1) Line 41, line 115 - authors should clarify what they mean when mentioning Bacteroides within parentheses.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 71 - Is C. difficile really found "in the environment"?

      Thanks for your comments. C. difficile is prevalent across various natural settings, including soil and water. One study identified highly divergent strains of this bacterium in environmental samples (1). Moreover, the high prevalence of C. difficile in soil and lawn samples collected from the grounds of Western Australian hospitals indicates that the organism is indeed a common inhabitant of the environment (2).

      (3) Lines 128-130 - Was there a wash step here? What could be the impact of compound carryover in this experiment?

      Thanks for your comments. After pre-incubation of TcdB with CAPE, compounds not bound to TcdB were removed by centrifugation. Any compound that persists in the culture despite this step could inflate the apparent efficacy, particularly if it continues to act on TcdB or the cells beyond the initial 1-h pre-incubation window. Carryover could also produce false positives, in which the compound appears to protect against TcdB-mediated cell rounding when the effect actually reflects its residual activity. It could likewise skew the estimated minimum effective concentration, because the concentration reaching the cells might be inadvertently higher. Furthermore, if the compounds are cytotoxic or affect cell viability, carryover could generate changes in cell morphology unrelated to the direct interaction between TcdB and the compounds.

      (4) Lines 133-134 - I suggest authors mention how many caffeic acid derivatives there were in the entire library so that the suggested "enrichment" of them in the group of bioactive compounds can be better judged.

      Thanks for your comments. The natural compound library contained eight caffeic acid derivatives, of which methyl caffeic acid and ferulic acid displayed no efficacy. This information has been incorporated into the manuscript.

      (5) Line 135 - I recommend the authors add the molarity of the compound solutions used.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (6) Line 247 - I think the term "CAPE mice" is confusing. Please use a full description.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (7) Line 248 - I also think the terms "model mice" and "model group" are confusing. Maybe call them "control mice"?

      Thanks for your comments. The terms "model mice" and "model group" refer to the same group, and we have clarified in the revised manuscript that "control mice" refers to mice not infected with C. difficile.

      (8) Line 273 - "most abundant species at the genus level" is incorrect. I think what you mean is "most abundant TAXA".

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (9) Line 278 - Please include your p-value cut-off together with the LDA score.

      Thanks for your comments. We have revised the above description to “LDA score > 3.5, p < 0.05”.

      (10) Line 292 - Details on how metabolomics was performed should be included here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (11) Line 299 - 1.5 is a fairly low cut-off. The authors should at a minimum also include the p-value cut-off used.

      Thanks for your comments. We have revised the above description to “fold change > 1.5, p < 0.05”.
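      The selection rule stated above can be sketched as a simple filter. This is a minimal illustration only: the helper name, the data class, and the example values are hypothetical, and it assumes the fold-change cut-off is applied in both directions (an increase above 1.5 or a decrease below 1/1.5), which the response does not specify.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    fold_change: float  # assumed: ratio of mean abundance between two groups
    p_value: float


def select_differential(features, fc_cutoff=1.5, p_cutoff=0.05):
    """Return names of features passing both thresholds (hypothetical helper).

    A feature counts as changed if FC > fc_cutoff (up) or FC < 1/fc_cutoff (down);
    the two-sided reading of "fold change > 1.5" is an assumption.
    """
    selected = []
    for f in features:
        changed = f.fold_change > fc_cutoff or f.fold_change < 1 / fc_cutoff
        if changed and f.p_value < p_cutoff:
            selected.append(f.name)
    return selected


# Made-up example values, not data from the manuscript.
features = [
    Feature("metabolite_a", 2.0, 0.01),   # up and significant
    Feature("metabolite_b", 1.2, 0.001),  # significant but change too small
    Feature("metabolite_c", 0.5, 0.03),   # down and significant
    Feature("metabolite_d", 3.0, 0.20),   # large change but not significant
]
print(select_differential(features))  # ['metabolite_a', 'metabolite_c']
```

      Reporting both cut-offs together, as the revised text now does, lets readers reproduce such a selection.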

      (12) Line 307 - Purine "degradation" would be better here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (13) Line 328 onward - The melatonin experiment is a weird one. Although I fully understand the rationale behind testing the effect of melatonin in the mouse model, the idea that just because melatonin levels changed in the gut it would act as a direct inhibitor of TcdB was very far-fetched, even though it ended up working. Authors should explain this in the manuscript.

      Thanks for your comments. Beyond our murine studies, we have confirmed that melatonin significantly reduces TcdB-induced cytotoxicity at the cellular level (Figure 9A). In addition, melatonin used as an antimicrobial adjuvant and anti-inflammatory agent has been reported to reduce CDI recurrence (3). We therefore consider the rationale for testing melatonin as a TcdB inhibitor to be supported.

      (14) Lines 429-435 - There are seemingly contradictory pieces of information here. The authors state that adenosine is released from cells upon inflammation and that CAPE treatment caused an increase in adenosine levels. Later in this section, the authors state that adenosine prevents TcdA-mediated damage and inflammation. This should be clarified and better discussed.

      Thanks for your comments. Adenosine modulates immune responses and inflammatory cascades by interacting with its receptors, including its capacity to suppress the secretion of specific pro-inflammatory mediators. We have updated this depiction in the manuscript.

      (15) Lines 513-514 - How was this phenotype quantified?

      Thanks for your comments. Initially, we introduced TcdB at a final concentration of 0.2 ng/mL along with various concentrations of compounds into 1 mL of medium for a 1-h pre-incubation period. Subsequently, unbound compounds were removed through centrifugation, and the resulting mixture was then applied to the cells.

      (16) Figure 3 - panels are labeled incorrectly.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (17) Figure 5C - it is unclear what the different colors and labels represent.

      Thanks for your comments. In the depicted graph, blue denotes the total binding energy, red signifies the electrostatic interactions, green corresponds to the van der Waals forces, and orange indicates solvation or hydration effects. The horizontal axis represents the mutation of the amino acid residue at the respective position to alanine. As illustrated in Figure 5C, the mutations W520A and GTD exhibit the highest binding energies.
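      The response does not name the method behind Figure 5C. Assuming it is an MM/GBSA-style computational alanine scan (an assumption), the plotted components are typically combined as follows, with the entropic term often omitted; a positive ΔΔG indicates that the X→A mutation weakens binding, marking the residue as a likely hotspot:

```latex
\begin{aligned}
\Delta G_{\text{bind}} &\approx \Delta E_{\text{ele}} + \Delta E_{\text{vdW}} + \Delta G_{\text{solv}},\\
\Delta\Delta G_{\text{bind}}(X \to A) &= \Delta G_{\text{bind}}^{\text{mutant}} - \Delta G_{\text{bind}}^{\text{wild type}}.
\end{aligned}
```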

      References:

      (1) Janezic S, et al. Highly Divergent Clostridium difficile Strains Isolated from the Environment. PLoS One. 2016 Nov 23;11(11):e0167101.

      (2) Perumalsamy S, Putsathit P, Riley TV. High prevalence of Clostridium difficile in soil, mulch and lawn samples from the grounds of Western Australian hospitals. Anaerobe. 2019 Dec;60:102065.

      (3) Sutton SS, et al. Melatonin as an Antimicrobial Adjuvant and Anti-Inflammatory for the Management of Recurrent Clostridioides difficile Infection. Antibiotics (Basel). 2022 Oct 25;11(11):1472.

      Reviewer #2 (Recommendations for the authors):

      Minor comments and questions.

      (1) Which form of TcdB is being used in these experiments?

      Thanks for your comments. The TcdB used in this study belongs to the TcdB1 subtype.

      (2) Why are THP-1 cells being used in these assays?

      Thanks for your comments. In this study, we used a diverse array of cell lines, including Vero, HeLa, THP-1, Caco-2, and HEK293T, each selected for a specific experimental objective. THP-1 cells were included to provide a macrophage cell line, so that both epithelial cells and macrophages could be tested. C. difficile is an intestinal pathogen, and immune clearance plays a vital role during infection, so THP-1 cells were used to represent these important immune cells.

      (3) Please improve the quality of the microscopy images in Figure 1.

      Thanks for your comments. We have improved the quality of the microscopy images in Figure 1.

      (4) Does the flow cytometry experiment in Figure 2B show internalization? Surface-bound toxins would provide the same histogram.

      Thanks for your comments. Figure 2B was employed to assess the internalization of TcdB, and the findings indicate that CAPE does not influence the internalization process of TcdB.

      (5) The sensogram in Figure 4A does not look typical and should be clarified.

      Thanks for your comments. Small molecules typically bind to and dissociate from proteins rapidly. However, as shown in Figure 4A, the interaction between CAPE and TcdB approaches equilibrium gradually. This behavior can primarily be explained by rapid occupation of the protein's primary binding sites by the small molecule in the initial phase, followed by binding of CAPE to secondary or lower-affinity sites, which prolongs the time needed to reach equilibrium. If CAPE binds multiple sites on TcdB, additional time is needed to explore and occupy these sites before equilibrium is reached. We have added an analysis of this possible scenario to the manuscript.
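      One common way to formalize the two-phase behavior described above is a two-site (heterogeneous ligand) binding model. Whether this model was actually fitted to Figure 4A is not stated, so the following is only a sketch. With analyte concentration C held constant during association and R_i(0) = 0, each site i evolves independently and the measured response is their sum:

```latex
\begin{aligned}
\frac{dR_i}{dt} &= k_{a,i}\, C\,(R_{\max,i} - R_i) - k_{d,i}\, R_i, \qquad i = 1, 2,\\
R_i(t) &= \frac{k_{a,i} C\, R_{\max,i}}{k_{a,i} C + k_{d,i}}\left(1 - e^{-(k_{a,i} C + k_{d,i})\,t}\right),\\
R(t) &= R_1(t) + R_2(t).
\end{aligned}
```

      A site with a large observed rate constant k_{a,i}C + k_{d,i} produces the initial rapid rise, while a site with a small one produces the slow approach to equilibrium.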

      Reviewer #3 (Recommendations for the authors):

      These are my suggestions for the text:

      (1) Line 29: high recurrent rates.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 32: Where is the caffeic acid identified? I think a line should be included.

      Thanks for your comments. Caffeic acid was identified from a natural compound library, and we have made the corresponding modification to the text.

      (3) Line 39: C. difficile is not italic.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (4) Line 41: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (5) Line 56: Is this number of deaths (56,000) still current, or does it refer to the past?

      Thanks for your comments. The mortality rates reported in the manuscript reflect a decline in the incidence and fatality of CDI around 2017 (1), as the infection gained broader recognition. Nonetheless, a recent study reports that one-year mortality among CDI cases in Germany can reach 45.7%, with an overall economic burden of approximately 1.6 billion euros (2). This underscores that CDI remains a significant global public health challenge.

      (6) Line 104: Where did the idea of testing caffeic acid come from? Any previous study of the authors? Any studies with the inhibition of other pathogens?

      Thanks for your comments. Initially, we conducted a screen of a compound library comprising 2,076 compounds and identified several potent inhibitors, which, upon structural analysis, were revealed to be caffeic acid derivatives. Prior to our investigation, no studies had explored the potential of CAPE in this context.

      (7) Line 115: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      Results section

      (8) Did the authors test caffeic acid against TcdA or the binary toxin? I know this is not the purpose of the study, but TcdA is structurally highly similar to TcdB and generates inflammation in the gut via neutrophils. Strains that are negative for the major toxins but positive for the binary toxin also cause severe cases of CDI.

      Thanks for your comments. Although we acknowledge the significance of TcdA and binary toxins in CDI, we did not investigate the impact of CAPE on these toxins. Our focus was exclusively on the effect of CAPE against TcdB, as it is the primary virulence factor in C. difficile pathogenesis. Since TcdA and TcdB are highly similar in structure, we will analyze the neutralization effect of CAPE on TcdA in later studies.

      (9) Does caffeic acid have any effect on C. difficile itself, or does it act only against the toxins? That would be ideal.

      Thanks for your comments. We have included additional related assays in our study. Beyond directly neutralizing TcdB, CAPE also demonstrates the capacity to inhibit the growth and spore formation of C. difficile.

      (10) Line 230: C. difficile BAA-1870 is a clinical strain? There are no details about it in the paper.

      Thanks for your comments. C. difficile BAA-1870 (RT027/ST1), a highly virulent isolate frequently used in research (3-6), was kindly donated by Professor Aiwu Wu. We now state the PCR ribotype in the manuscript.

      (11) Line 236: Did the mice fully recover from CDI after the administration of the CAPE? Was one dose enough?

      Thanks for your comments. CAPE was administered orally at 24 h intervals, commencing with the initial dose on Day 0. By the time a significant difference was observed on Day 3, the treatment had been administered a total of three times.

      Methodology

      (12) Most of the methods do not have a reference.

      Thanks for your comments. We have added several references to the methods.

      Discussion section

      (13) The first two paragraphs of the discussion should be summarized. Those details were already explained in the introduction.

      Thanks for your comments. The discussion section and the introduction address slightly different focal points; therefore, we aim to retain the first two paragraphs to maintain continuity and context.

      (14) Line 382: Bezlotoxumab was approved by the FDA in 2016, so it is not recent.

      Thanks for your comments. We have revised the above description.

      (15) Line 410: "Despite the high cure rate and increasing popularity of FMT, its safety remains controversial." Although this is true, in 2022 the FDA approved Rebyota, which the authors cite later.

      Thanks for your comments. We have revised the above description.

      (16) Lines 415-416: "the abundance of Bacteroides, a critical gut microbiota component that is required for C. difficile resistance". The authors cite only one reference; if this is true, I would expect more studies to be cited. Why are probiotics containing Bacteroides spp. not available on the market?

      Thanks for your comments. We have added further references. The scarcity of probiotic products containing Bacteroides spp. on the market is mainly attributable to their stringent growth requirements. Most Bacteroides spp. are anaerobes that thrive in oxygen-deprived environments, which makes it difficult to maintain their viability during storage and distribution and, in turn, increases production cost and complexity. Furthermore, despite the significant role of Bacteroides in gut health, research into its potential probiotic benefits and safety remains comparatively limited.

      References:

      (1) Guh AY, et al. Emerging Infections Program Clostridioides difficile Infection Working Group. Trends in U.S. Burden of Clostridioides difficile Infection and Outcomes. N Engl J Med. 2020 Apr 2;382(14):1320-1330.

      (2) Schley K, et al. Costs and Outcomes of Clostridioides difficile Infections in Germany: A Retrospective Health Claims Data Analysis. Infect Dis Ther. 2024 Nov 20.

      (3) Saito R, et al. Hypervirulent clade 2, ribotype 019/sequence type 67 Clostridioides difficile strain from Japan. Gut Pathog. 2019 Nov 4;11:54.

      (4) Pellissery AJ, Vinayamohan PG, Venkitanarayanan K. In vitro antivirulence activity of baicalin against Clostridioides difficile. J Med Microbiol. 2020 Apr;69(4):631-639.

      (5) Shao X, et al. Chemical Space Exploration around Thieno[3,2-d]pyrimidin-4(3H)-one Scaffold Led to a Novel Class of Highly Active Clostridium difficile Inhibitors. J Med Chem. 2019 Nov 14;62(21):9772-9791.

      (6) Mooyottu S, Flock G, Venkitanarayanan K. Carvacrol reduces Clostridium difficile sporulation and spore outgrowth in vitro. J Med Microbiol. 2017 Aug;66(8):1229-1234.

    1. eLife Assessment

      The manuscript provides an important assessment of the number and distribution of different retrovirus env genes present in primate genomes in the form of ancient endogenous retroviruses (ERV loci), and of the potential role that viral recombination played in the diversification of retrovirus env genes and their propagation in the primate germline over millions of years. The paper convincingly describes how intermixing/recombination occurs among these viruses, representing a conceptual advance with potentially broad implications.

    2. Reviewer #3 (Public review):

      Summary:

      Retroviruses have been endogenized into the genomes of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations, and hence can be used to monitor viral evolution. Because ERVs are incorporated into the host genome, they also reflect the evolution of their hosts. In this manuscript, the authors focus their analyses on the env genes of endogenous retroviruses in primates. Important observations include extensive, previously unknown recombination events between these retroviruses and the discovery of HML species in genomes prior to the split of Old and New World monkeys.

      Strengths:

      They explored a number of databases and made phylogenetic trees to look at the distribution of retroviral species in primates. The authors provide a strong rationale for their study design, they provide a clear description of the techniques and the bioinformatics tools used.

      Weaknesses:

      The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species, so the analyses likely underestimate the diversity of these retroviruses. Further experimental verification will be needed to confirm the observations.

      I am not sure which databases were used, but if not already analyzed, ERVmap.com and RepeatMasker are resources that include many ERVs not present in the reference genomes. Also, long-range sequencing of the human genome has recently become available, which may be worth studying for this purpose.

      Comments on revisions:

      All comments have been adequately addressed.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Chabukswar et al. analysed endogenous retrovirus (ERV) Env variation in a set of primate genomes, using BLAST homology searches with consensus Env sequences from ERVs known to be present in hominoids, with the aim of characterising env gene changes over time. The retrieved sequences were analysed phylogenetically and showed that some of the integrations are LTR-env recombinants.

      Strengths

      The strength of the manuscript is that such an analysis has not been performed yet for the subset of ERV Env genes selected and most of the publicly available primate genomes.

      Weaknesses

      Unfortunately, the weaknesses of the manuscript outnumber its strengths. In particular, the Methods section does not contain sufficient information to appreciate or interpret the results. The Results section contains methodological information that should be moved, while the presentation of the data is often substandard. For instance, the long lists of genomes in which a certain Env was found would be better shown in tables. Furthermore, there is no overview of the primate genomes used, or of their accession numbers. It is unclear whether the analyses, such as the phylogenetic trees, are based on nucleotide or amino acid sequences, since this is not stated. tBLASTn was used in the homology searches, so one would suppose that amino acid sequences were retrieved. In the Discussion, both env (nt?) and Env (aa?) are used.

      For the non-hominoids, the assembly of publicly available genomes is not always optimal, which may require BLASTing a second genome from a species. This should, for instance, be done for the HML2 sequences found in the Saimiri boliviensis genome but not in the related Callithrix jacchus genome. Finally, the authors propose to analyse recombination in Env sequences but retrieve only env-LTR recombinant Envs, which should likely not have passed the quality check.

      Since the Methods section does not contain sufficient information to understand or reproduce the results, while the Results are described in a messy way, it is unclear whether or not the aims have been achieved. I believe not, as characterisation of env gene changes over time is only shown for a few aberrant integrations containing part of the LTR in the env ORF.

      We thank the reviewer for the critiques of the manuscript and their constructive suggestions to improve the clarity, methodological rigor, and data presentation.

      (1) The concern regarding insufficient information in the Methods has been addressed in the revised manuscript by adding a supplementary file listing the genome assemblies used for the tBLASTn analysis with the reconstructed Env sequences. The requested accession numbers for all sequences are provided in the supplementary phylogenetic figures.

      (2) We have also moved part of the Results section to the Methods section, in particular the full methodological description of the Env reconstruction (Lines 197-231).

      (3) As suggested, the long list of genomes in which Env tBLASTn hits were obtained, previously given in the Results section, is now provided as a table (Table 2) summarizing the distribution of ERV Env in the genomes; the genome assemblies are listed in Supplementary file 2.

      (4) Regarding the use of tBLASTn in the homology searches, we used the reconstructed Env amino acid sequences as queries in tBLASTn similarity searches of the primate genomes. tBLASTn compares the amino acid query with the nucleotide database translated in all six reading frames, so the hits obtained are nucleotide sequences (Lines 381-383). These nucleotide sequences were used for all further analyses, such as sequence alignment, phylogenetic analysis and recombination analysis. For clarity, we now specify in the Methods section that env nucleotide alignments were used, to avoid the confusion noted in the Discussion.
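      The six-frame translation step that tBLASTn applies to the nucleotide database can be illustrated with a short sketch. This is not BLAST code: the function names are hypothetical, it uses the standard genetic code (NCBI table 1), and its frame labels (an offset into the forward strand or into the reverse complement) may differ from BLAST's own frame numbering.

```python
# Standard genetic code (NCBI translation table 1): codons are enumerated with
# the first, second and third bases each cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons containing non-ACGT characters become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(nt):
    """Return translations of the three forward and three reverse frames."""
    nt = nt.upper()
    rc = reverse_complement(nt)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(nt[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("atggcc"))
```

      A protein query such as a reconstructed Env is then aligned against each of these translated frames, and the matching genomic region is reported back as nucleotide coordinates, which is why the retrieved hits are nucleotide sequences.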

      (5) For the characterization of the HML supergroup in the squirrel monkey genome (Saimiri boliviensis), we used the tBLASTn hits obtained in S. boliviensis in the initial analysis to perform comparative genomics on two Platyrrhini genomes available on the UCSC Genome Browser. This analysis was performed specifically to confirm the presence in squirrel monkey genomes of HML supergroup members that had not been previously reported. We used these genome assemblies because of their Genome Browser annotations, especially the RepeatMasker tracks and the comparative genomics tools, which allow the human genome to be used as a reference. We reported the coordinates of the HML supergroup members retrieved from these assemblies using the RepeatMasker custom track, which annotates many ERVs that are not present in NCBI reference genomes.

      (6) The concern that only env-LTR recombinant Envs were retrieved has been addressed in the revised Results section (Lines 747-758). As described in the Methods section, the RDP software detects recombinant sequences and estimates breakpoint positions for the recombination signals; we therefore used comparative genomics to confirm only those sequences that RDP predicted as potential recombinants. All sequences predicted by the software were env-LTR recombinants, so these are the only recombinant sequences we confirmed and reported in the manuscript.

      Reviewer #1 (Recommendations for the authors):

      The paper could be strengthened by:

      - a rigorous rewriting and shortening of the manuscript, eliminating all textbook-like paragraphs and all biological misinterpretations and confusions. Distinguish between retroviral replication as an exogenous virus and host genome remodeling affecting ERVs. Rewrite the sections that present template switching by RT as the basis for the observed recombinations, since host genome recombination is far more likely. ERVs with such aberrant env/LTR recombination are unlikely to be fit for cross-species transmission; such a recombinant was likely generated in a common ancestor. Also, host RNA polymerase II, not RT, transcribes retroviral RNA (line 79).

      - check lines 89-90 as pro is part of the pol gene in gamma- and lentiviruses.

      We thank the reviewer for the suggestion. We have revised the manuscript by shortening the Introduction, eliminating the textbook-like paragraphs, and clarifying the recombination mechanism. The revised Introduction is at Lines 102-111, and the clarification of the recombination mechanism is provided at Lines 1668-1675.

      - adding much more information to the Methods section, such as which genomes were searched, whether nt or aa sequences were retrieved and analysed, whether multiple genomes of a species were searched, and a list of the databases used ('various databases' in line 164 does not suffice).

      We thank the reviewer for the observation. As mentioned above, the revised manuscript provides more detailed Methods, including a supplementary file listing the genome assemblies used for the tBLASTn analysis and comparative genomics. As now stated in the revised version, we used nt sequences for the sequence alignment, phylogenetic analysis and recombination analysis. Lastly, all the databases used are now listed in the Methods section.

      - more information is needed on the alignments and phylogenetic trees. For instance, how were indels treated? What was the average alignment length in terms of informative sites?

      We thank the reviewer for these questions. To answer them, we have added a paragraph (Lines 359-362) describing the reconstruction process in more detail.

      - confirm the findings about the presence or absence of an ERV, such as for the squirrel monkey genome, using additional genomes of the species

      As mentioned above, we used only the genome assemblies available on the Genome Browser because of their annotations. A BLAST search of the second NCBI RefSeq genome does not provide information and annotations as accurate as those of the Genome Browser; we therefore reported the coordinates of the HML supergroup members retrieved from the comparative genomic assemblies using the RepeatMasker custom track, which annotates many ERVs that are not present in NCBI reference genomes.

      - present the lists of findings in primate genomes on pages 9 and 10 in tables

      We thank the reviewer for the suggestion. We have provided a new table (Table 2) in the revised version summarizing the ERV Env distribution results.

      - a significant limitation of the study is that only env ERVs found in hominoids have been searched in OWM and NWM, not ones specific for monkeys. This should be mentioned somewhere.

      As the reviewer pointed out, the study was designed to explore ERV Env sequences in hominoids, which were then searched for in the OWM and NWM genomes; this is now stated more clearly in the Introduction (Lines 57-60).

      - define abbreviations at first use (e.g. HML in abstract)

      We thank the reviewer for the suggestion. We now define the abbreviations in the Abstract, where HML is first mentioned (Line 65).

      - explain 'pathological domestication' (line 42). Domestication implies usefulness to the host. And over time, deleterious insertions would have been likely purged from a population.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERV env (Lines 52-57).

      Furthermore:

      - why begin the discussion with a lengthy description of domestication and syncytins, which is not part of the current study?

      We thank the reviewer for the critique. Accordingly, we have shortened the part of the Discussion about the domestication of syncytins and now mention them only as an example (Lines 942-944).

      - how can 96 hits have been retrieved for spuma-like envs (line 506), while it was earlier reported (line 333), that the most hits were gamma-like?

      We thank the reviewer for the observation. We now explain how the 96 hits for spuma-like envs were retrieved at Lines 670-677 of the Discussion section.

      English grammar should be improved throughout the manuscript.

      In addition, I could not open half of the supplementary files.

      As suggested, we have revised the English and checked that all files open correctly.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Chabukswar et al. describes a comprehensive attempt to identify and describe the diversity of retroviral envelope (env) gene sequences present in primate genomes in the form of ancient endogenous retrovirus (ERV) sequences.

      Strengths:

      The focus on env can be justified because of the role the Env proteins likely played in determining viral tropism and host range of the viruses that gave rise to the ERV insertions, and to a lesser extent, because of the potential for env ORFs to be co-opted for cellular functions (in the rare cases where the ORF is still intact and capable of encoding a functional Env protein). In particular, these analyses can reveal the potential roles of recombination in giving rise to novel combinations of env sequences. The authors began by compiling env sequences from the human genome (from human endogenous retrovirus loci, or "HERVs") to build consensus Env protein sequences, and then used these as queries to screen other primate genomes for group-specific envs by tBLASTn. The "groups" referred to here are previously described, unofficial classifications of endogenous retrovirus sequences into three very broad categories: Class I, Class II and Class III. These are not yet formally recognized in retroviral taxonomy, but they each comprise representatives of multiple genera, and so would fall somewhere between the Family and Genus levels. The retrieved sequences are subjected to various analyses; most notably, they are screened for evidence of recombination. The recombinant forms appear to include cases that were probably viral dead-ends (i.e. inactivating the env gene) even if they were propagated in the germline.
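      As a rough illustration of the consensus-building step summarized above, the sketch below computes a simple majority-rule consensus from equal-length aligned sequences. This is an assumed, generic approach: the manuscript's own reconstruction procedure (described in its Methods) may treat gaps and ambiguous columns differently, and the function name and threshold are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, gap="-", min_fraction=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    Gap-only columns are dropped. In other columns, the most common
    non-gap residue is kept only if its share of the non-gap residues
    exceeds min_fraction; otherwise the column is reported as 'X'.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for col in range(length):
        residues = [s[col] for s in aligned if s[col] != gap]
        if not residues:
            continue  # every sequence has a gap here
        residue, count = Counter(residues).most_common(1)[0]
        consensus.append(residue if count / len(residues) > min_fraction else "X")
    return "".join(consensus)


if __name__ == "__main__":
    print(majority_consensus(["MKV-L", "MKI-L", "MRV-L"]))
```

      A stricter threshold yields more 'X' positions but a consensus that is less biased toward any single degraded copy; the choice is a trade-off rather than a fixed rule.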

      The availability of the consensus sequences (supplement) is also potentially useful to others working in this area.

      Weaknesses:

      The weaknesses are largely in presentation. Discussions of ERVs are always complicated by the lack of a formal and consistent nomenclature and the confusion between ERVs as loci and ERVs as indirect information about the viruses that produced them. For this reason, additional attention needs to be paid to precise wording in the text and/or the use of illustrative figures.

      We thank the reviewer for the general observation. We have paid additional attention to the wording of the text and figures and hope to have improved the clarity of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Reviewing the manuscript was a challenge because figures were difficult to read. As provided, the fonts were sometimes too small to read in a standard layout and had to be expanded on screen.

      The tree in Figure 3 could also be made easier to read, for example if the authors collapsed related branches and gave the clusters a single, clear label (this is not necessary, just a suggestion) - especially if the supplementary trees have all the labelled branches for any readers who want specific details.

      I also recommend asking a third party (perhaps a scientific colleague) with fluency in English grammar and familiarity with English scientific idiom to provide some editorial feedback on the text.

      Figure 4 legend is confusing. From the description it sounds like the tree in 4B is a host phylogeny, but it's not clearly stated. And if so, how was the tree generated? Is it based on entire genomes? Include at least enough methodological detail or citations that someone could recreate it, if necessary. The details and how it was done should be briefly mentioned here and in detail in the Methods section.

      We thank the reviewer for the observation. As for Figure 4 we have modified its legend and more clearly stated how the phylogenetic tree of the primate genomes was generated using TimeTree. We have also provided further details in the methods section (Lines 475-489).

      As suggested, we have revised the English.

      Line 42 - what is "pathological domestication"? It sounds like a contradiction in terms.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERV env (Lines 52-57).

      Lines 166-167 - the authors use the word "classes" but then use a list of terms that correspond to genera within the Retroviridae. The authors should be cautious here, as "class" and "genus" are both official taxonomic terms with different meanings. Do they mean genus? Or, if a more informal term is needed, perhaps "group"?

      Thank you for the observation. The ERVs have been classified into three classes (Class I, II and III) based on their relatedness to the exogenous retrovirus genera Gammaretrovirus, Betaretrovirus and Spumaretrovirus, respectively; we therefore use "class" as per the nomenclature proposed by Gifford et al., 2018, which is cited at Lines 122-125.

      Line 221- "defferent" should be "different"

      Corrected

      Lines 233-234 - what is meant by "canonical" and "non-canonical" forms? Can the authors please define these two terms?

      Thank you for the question. Canonical refers to sequences that are well preserved and match the structural and functional features of complete env genes, whereas non-canonical refers to sequences with significant structural alterations or truncations that deviate from this typical form. This explanation is given in the revised version at Lines 475-479.

      Line 252 - if/is

      Corrected

      Lines 274-276 needs a citation to the paper(s) that reported this.

      Corrected

      Lines 283-285 - this was confusing. How could the authors have noted distinct occurrences and clusters of these if they were excluded from the BLAST analysis? It says the consensus sequences were effectively representing these, but doesn't this raise the possibility that the consensus sequences are not specific enough? Could this also then lead to false identification? Perhaps a few more words of explanation should be added.

      We thank the reviewer for the observation. When performing the tBLASTn search, we did obtain hits for HERV15, HERVR, ERVV1, ERVV2 and PABL, and a detailed explanation of this observation is now given in the revised manuscript at Lines 619-627.

      Line 298 - missing comma

      Corrected

      Lines 348-351- this list is not a list of recombination mechanisms. Template switching is a mechanism of recombination, but "acquisition" is simply a generic term, "degradation" is not a mechanism, and "cross-species transmission" might be a driver or a result of recombination, but it is not a mechanism of recombination.

      We thank the reviewer for the observation. We have revised the explanation of the recombination events in the Discussion section, as some parts of the Results have been moved to the Discussion (Lines 1058-1065).

      Lines 369-372. It's not clear why this means the event was a "very recent occurrence". Do the authors mean that there were shared integration sites between some of the species, and that these sites lacked the insertions in other species (e.g. gibbon, orangutan, monkeys)?

      For the long section on recombination events involving an env sequence with an LTR in it, can the authors explain how they know when it's a recombination event versus integration of one provirus into another one, followed by recombination between LTRs to generate a solo-LTR?

      We thank the reviewer for the observation. Regarding the very recent occurrence of the recombination event, we explain it in the revised manuscript at Lines 769-824 as follows: “In fact, the recombinant sequences were shared only between 4 species of Catarrhini parvorder and were absent in more distantly related primates (such as gibbons, orangutans, etc.). This with the presence of shared recombination sites suggests that the insertion occurred after the divergence of these species, while its absence in others indicate that it is a recombination event.”

      Regarding the env-LTR recombination events, the recombinants were first detected by the RDP software and then validated through BLAT searches of the genomes available on the Genome Browser. How we obtained these env-LTR recombination events is now explained at Lines 746-763 of the revised manuscript.

      Methods Lines 151-168 and Figure 1 legend Lines 689-690 - how did the authors distinguish between "translated regions" corresponding to the actual Env protein sequence from translation of the other two reading frames? That is, there must have been substantial "translatable" stretches of sequence in the two incorrect reading frames as well as the reading frame corresponding to Env, so the question is how were the correct ones identified for the reconstruction?

      We thank the reviewer for the observation. We have provided a detailed explanation in the Methods section (Lines 335-359).

      Line 495 - "previously reported" should include citation(s) of the prior report(s).

      We thank the reviewer for the observation. We have provided the appropriate citations.

      Line 525 - the authors propose that the mechanism "is the co-packaging of different ERVs in a virus particle". First, I assume they meant to say that RNA from different ERVs is co-packaged. Second, isn't it also possible or likely that these could arise from co-packaging of exogenous retrovirus RNAs and recombination, especially if the related exogenous forms were still circulating at the time these things arose?

      We thank the reviewer for the observation. In the revised manuscript, we have modified the proposed mechanism to also include the possibility of co-packaging of exogenous retrovirus RNAs followed by recombination (Lines 1082-1099).

      Line 686 - env should either be italicized (gene) or capitalized (protein), depending on what the authors intended here.

      We thank the reviewer for the observation. We have corrected the typographical error in the new version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Retroviruses have been endogenized into the genome of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations hence can be used to monitor viral evolution. Since they are incorporated into the host genome, they also reflect the evolution of the hosts. In this manuscript the authors have focused their analyses on the env genes of endogenous retroviruses in primates. Important observations made include the extensive recombination events between these retroviruses that were previously unknown and the discovery of HML species in genomes prior to the splitting of old and new world monkeys.

      Strengths:

      They explored a number of databases and made phylogenetic trees to look at the distribution of retroviral species in primates. The authors provide a strong rationale for their study design, they provide a clear description of the techniques and the bioinformatics tools used.

      Weaknesses:

      The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species. The analyses thus likely underestimate the amount of diversity in the retroviruses. Further experimental verification will be needed to confirm the observations.

      It is not clear which databases were used, but if not already analyzed, ERVmap.com and RepeatMasker are resources that include many ERVs that are not present in the reference genomes. Also, long-range sequencing of the human genome has recently become available, which may also be worth studying for this purpose.

      We thank the reviewer for the observations and comments. We would like to clarify that the intent of the work was to perform a bioinformatics analysis, so wet-lab experimental verification of the observations is outside the scope of the present manuscript. For the aim of the manuscript, we used the NCBI reference genomes, while for reporting the coordinates of the HML supergroup in the squirrel monkey genome and the coordinates of the recombination events through BLAT searches, we used genome assemblies available on the Genome Browser with the RepeatMasker custom track, since it provides well-represented ERV annotations.

      The suggestion to use long-range sequencing of the human genome is an interesting perspective, and in future work we will try to incorporate it into our analysis and to perform experimental verification, since the focus of the present work does not include a wet-lab component.

      Reviewer #3 (Recommendations for the authors):

      In a few places the term HERV has been used when describing ERVs in non-human primates. This needs to be corrected.

      We thank the reviewer for the observation. We have checked and accordingly modified the terms in the manuscript wherever necessary.

    1. eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects would enhance the study's impact and generalizability to broader populations.

    2. Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) as described by Ratcliff and McKoon (2008), the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options between individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.
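      For readers less familiar with this class of model, the sketch below simulates a single choice from a DDM whose drift rate is a weighted sum of taste and health differences, where one attribute can begin to enter the accumulation later than the other (a design sometimes referred to as a starting-time DDM). We assume here that a parameter like the τS mentioned below denotes the delay of health relative to taste; the parameter names and default values are hypothetical, and this is not the model fitted by Shevlin and colleagues.

```python
import random


def simulate_choice(taste_diff, health_diff, w_taste, w_health, tau_s,
                    threshold=1.0, non_decision=0.3, dt=0.001,
                    noise_sd=1.0, max_time=10.0, rng=None):
    """Simulate one trial of a DDM with attribute-specific onsets.

    Returns (choice, reaction_time): choice is +1 (upper bound),
    -1 (lower bound), or 0 if no bound is reached within max_time.
    tau_s > 0 delays health relative to taste; tau_s < 0 delays taste.
    """
    rng = random.Random() if rng is None else rng
    taste_onset = max(0.0, -tau_s)
    health_onset = max(0.0, tau_s)
    evidence, t = 0.0, 0.0  # unbiased start midway between the bounds
    while t < max_time:
        drift = 0.0
        if t >= taste_onset:
            drift += w_taste * taste_diff
        if t >= health_onset:
            drift += w_health * health_diff
        # Euler-Maruyama step: deterministic drift plus Gaussian noise
        evidence += drift * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        t += dt
        if evidence >= threshold:
            return 1, t + non_decision
        if evidence <= -threshold:
            return -1, t + non_decision
    return 0, max_time + non_decision


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_choice(1.0, -0.5, 1.5, 1.0, tau_s=0.2, rng=rng))
```

      With tau_s > 0, fast decisions are driven by taste alone, so the timing question in item (1) and the weighting question in item (2) map onto separate parameters (tau_s versus w_taste and w_health).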

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regards to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. By contrast, Shevlin and colleagues ask each participant to make two sets of 42 choices, with half as many participants as in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective rating of the options) as compared to the initial version of Ratcliff and McKoon. With regard to the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold ratio in the number of trials between the Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about a potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      The authors compare different DDMs to show that the DDM used for the statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDMs used for the comparison (i.e. M0, M1, M2 in the supplementary materials) have fewer parameters to estimate than the one used in the main text. As a rule, the more parameters a model has, the better it is expected to fit the data. Additionally, a quick plot of the data in supplementary table S12 (i.e. WAIC as a function of the number of parameters varying by food type in the model: 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may also be suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, and for τS when comparing M3 with M2). If the differences are substantial, the results currently reported in the main text are not very reliable.
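      The complexity question raised here can be made concrete by how WAIC is computed: it subtracts an effective number of parameters, p_WAIC (the summed posterior variance of each observation's log-likelihood), from the log pointwise predictive density. The sketch below computes WAIC on the deviance scale from a draws-by-observations log-likelihood matrix; the input format and function name are assumptions, and it is not the authors' script.

```python
import math
from statistics import variance


def waic(log_lik):
    """Return (WAIC, p_WAIC) on the deviance scale; lower WAIC is better.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s. At least two draws are needed for the variance term.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        peak = max(column)
        # log of the mean likelihood across draws, shifted for stability
        lppd += peak + math.log(sum(math.exp(v - peak) for v in column) / n_draws)
        # sample variance across draws: this observation's share of p_WAIC
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic), p_waic


if __name__ == "__main__":
    draws = [[-1.0, -2.0], [-1.5, -2.5], [-0.5, -1.5]]
    print(waic(draws))
```

      Because p_WAIC is estimated from the posterior rather than counted, a model with more nominal parameters is penalized only to the extent that those parameters remain uncertain given the data.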

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      I would suggest in such an experiment to collect the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

    3. Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant, and the methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

    4. Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

    5. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects would enhance the study's impact and generalizability to broader populations.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We apologize for the confusion in how we described the multiple steps performed and the hierarchical methods used to ensure that the model we report in the main text was the best fit to the data while not overfitting. We are not certain what is meant by “[a]ddressing model transparency,” but as described in our response to Reviewer 1 below, we have now more clearly explained (with references) that the use of hierarchical estimation procedures allows for information sharing across participants, which improves the reliability and stability of parameter estimates, even when the number of trials per individual is small. We have clarified for the less familiar reader how our Bayesian model selection criterion penalizes models with more parameters (more complex models). Although details about model diagnostics, recoverability, and posterior predictive checks are all provided in the Supplementary Materials, we have clarified for the less familiar reader how each of these steps ensures not only that the parameters we estimate are identifiable and interpretable, but also that the model can reproduce key patterns in the data, supporting the validity of the model. Additionally, we have provided all scripts for estimating the models by linking to our public GitHub repository. Furthermore, we have edited language throughout to eliminate any implication of causal claims and acknowledged the limitation of the small sample size.
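      The information-sharing argument can be illustrated with the simplest hierarchical case: a normal-normal model with known variances, in which each participant's posterior mean is a precision-weighted average of that participant's own data mean and the group mean. This closed form is a textbook simplification assumed here for illustration; a hierarchical DDM is estimated by posterior sampling, not with this formula, and the function name is hypothetical.

```python
def shrunken_estimate(subject_mean, n_trials, trial_sd, group_mean, group_sd):
    """Posterior mean of a participant-level parameter under a
    normal-normal model with known trial-level and group-level SDs."""
    data_precision = n_trials / trial_sd ** 2  # information from own trials
    prior_precision = 1.0 / group_sd ** 2      # information from the group
    weight = data_precision / (data_precision + prior_precision)
    return weight * subject_mean + (1.0 - weight) * group_mean


if __name__ == "__main__":
    for n in (1, 10, 42, 1000):
        print(n, round(shrunken_estimate(2.0, n, 1.0, 0.0, 1.0), 3))
```

      With few trials the estimate is pulled toward the group mean, which stabilizes it at the cost of some bias; as trials accumulate, the participant's own data dominate.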

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) as described by Ratcliff and McKoon (2008), the article by Shevlin and colleagues investigates whether the effect of a negative versus a neutral emotional state differs between individuals with bulimia nervosa (BN) and healthy participants with respect to:

      (1) The timing with which the perceived healthiness and tastiness of food options are integrated into food choices.

      (2) The weighting of the perceived healthiness and tastiness of these options.
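      To make the two quantities above concrete (attribute weights, and the relative timing at which each attribute starts to drive the decision), the following is a minimal Python sketch of a single time-varying DDM trial in the spirit of Maier et al. (2020). It is not the authors' model code: the function and parameter names (simulate_tddm_trial, w_taste, w_health, tau_s), the symmetric boundaries, the omission of non-decision time, and the sign convention for tau_s (standing in for τS) are all illustrative assumptions.

```python
import numpy as np


def simulate_tddm_trial(taste, health, w_taste, w_health, tau_s,
                        threshold=1.0, start=0.0, noise=1.0,
                        dt=0.001, max_time=5.0, rng=None):
    """Simulate one trial of a time-varying drift diffusion model (illustrative sketch).

    taste, health     : subjective ratings of the option (hypothetical scaling)
    w_taste, w_health : attribute weights on the drift rate
    tau_s             : relative start time in seconds; assumed convention:
                        tau_s > 0 -> health enters tau_s s after taste,
                        tau_s < 0 -> taste enters |tau_s| s after health
    Boundaries are at +threshold (accept) and -threshold (reject);
    evidence starts at `start`. Non-decision time is omitted.
    Returns (choice, rt): choice 1 = accept, 0 = reject, None = no decision.
    """
    rng = np.random.default_rng() if rng is None else rng
    taste_onset = max(0.0, -tau_s)
    health_onset = max(0.0, tau_s)
    evidence, t = start, 0.0
    while t < max_time:
        # Each attribute contributes to the drift only after its onset time.
        drift = ((w_taste * taste if t >= taste_onset else 0.0)
                 + (w_health * health if t >= health_onset else 0.0))
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        evidence += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if evidence >= threshold:
            return 1, t
        if evidence <= -threshold:
            return 0, t
    return None, t
```

      In this sketch, a larger |w_health| makes healthiness matter more once it is integrated, while a larger positive tau_s delays its influence so that tastiness alone drives the early part of the decision; with few trials, separating these two effects is exactly what the estimation has to achieve. Repeated calls with a seeded generator (for example, rng=np.random.default_rng(0)) yield choice and response-time distributions.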

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regard to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the Results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. By contrast, Shevlin and colleagues ask each participant to make two sets of 42 choices, with half as many participants as in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective ratings of the options) compared with the initial version of Ratcliff and McKoon. Given the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold difference in the number of trials between Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      We thank the Reviewer for their thoughtful critique. We agree that a limited number of trials can hinder reliable estimation, which we acknowledge in the Discussion section. However, we used a hierarchical estimation approach which leverages group information to constrain individual-level estimates. This use of group-level parameters to inform individual-level estimates reduces overfitting and noise that can arise when trial counts are low, and the regularization inherent in hierarchical fitting prevents extreme parameter estimates that could arise from noisy or limited data (Rouder & Lu, 2005). As a result, hierarchical estimation has been repeatedly shown to work well in settings with low trial counts, including as few as 40 trials per condition (Ratcliff & Childers, 2015; Wiecki et al., 2013), and previous applications of the time-varying DDM to food choice task data have included experiments with as few as 60 trials per condition (Maier et al., 2020). We have added references to these more recent approaches and specifically note their advantages for the modeling of tasks with fewer trials. Additionally, our successful parameter recovery described in the Supplementary Materials supports the robustness of the estimation procedure and the reliability of our results.
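      The shrinkage idea behind hierarchical estimation can be sketched with the textbook normal-normal model: each participant's estimate becomes a precision-weighted average of their own data and the group mean, and the weight on their own data falls as the number of trials falls. The sketch below assumes known trial-level (sigma) and between-participant (tau) standard deviations and uses a plain average as the group mean; the function name partial_pooling is hypothetical. It illustrates partial pooling only and is not the hierarchical Bayesian DDM fit in the study (tools such as HDDM, Wiecki et al., 2013, estimate all of these quantities jointly).

```python
import numpy as np


def partial_pooling(subject_means, n_trials, sigma, tau):
    """Shrink per-participant means toward the group mean (normal-normal model sketch).

    subject_means : per-participant sample means of some trial-level quantity
    n_trials      : number of trials per participant (assumed equal)
    sigma         : trial-level standard deviation (assumed known, shared)
    tau           : between-participant standard deviation (assumed known)
    Returns (pooled_estimates, weight_on_own_data).
    """
    subject_means = np.asarray(subject_means, dtype=float)
    group_mean = subject_means.mean()        # simple plug-in group mean
    sampling_var = sigma ** 2 / n_trials     # noise in each participant's mean
    weight = tau ** 2 / (tau ** 2 + sampling_var)
    pooled = weight * subject_means + (1.0 - weight) * group_mean
    return pooled, weight
```

      With fewer trials, the weight on each participant's own mean drops, pulling extreme estimates toward the group mean; this is the kind of regularization the response refers to.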

      The authors compare different DDMs to show that the DDM they used to report statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDM models (i.e. M0, M1, M2 in the Supplementary Materials) they used to make the comparison have fewer parameters to estimate than the one they used in the main text. Fits are usually expected to follow the rule that the more parameters a model has, the better it fits the data. Additionally, a quick plot of the data in Supplementary Table S12 (i.e. WAIC as a function of the number of parameters varying by food type in each model: 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may also be suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, for τS when comparing M3 with M2). If the differences are substantial, the results currently reported in the main text are not very reliable.

      We thank the Reviewer for highlighting that it would be helpful for the paper to explicitly note that we specifically selected WAIC as one of two methods to assess model fit because it penalizes model complexity. We now explicitly state that, in addition to being more robust than other metrics like AIC or BIC when comparing hierarchical Bayesian models like those in the current study, WAIC penalizes model complexity through an effective number of parameters (Watanabe, 2010). Therefore, more complex models (i.e., those with additional parameters) do not automatically have lower WAICs. Additionally, we note that our second method to assess model fit, posterior predictive checks, demonstrates that only model M3 can reproduce key behavioral patterns present in the empirical data. As described in the Supplementary Materials, M1 and M2 miss those patterns in the data. In summary, we used best practices to assess model fit and reliability (Wilson & Collins, 2019): results from the WAIC comparison (which in fact penalizes models with more parameters) and results from posterior predictive checks align in showing that M3 best fits our data. We have added a sentence to the manuscript to state this explicitly.
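      To illustrate why a model with more parameters does not automatically win on WAIC, the following Python sketch computes WAIC from a matrix of pointwise log-likelihoods (posterior draws × observations), using the standard decomposition into the log pointwise predictive density (lppd) and the penalty p_WAIC, which sums the posterior variance of each observation's log-likelihood. This is a generic illustration under those assumptions, not the authors' fitting code; the function name waic is hypothetical, and in practice libraries such as ArviZ (arviz.waic) provide this computation.

```python
import numpy as np


def waic(log_lik):
    """WAIC on the deviance scale (lower is better), as an illustrative sketch.

    log_lik : array of shape (S, N); entry [s, i] is log p(y_i | theta_s)
              for posterior draw s and observation i (requires S >= 2).
    Returns (waic_value, lppd, p_waic).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    max_ll = log_lik.max(axis=0)
    # lppd: sum over observations of log(mean over draws of the likelihood),
    # computed with the max-shift trick for numerical stability.
    lppd = np.sum(max_ll + np.log(np.mean(np.exp(log_lik - max_ll), axis=0)))
    # p_waic: effective number of parameters, the summed posterior variance
    # of the pointwise log-likelihood.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), lppd, p_waic
```

      Because p_WAIC grows when posterior uncertainty about the parameters makes the pointwise log-likelihoods vary across draws, extra parameters that the data do not constrain increase the penalty, so a richer model is favored only if its gain in lppd exceeds that increase.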

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      In such an experiment, I would suggest collecting the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      We thank the Reviewer for these interesting suggestions and appreciate the opportunity to clarify that we agree that claims about causal connections between our decision parameters and symptom severity metrics would be inappropriate. Per the Reviewer’s suggestions, we have eliminated the use of the word “predict” to describe the tested association with symptom metrics. We also agree that more time-locked associations with craving ratings and near-instantaneous behavior would be useful, and we have added this as an important direction for future research in the discussion. However, associating task-based behavior with validated self-report measures that assess symptom severity over long periods of time that precede the task visit (e.g., over the past 2 weeks in depression, over the past month in eating disorders) is common practice in computational psychiatry, psychiatric neuroimaging, and clinical cognitive neuroscience (Hauser et al., 2022; Huys et al., 2021; Wise et al., 2023), and this approach has been used several times specifically with food choice tasks (Dalton et al., 2020; Steinglass et al., 2015). We have revised the language throughout the manuscript to clarify: the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms.

      In response to this Reviewer’s important point about negative affect not always producing loss-of-control eating in individuals with BN, we also now explicitly note that while several studies employing ecological momentary assessments (EMA) have repeatedly shown that increases in negative affect significantly increase the likelihood of subsequent loss-of-control eating (Alpers & Tuschen-Caffier, 2001; Berg et al., 2013; Haedt-Matt & Keel, 2011; Hilbert & Tuschen-Caffier, 2007; Smyth et al., 2007), not all loss-of-control eating occurs in the context of negative affect, and that future studies should integrate food choice task data collected before and after affect inductions with measures that capture the specific frequency of loss-of-control eating episodes that occur during states of high negative affect.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

      We appreciate the Reviewer’s perspective and suggestions. We have adjusted our language linking loss-of-control eating frequency with decision parameters, and we have added additional sentences focusing on the implications for the restrictive behavior of patients with BN between binge eating episodes. In the Supplementary Materials, we have added an analysis of the restraint subscale of the EDE-Q and confirmed no relationship with parameters of interest. While we agree additional analyses with AN patients would be of interest, this is outside the scope of the paper. Our team has collected data from individuals with AN using this task, but not with any affect induction or measure of affect. Therefore, we have added this important direction for future research to the discussion.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant, and the methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

      We thank the Reviewer for their comments on the strengths of the paper, and for highlighting these important considerations regarding the sample demographics and the negative affect induction. As in the original paper that focused only on ultimate food choice behaviors, we now specifically acknowledge that the study was only powered to detect small to medium group differences in the effect of negative emotion on these final choice behaviors. Regarding the sample demographics, we agree that the study’s inclusion of only female participants is a limitation. Although the original decision for this sampling strategy was informed by data suggesting that bulimia nervosa is roughly six times more prevalent among females than males (Udo & Grilo, 2018), we now note in the discussion that our female-only sample limits the generalizability of the findings.

      We also agree with the Reviewer’s noted limitations of the negative mood induction, and based on the reviewer’s suggestions, we have added to our original description of these limitations in the Discussion. Specifically, we now note that although the task was completed immediately after the affect induction, the study did not include intermittent mood assessments throughout the choice task, so it is unclear how long the negative affect persisted during the actual task.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We agree that the small sample size and specific affect induction method may have contributed to the null model-agnostic behavioral findings. Based on this Reviewer’s and Reviewer 2’s comments, we have added these factors to our original acknowledgements of limitations in the Discussion.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

      We thank the Reviewer for raising these important points and possibilities. In the supplementary materials, we have added an additional analysis of the specific POMS subscales that comprise the total negative affect calculation that was reported in the original paper (Gianini et al., 2019), and which we now report in the main text. Ultimately, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      We agree with the Reviewer that factors like negative urgency and cravings are relevant here. The study did not collect any measures of craving, and in response to Reviewer 1 and this Reviewer, we now note in the discussion that replication studies including momentary craving assessments will be important. While we don’t have any measurements of cravings, we did measure negative urgency. Despite these prior findings, the original paper (Gianini et al., 2019) did not find that negative urgency was related to restrictive food choices. We have now repeated those analyses, and we also were unable to find any meaningful patterns. Nonetheless, we have added an analysis of negative urgency scores and decision parameters to the supplementary materials.      

      References

      Alpers, G. W., & Tuschen-Caffier, B. (2001). Negative feelings and the desire to eat in bulimia nervosa. Eating Behaviors, 2(4), 339–352. https://doi.org/10.1016/S1471-0153(01)00040-X

      Berg, K. C., Crosby, R. D., Cao, L., Peterson, C. B., Engel, S. G., Mitchell, J. E., & Wonderlich, S. A. (2013). Facets of negative affect prior to and following binge-only, purge-only, and binge/purge events in women with bulimia nervosa. Journal of Abnormal Psychology, 122(1), 111–118. https://doi.org/10.1037/a0029703

      Dalton, B., Foerde, K., Bartholdy, S., McClelland, J., Kekic, M., Grycuk, L., Campbell, I. C., Schmidt, U., & Steinglass, J. E. (2020). The effect of repetitive transcranial magnetic stimulation on food choice-related self-control in patients with severe, enduring anorexia nervosa. International Journal of Eating Disorders, 53(8), 1326–1336. https://doi.org/10.1002/eat.23267

      Gianini, L., Foerde, K., Walsh, B. T., Riegel, M., Broft, A., & Steinglass, J. E. (2019). Negative affect, dietary restriction, and food choice in bulimia nervosa. Eating Behaviors, 33, 49–54. https://doi.org/10.1016/j.eatbeh.2019.03.003

      Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137(4), 660–681. https://doi.org/10.1037/a0023660

      Hauser, T. U., Skvortsova, V., Choudhury, M. D., & Koutsouleris, N. (2022). The promise of a model-based psychiatry: Building computational models of mental ill health. The Lancet Digital Health, 4(11), e816–e828. https://doi.org/10.1016/S2589-7500(22)00152-2

      Hilbert, A., & Tuschen-Caffier, B. (2007). Maintenance of binge eating through negative mood: A naturalistic comparison of binge eating disorder and bulimia nervosa. International Journal of Eating Disorders, 40(6), 521–530. https://doi.org/10.1002/eat.20401

      Huys, Q. J. M., Browning, M., Paulus, M. P., & Frank, M. J. (2021). Advances in the computational understanding of mental illness. Neuropsychopharmacology, 46(1), 3–19. https://doi.org/10.1038/s41386-020-0746-4

      Maier, S. U., Raja Beharelle, A., Polanía, R., Ruff, C. C., & Hare, T. A. (2020). Dissociable mechanisms govern when and how strongly reward attributes affect decisions. Nature Human Behaviour, 4(9). https://doi.org/10.1038/s41562-020-0893-y

      Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237–279. https://doi.org/10.1037/dec0000030

      Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750

      Smyth, J. M., Wonderlich, S. A., Heron, K. E., Sliwinski, M. J., Crosby, R. D., Mitchell, J. E., & Engel, S. G. (2007). Daily and momentary mood and stress are associated with binge eating and vomiting in bulimia nervosa patients in the natural environment. Journal of Consulting and Clinical Psychology, 75(4), 629–638. https://doi.org/10.1037/0022-006X.75.4.629

      Steinglass, J., Foerde, K., Kostro, K., Shohamy, D., & Walsh, B. T. (2015). Restrictive food intake as a choice—A paradigm for study. International Journal of Eating Disorders, 48(1), 59–66. https://doi.org/10.1002/eat.22345

      Udo, T., & Grilo, C. M. (2018). Prevalence and Correlates of DSM-5–Defined Eating Disorders in a Nationally Representative Sample of U.S. Adults. Biological Psychiatry, 84(5), 345–354. https://doi.org/10.1016/j.biopsych.2018.03.014

      Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.

      Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7, 14. https://doi.org/10.3389/fninf.2013.00014

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Wise, T., Robinson, O. J., & Gillan, C. M. (2023). Identifying Transdiagnostic Mechanisms in Mental Health Using Computational Factor Modeling. Biological Psychiatry, 93(8), 690–703. https://doi.org/10.1016/j.biopsych.2022.09.034

    1. eLife Assessment

      This study dissects the function of three outputs of a specific population of modulatory neurons, dorsal raphe dopamine neurons, in social and affective behavior. It provides valuable information that both confirms prior results and provides new insights. The strength of the evidence is convincing, based on cutting-edge approaches and analysis. This study will be of interest to behavioral and systems neuroscientists, especially those interested in social and emotional behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The authors had previously found that brief social isolation could increase the activity of these neurons, and that manipulation of these neurons could alter social behavior in a social rank-dependent fashion. This manuscript explored which of the outputs were responsible for this, identifying the central nucleus of the amygdala as the key output region. The authors identified some discrete behavior changes associated with these outputs, and found that during photostimulation of these outputs, neuronal activity appeared altered in 'social response' neurons.

      Strengths:

      Rigorous analysis of the anatomy. Careful examination of the heterogeneous effects on cell activity due to stimulation, linking the physiology with the behavior via photostimulation during recording in vivo.

      Weaknesses:

      (1) There are some clear imbalances in the sample size across the different regions parsed. The CeA has a larger sample size, likely due in part to the previous work suggesting differential effects depending on social rank/dominance. Given the potential variance, it may be hard to draw conclusions about the impact of stimulation across different social ranks for other groups.

      (2) It is somewhat unclear why only the 'social object ratio' was used to assess the effects versus more direct measurements of social behavior.

      (3) Somewhat related, while it is statistically significant, it is unclear whether the change seen in face investigation is biologically significant; on average, it looks like a difference of a few seconds, and it was not modulated by social rank.

      (4) Several papers that are not cited have studied these neurons, exploring behaviors examined here as well as their physiological connectivity; these would provide important context for this work. In particular, multiple groups have found a dopamine-mediated IPSP in the BNST, in contrast to this work. There are technical differences that may drive these discrepancies, but not addressing them is a major weakness.

      (5) The inclusion of some markers for receptors for some of these outputs is interesting, and the authors suggest that this may be important, but this is somewhat disconnected from the rest of the work performed.

    3. Reviewer #2 (Public review):

      Summary:

      The authors perform a series of studies to follow up on their previous work, which established a role for dorsal raphe nucleus (DRN) dopamine neurons in the regulation of social-isolation-induced rebound in mice. In the present study, Lee et al. use a combination of modern circuit tools to investigate putatively distinct roles of DRN dopamine transporter-containing (DAT) projections to the bed nucleus of the stria terminalis (BNST), central amygdala (CeA), and posterior basolateral amygdala (BLP). Notably, they reveal that optogenetic stimulation of distinct pathways confers specific behavioral states, with DRNDAT-BLP driving aversion, DRNDAT-BNST regulating non-social exploratory behavior, and DRNDAT-CeA promoting sociability. A combination of electrophysiological and in situ hybridization studies reveals heterogeneous dopamine and neuropeptide expression and different firing properties, providing further evidence of pathway-specific neural properties. Lastly, the authors combine optogenetics and calcium imaging to resolve social encoding properties in the DRNDAT-CeA pathway, which relates observed social behavior to socially engaged neural ensembles.

      Collectively, these studies provide an interesting way of dissecting out separable features of a complex multifaceted social-emotional state that accompanies social isolation and the perception of 'loneliness.' The main conclusions of the paper provide an important and interesting set of findings that increase our understanding of these distinct DRN projections and their role in a range of social (e.g., prosocial, dominance), non-social, and emotional behaviors. However, as noted below, the examination of these circuits within a homeostatic framework is limited given that a number of the datasets did not include an isolated condition. The DRNDAT-CeA pathway was investigated with respect to social homeostatic states in the present study for some of the datasets.

      Strengths:

      (1) The authors perform a comprehensive and elegant dissection of the anatomical, behavioral, molecular, and physiological properties of distinct DRN projections relevant to social, non-social, and emotional behavior, to address multifaceted and complex features of social state.

      (2) This work builds on prior findings of isolation-induced changes in DRN neurons and provides a working framework for broader circuit elements that can be addressed across the social homeostatic state.

      (3) This work characterizes a broader circuit implicated in social isolation and provides a number of downstream targets to explore, setting a nice foundation for future investigation.

      (4) The studies account for social rank and anxiety-like behavior in several of the datasets, which are important considerations for the interpretation of social motivation states, especially in male mice with respect to dominance behavior.

      Weaknesses:

      (1) The conceptual framework of the study is based on the premise of social isolation and perceived 'loneliness' under the framework of social homeostasis, analogous to hunger. In this framework, social isolation should provoke an aversive state and compensatory social contact behavior. In the authors' prior work, they demonstrate synaptic changes in DRN neurons and social rebound following acute social isolation. Thus, the prediction would be that downstream projections would also show state-dependent changes as a function of social housing conditions (e.g., grouped vs. isolated). In the current paper, a social isolation condition was not included for the majority of the studies conducted (e.g., Figures 1-6 do not include an isolated condition, Figures 7-8 do include an isolated condition). Thus, while Figures 1-6 add a very interesting and compelling set of data that is of high value to the social behavior field with respect to social and emotional processing and general circuit characterization, these studies do not directly investigate the impacts of dynamic social homeostatic state. The main claim of the paper, including the title (e.g., separable DRN projections mediate facets of loneliness-like state), abstract, intro, and discussion, presents this work under the framework of dynamic social homeostatic states, which should be interpreted with caution, as the majority of the work in the paper did not include a social isolation comparison.

      (2) In Figure 1, the authors confirm collaterals in the BNST and CeA via anatomical tracing studies. The goal of the optogenetic studies is to dissociate the functional/behavioral roles of distinct projections. However, one limitation of optogenetic projection targeting is the possibility of back-propagating action potentials (stimulation of terminals in one region may back-propagate to activate cell bodies, and then collateral projections to other regions), and/or stimulation of fibers of passage. Therefore, one limitation in the dataset for the optogenetic stimulation studies is the possibility of non-specific, unintended activation of projections other than those intended (e.g., DRNDAT-CeA). This can be dealt with by administering lidocaine to prevent back-propagating action potentials.

      (3) It is unclear from the text, but in the subjects section of the methods, it appears that only male animals were included in the study, with no mention of female subjects. If that is the case, it should be made clear to the reader that this was conducted in males only, with consideration or discussion of female subjects and sex as a biological variable.

      (4) Averaged data are generally reported throughout the study in the form of bar graphs, across most figures. Individual data points would increase the transparency of the data.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of dopaminergic neurons (dopamine transporter expressing, DAT) in the dorsal raphe nucleus (DRN) in regulating social and affective behavior through projections to the central nucleus of the amygdala (CeA), bed nucleus of the stria terminalis (BNST), and the posterior subdivision of the basolateral amygdala. The largest effect observed was in the DRN-DAT projections to the CeA. Augmenting previously published results from this group (Matthews et al., 2016), the comprehensive behavioral analysis relative to social dominance, gene expression analysis, electrophysiological profiling, and in vivo imaging provides novel insights into how DRN-DAT projections to the CeA influence the engagement of social behavior in the contexts of group-housed and socially isolated mice.

      Strengths:

      Correlational analysis with social dominance is a nice addition to the study. The overall computational analyses performed are well-designed and rigorous.

      Weaknesses:

      (1) Analysis of dopamine receptor expression did not include Drd3, Drd4, or Drd5, which may provide more insight into how dopamine modulates downstream targets. This is particularly relevant to the BNST projection, in which the densest innervation did not robustly co-localize with the expression of either Drd1 or Drd2. It is also possible that dopamine released from DRN-DAT neurons in any or all of these structures modulates neurotransmitter release from inputs to these regions that carry D2 receptors on their terminals.

      (2) Although not the focus of this study, without pharmacological blockade of dopamine receptors, it is not possible to assess the contribution of dopamine to the behavioral outcomes. Given the co-release of glutamate and GABA from these neurons, it is possible that dopamine plays only a marginal role in the functional connectivity of DRN-DAT neurons.

      (3) The photostimulation parameters used during the behavioral studies (8 pulses of light delivered at 30 Hz for several minutes) could produce confounding results that limit data interpretation. As shown in Figure 6J, 8 pulses of light delivered at 30 Hz result in a significant attenuation of EPSC amplitude in the BLP and CeA projections. Thus, prolonged stimulation could cause significant synaptic rundown, resulting in an overall suppression of connectivity in the later stages of the behavioral analyses.

    1. eLife Assessment

      This study thoroughly assesses tactile acuity on women's breasts, for which no dependable data currently exist. The study provides two important contributions by convincingly showing that tactile acuity on the breast is poor in comparison to other body parts, and that acuity is worst in larger breasts, indicating that the number of tactile sensors is fixed. However, further arguments concerning the role of the nipple in spatial localisation are not well supported by the current evidence. This study will be of interest to the broader touch research community, as well as those interested in breast reconstruction and sexual function.

    2. Reviewer #1 (Public review):

      The authors investigated tactile spatial perception on the breast through discrimination, categorization, and direct localization tasks. They reach three main conclusions:

      (1) The breast has poor tactile spatial resolution. This conclusion is based on comparing just noticeable differences (JNDs), a marker of tactile spatial resolution, across four body regions, two of them on the breast. The data compellingly support the conclusion; the study outshines other studies of tactile spatial resolution, which tend to use problematic measures such as two-point discrimination thresholds. The result will interest researchers in the field, and possibly in other fields, because of the intriguing tension between the finding and the sexually arousing function of touching the breast.

      (2) Larger breasts are associated with lower tactile spatial resolution. This conclusion is based on a strong correlation between participants' JNDs and the size of their breasts. The correlation convincingly supports the conclusion. It is of interest to the field, as it aligns with the hypothesis that nerve fibers are more sparsely distributed across larger body parts.

      (3) The nipple is a landmark: perceptually a unit and an attractor for tactile percepts. The data do not support these conclusions. The conclusion that the nipple is perceived as a unit is based on poor performance in tactile categorization for touches on the nipple. This categorization performance may simply mirror the breast's low tactile spatial resolution, with JNDs about the size of a nipple.

      The conclusion that tactile percepts are drawn towards the nipple is based on localization biases towards the nipple for tactile stimuli on the breast, compared to localization biases for tactile stimuli on the back. Currently, the statistical analysis of the data does not meet the standards of the field (psychophysics). Moreover, any bias towards the nipple could simply be another instance of regression to the mean of the stimulus distribution, given that the tested locations were centered on the nipple. This confound can only be resolved experimentally by shifting the distribution of the tested locations. Finally, given that participants indicated the locations on a 3D model of the body part, further experimentation would be required to determine whether there is a perceptual bias towards the nipple or merely a response bias.

      Further comments:

      - Given that later analyses require regression models, the authors might consider using them throughout.

      - The stability of the JND differences between body parts across subjects is already captured in the analysis of the JNDs; the ANOVA and the post-hoc testing would not be significant if the order were not relatively stable across participants. Thus, it is unclear why this is being evaluated again with reduced power due to improper statistics.

      - The null hypothesis of an ANOVA is that all means are equal; the alternative is that at least one mean differs from the others. A non-significant effect of participants as a factor therefore does not provide evidence for similarity.

      - The pairwise correlations between body parts seem to be exploratory in nature. As with all exploratory analyses, the question arises whether the potential extra insight outweighs the risk of false positives. It would be hard to generate data with significant differences between several conditions and not find any correlations between pairs of conditions. Thus, the a priori chance of finding a significant correlation is much higher than a multiple-comparison correction accounts for.

      - If the JND at mid breast (measured with locations centered at the nipple) is roughly the same size as the nipple, it is not surprising that participants have difficulty with the categorical localization task on the nipple but perform better than chance on the significantly larger areola.

      - To justify the conclusion that the nipple is a unit, additional data would be required. 1) One could compare psychometric curves with the nipple as the center and psychometric curves with a nearby point on the areola as the center. 2) Performance in the quadrant task could be compared for the nipple and an equally sized portion of the areola. Otherwise, the task "only" provides confirmatory evidence for a low tactile resolution in the midbreast area.

      - A localization bias toward the nipple in this context does not show that the nipple is the anchor of the breast's tactile coordinate system. The result might simply be an instance of regression to the mean of the stimulus distribution (also known as experimental prior). To convincingly show localization biases towards the nipple, the tested locations should be centered at another location on the breast.

      - Another problem is the visual salience of the nipple, even though Blender models were uniformly grey. With this type of direct localization, it is very difficult to distinguish perceptual from response biases even if the regression to the mean problem is solved. There are two solutions to this problem: 1) Varying the uncertainty of the tactile spatial information, for example, by using a pen that exerts lighter pressure. A perceptual bias should be stronger for more uncertain sensory information; a response bias should be the same across conditions. 2) Measure bias with a 2IFC procedure by taking advantage of the fact that sensory information is noisier if the test is presented before the standard.

      - Neither signed nor absolute localization error can be compared to the results of the previous experiments. The JND should be roughly proportional to the variance of the errors.

      - The statistically adequate way of testing the biases is a hierarchical regression model (LMM) with the distance of the physical location from the nipple as a predictor and the distance of the reported location from the nipple as the dependent variable. Either variable can be unsigned, or signed for greater power, for example by coding the lateral breast as negative and the medial breast as positive. The bias will show as regression coefficients smaller than 1.

      - It does not matter whether distances are calculated in skin or 3D coordinates, or as Euclidean distances or in polar coordinates. However, the text should use one consistent distance measure for both the independent and dependent variables. Calculating several versions of these measures can create multiple-comparison issues in frequentist statistics. For transparency, it is good practice to report the results of the other distance calculations in the supplement.

      - The body part could be added as a predictor to the LMM, with differences in bias between the body parts showing up as a significant interaction between the two predictors. The figures suggest such an effect. However, the interpretation should take into account that 1) response biases are more likely to arise at the breast and 2) it might be harder to learn the range of locations on the back, given that stimulation is not restricted to an anatomically defined region, as is the case for the breast.

    3. Reviewer #2 (Public review):

      The authors tested tactile acuity on the breasts of female participants using several tasks and reported overall low acuity compared to the back, which is typically considered to have the worst acuity of all body parts. Moreover, there was evidence that acuity is worse the larger the breast. This finding mirrors similar findings for the hand and therefore suggests that the number of tactile sensors is fixed and must be distributed across a larger extent of skin when a body part is larger, resulting in comparably lower tactile acuity.

      Strengths:

      I find this an interesting paper with results that are relevant to the tactile community. The authors apply several tasks allowing them to link the paper with previous results. The methodology and psychophysical analysis are sound.

      Weaknesses:

      The analysis of localization error direction, with the result that the nipple area may be a landmark for tactile localization, is interesting and aligns the paper with some other recent papers that have suggested that such landmarks should exist. However, there are major issues with methodology and statistics, so that currently the conclusions are not supported.

      In the following, line numbers refer to the re-formatted manuscript provided by the authors upon request and are mentioned for them to find the relevant passages faster.

      (1) Comments on analysis of tactile acuity:

      - I had a hard time understanding some parts of the report. What is meant by "broadly no relationship" in line 137?

      - It is suggested that spatial expansion (which is correlated with body part size) is related between the medial breast and the hand. Is this to say that women with large hands have large medial breasts? Nipple size was measured, but hand size was not; is this correct?

      - It is furthermore unclear how the authors differentiate medial breast and NAC. The sentence in lines 140-141 seems to imply the two terms are considered the same, as a conclusion about NAC is drawn from a result about the medial breast. This requires clarification.

      - Finally, given that the authors suspect that overall localization ability (or attention) may be overshadowed by a size effect, would not an analysis be adequate that integrates both, e.g. a regression with multiple predictors?

      (2) Comments on analysis of "The nipple is a unit":

      - Statistics in this section are not adequately described and may be partly false.

      - In the paragraph about testing quadrants of the nipple, it is stated that only 3 of 10 participants barely outperformed chance with a p < 0.01. It is unclear how a significant t-test is an indication of "barely above chance".

      - The final part of the paragraph on nipple quadrants (starting line 176) explains that there was a trend (4 of 10 participants) for lower tactile acuity being related to the inability to differentiate quadrants. It seems to me that such a result would not be expected: The stated hypothesis is that all participants have the same number of tactile sensors in their nipple and areola, independent of NAC size. In this section, participants determine the quadrant of a single touch. Theoretically, all participants should be equally able to perform this task, because they all have the same number of receptors in each quadrant of nipple and areola. Thus, the result in Figure 2C is curious.

      (3) Comments on analysis of "Absolute localization on the breast is anchored to the nipple"

      - Again, there are things that are unclear with the statistics and description of the analysis.

      - This section reports an ANOVA (lines 193/194) with a factor "participant". This does not appear sensible; please clarify. The factor "distance" is also unclear: is it a categorical or a continuous variable? Line 400 implies a 6-level factor, but the ANOVAs and their factors are not described in the methods (nor are any of the other statistical approaches).

      - The analysis on imprecision using mean pairwise error (line 199) is unclear: does pairwise refer to x/y or to touch vs. center of the nipple?

      - Page 8, upper text: what is meant by "relative over-representation of the depth axis"? Does this refer to the breast having depth while the equivalent area on the back does not? What are the horizontal planes (probably meant to be singular)? Do you simply mean that depth was ignored for the calculation of errors? This seems to be implied in Figure 3AB.

      - Lines 232-241: I cannot follow the conclusions drawn here. First, it is not clear to the reader what the aim of the presented analyses is: what are you looking for when you analyze the vectors? Second, "vector strength" should be briefly explained in the main text. Third, it is not clear how the final conclusion is drawn. If all locations are biased towards the nipple, then a point close to the nipple cannot exhibit a large bias, because the nipple is nearby. Therefore, one would expect points close to the nipple to exhibit smaller errors, but this would not imply higher acuity, just less space for mislocalization. The higher-acuity conclusion is also at odds with the remaining results: acuity is low on the outer breast and even lower at the NAC, so why would it be high in between the two?

      (4) Comments on the Discussion:

      The discussion makes some concrete suggestions for sensors in implants (line 283). It is not clear how the stated numbers were computed. Also, why should the four nipple quadrants receive individual sensors if the result here was that participants cannot distinguish these quadrants?

      Additional comments:

      I would find it interesting to know whether participants with small breast measurement delta had breast acuity comparable to the back. Alternatively, it would be interesting to know whether breast and back acuity are comparable in men. Such a result would imply that the torso has uniform acuity overall, but any spatial extension of the breast is unaccounted for. The lowest single participant data points in Figure 1B appear similar, which might support this idea.

    1. eLife Assessment

      This study proposes a valuable and interpretable approach for predicting hematoma expansion in patients with spontaneous intracerebral hemorrhage from non-contrast computed tomography. The predictive performance of the proposed method is solid, as supported by external validation on two datasets. The work will be of interest to medical biologists working on stroke and neuroimaging.

    2. Reviewer #1 (Public review):

      Summary:

      The study explores the use of Transport-based morphometry (TBM) to predict hematoma expansion and growth 24 hours post-event, leveraging Non-Contrast Computed Tomography (NCCT) scans combined with clinical and location-based information. The research holds significant clinical potential, as it could enable early intervention for patients at high risk of hematoma expansion, thereby improving outcomes. The study is well-structured, with detailed methodological descriptions and a clear presentation of results. However, the practical utility of the predictive tool requires further validation, as the current findings are based on retrospective data. Additionally, the impact of this tool on clinical decision-making and patient outcomes needs to be further investigated.

      Strengths:

      (1) Clinical Relevance: The study addresses a critical need in clinical practice by providing a tool that could enhance diagnostic accuracy and guide early interventions, potentially improving patient outcomes.

      (2) Feature Visualization: The visualization and interpretation of features associated with hematoma expansion risk are highly valuable for clinicians, aiding in the understanding of model-derived insights and facilitating clinical application.

      (3) Methodological Rigor: The study provides a thorough description of methods, results, and discussions, ensuring transparency and reproducibility.

      Weaknesses:

      (1) The limited sample size in this study raises concerns about potential model overfitting. While the reported AUROC of 0.71 may be acceptable for clinical use, the robustness of the model could be further enhanced by employing techniques such as k-fold cross-validation. This approach, which aggregates predictive results across multiple folds, mimics the consensus of diagnoses from multiple clinicians and could improve the model's reliability for clinical application. Additionally, in clinical practice, the utility of the model may depend on specific conditions, such as achieving high specificity to identify patients at risk of hematoma expansion, thereby enabling timely interventions. Consequently, while AUC is a commonly used metric, it may not fully capture the model's clinical applicability. The authors should consider discussing alternative performance metrics, such as specificity and sensitivity, which are more aligned with clinical needs. Furthermore, evaluating the model's performance in real-world clinical scenarios would provide valuable insights into its practical utility and potential impact on patient outcomes.

      (2) The authors compared the performance of TBM with clinical and location-based information, as well as with other machine learning methods. While this comparison highlights the relative strengths of TBM, the study would benefit from concrete evidence on how this tool could enhance clinicians' ability to assess hematoma expansion in practice. For instance, it remains unclear whether integrating the model's output with a clinician's own assessment would lead to improved diagnostic accuracy or decision-making. Investigating this aspect, for example through studies evaluating the combined performance of clinician judgment and model predictions, could significantly enhance the tool's practical value.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a transport-based morphometry (TBM) approach for the discovery of non-contrast computed tomography (NCCT) markers of hematoma expansion risk in patients with spontaneous intracerebral hemorrhage (ICH). The findings demonstrate that TBM can quantify hematoma morphological features and outperforms existing clinical scoring systems in predicting 24-hour hematoma expansion. In addition, the inverse transform allows the discriminating features to be visualized, which makes the model interpretable. In conclusion, this research has clinical potential for ICH risk stratification, improving the precision of early interventions.

      Strengths:

      TBM quantifies hematoma morphological changes using the Wasserstein distance, which has a well-defined physical meaning. It identifies features that are difficult to detect through conventional visual inspection (such as peripheral density distribution and density heterogeneity), which provides evidence supporting the "avalanche effect" hypothesis in hematoma expansion pathophysiology.

      Weaknesses:

      (1) As a methodology-focused study, the description of the methods section somewhat lacks depth and focus, which may make it challenging for readers to fully grasp the overall structure and workflow of the approach. For instance, the manuscript lacks a systematic overview of the entire process, from NCCT image input to the final prediction output. A potential improvement would be to include a workflow figure at the beginning of the manuscript, summarizing the proposed method and subsequent analytical procedures. This would help readers better understand the mechanism of the model.

      (2) The description of the comparison algorithms could be more detailed. Since TBM directly utilizes NCCT images as input for prediction, while SVM and K-means are not inherently designed to process raw imaging data, it would be beneficial to clarify which specific features or input data were used for these comparison models. This would better highlight the effectiveness and advantages of the TBM method.

      (3) The relatively small training and testing dataset may limit the model's performance and generalizability. Notably, while the study mentions that 1,066 patients from the ERICH dataset met the inclusion criteria, only 170 were randomly selected for the test set. Leveraging the full 1,066 ERICH cases for model training and internal validation might potentially enhance the model's robustness and performance.

      (4) Some minor textual issues need to be checked and corrected, such as line 16 in the abstract "Incorporating these traits into a v achieved an AUROC of 0.71 ...".

      (5) Some figures need to be reformatted (e.g., the x-axis in Figure 2 a is blocked).

    1. eLife Assessment

      This is a valuable study that tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts level of investment in the behavior. The evidence that food-washing is deliberate is compelling and the evidence that individual investment in the behavior varies is solid. Overall, the paper should be of interest to researchers interested in foraging behavior, cognition, and primate evolution.

    2. Reviewer #1 (Public review):

      In this paper, the authors had 2 aims:

      (1) Measure macaques' aversion to sand and determine whether its removal is intentional, as sand likely causes an unpleasant sensation and tooth damage.

      (2) Determine whether monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and whether this is related to individual traits such as sex and rank, and to behavioral technique.

      They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.

      The authors concluded that they had verified a long-standing assumption: that monkeys have an aversion to sand because it contains many potentially damaging fine-grained silicates, and that removing it by brushing or washing is intentional.

      They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.

      High- and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake despite the long-term cumulative costs of sand.

      This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.

      Strengths:

      The field experiment seemed well designed, and the quantification of the physical and mineral properties of quartz particles (relative to human detection thresholds), in terms of Feret diameter and particle circularity, seemed sound (to a reviewer who is not an expert in sand). The *Rank Determination* and *Measuring Sand* sections were clear.

      In achieving Aim 1, the authors validated a commonly assumed but unmeasured function of macaque and primate behavior, a key finding for primate food processing and cultural transmission research.

      I commend their approach of developing a quantitative model to generate predictions to compare with empirical data for their second aim. This is something others should strive for.

      I really appreciated the historical context of this paper in the introduction and found it very enjoyable and easy to read.

      I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We thank the editors and Reviewers 1 and 3 for their thoughtful consideration of our manuscript. The present revision is submitted to address comments raised concerning rank determinations and the following sentence in the editorial assessment:

      The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design.

      Close reading of this sentence reveals two parallel threads. The first can be read as “…evidence for variable rank is incomplete given the limitations of the experimental design,” whereas the second can be read as “…evidence for adaptive investment and fitness is incomplete given the limitations of the experimental design.” The first alludes to a critique of our methods, while the second alludes to points of discussion unrelated to our experimental design. Unpacking this sentence is important because it casts the totality of our paper as “incomplete,” a word of consequence for early-career scholars because it prevents indexing in Web of Science.

      For clarity, we will refer to these topics as Thread 1 and Thread 2 in the following response.

      Thread 1 seems rooted in a comment made by Reviewer 1, which is reproduced below:

      I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theft, scrounging, policing, aggression, or other distractions might occur-- where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

      We are grateful for this perspective of Reviewer 1, but it puts us in an uncomfortable position. We must respond rather forcefully because of its influence on the above assessment. A problem with R1’s comment is that it uses the word “foraging” (a behavior we did not study) instead of “cleaning” (the behavior we did study). Still, we can substitute the latter word with the former to get the gist of it. 

      R1 criticizes our methods as a prelude for imagining the behaviors of our study animals, a form of conjecture. R1 correctly supposes a positive relationship between the number of animals and the intensity of competition for a limited food resource, a well-known phenomenon; and, yes, the food in each trial was decidedly limited, being fixed at nine cucumber slices. But R1 incorrectly presumes rank effects on cleaning under conditions of intense food competition. When the number of monkeys participating in a trial exceeded the number of feeding stations (n = 3), we saw little or no cleaning effort, either brushing or washing. So, rank effects on cleaning are immaterial under these conditions. As our study goals were narrowly focused on detecting individual propensities, or choices, as a function of rank, we limited our analysis to trials involving three monkeys or fewer. In retrospect, we admit that we should have provided better justification for our choice of trials, so we’ve edited one of our sentences:

      Original sentence 

      Formerly lines 219-220: To minimize the potential confounding effects of dominance interactions, we analyzed trials with ≤ 3 monkeys.

      Revised sentence

      Current lines 219-224: We excluded trials from analysis if the number of participating monkeys exceeded the number of feeding stations, as these conditions produced high levels of feeding competition with scant cleaning behavior. Such conditions effectively erased individual variation in sand removal, the topic motivating our experiment. Accordingly, we analyzed trials with ≤ 3 monkeys, putting 937 food-handling bouts into the GLMM statistical models, which included data on individual rank, sex, and sand treatment.

      R1’s final criticism – “I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal” – seems to imply that rank data were collected during our experiment. On the contrary, we determined ranks from five years of focal follows preceding the experiment, achieving the very standard that R1 describes as ideal. The relevant text appeared on lines 165-169 in version 2.0:

      To determine the rank-order of adults, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). In some cases, these data were supplemented with ad libitum observations. This protocol existed during five years (2013-2018) of continual observations before we conducted our experiment in July-August 2018. 

      Naturally, we were puzzled by R1’s dismissal of our methods, as well as R1’s conclusion, reached without evidence, that “[the] reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.” It is an unsubstantiated assertion with no definition of robustness, making it difficult for anyone to objectively assess the quality of our data.

      We detect in R1’s words some unfamiliarity with the social organization of our study species, which is fair enough. To better orient readers to the dominance hierarchy of Macaca fascicularis, and to boost reader confidence in the volume and quality of our rank data, we have added several sentences to this section of the manuscript, lines 169-183:

      Macaques form multi-male multi-female (polygynandrous) social groups with individual dominance hierarchies. In M. fascicularis, the hierarchy is strictly linear and extremely steep, meaning aggression is unidirectional (de Waal, 1977; van Noordwijk and van Schaik, 2001) with profound asymmetries in outcomes for individuals of adjacent ranks (Balasubramaniam et al., 2012). Further, the dominance hierarchies of philopatric females are stable and predictable. Daughters follow the pattern of youngest ascendancy, ranking just below their mothers with few known exceptions among older sisters (de Waal, 1977; van Noordwijk and van Schaik, 1999). Taken together, these species traits are conducive to unequivocal rank determinations. 

      To determine the rank-order of adults in our study group, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). These data were supplemented with ad libitum observations and all rank determinations were updated monthly, and when males immigrated or emigrated. This protocol predates our experiment in July-August 2018, representing 970 hr of focal data during five years of systematic study (2013-2018). 

      Thread 2 criticizes our evidence for adaptive investment and fitness, describing it as a limitation of our experimental design. Accordingly, the totality of our experiment was classified as “incomplete.” Yet, our experiment was never designed to collect such evidence, and we make no claims of having it. Rather, we discussed potential fitness consequences to highlight the broader significance of our study, connecting it to diverse bodies of literature, from evolutionary theory to paleoanthropology. Our intent was to follow the conventions of scientific writing: to put our results into conversation with the wider literature and set an agenda for future research.

      On reflection, Thread 2 seems to pivot around something as arbitrary as structure. Previously, our results and discussion were combined under a single section header (“Results and Discussion”), a stylistic choice to economize words. Our manuscript is a Short Report, which is limited to 1,500 words of main text. But this level of concision proved counterproductive. It blurred our results and discussion in the minds of readers. Indeed, Reviewer 3 described it as “misleading,” a barbed word that accomplishes the same act attributed to us. To counter this perspective, we have simply partitioned our Results (now “Experimental Results”) and Discussion to draw a sharper distinction between the two components of our paper.

    1. eLife Assessment

      This study provides fundamental insights into the regulation of a retained intron in the mRNA coding for OGT, a process known to be regulated by the O-GlcNAc cycling system, and highlights the functional role of the splicing regulator SFSWAP. The evidence supporting the claims of the authors is convincing; the authors performed an elegant state-of-the-art CRISPR knockout strategy and sophisticated bioinformatic analysis to identify SFSWAP as a negative regulator of alternative splicing. The work will be of interest to researchers in the fields of splicing and glycobiology.

    2. Reviewer #1 (Public review):

      Summary:

      Govindan and Conrad use a genome-wide CRISPR screen to identify genes regulating retention of intron 4 in OGT, leveraging an intron retention reporter system previously described (PMID: 35895270). Their OGT intron 4 reporter reliably responds to O-GlcNAc levels, mirroring the endogenous splicing event. Through a genome-wide CRISPR knockout library, they uncover a range of splicing-related genes, including multiple core spliceosome components, acting as negative regulators of OGT intron 4 retention. They choose to follow up on SFSWAP, a largely understudied splicing regulator shown to undergo rapid phosphorylation in response to O-GlcNAc level changes (PMID: 32329777). RNA-sequencing reveals that SFSWAP depletion not only promotes OGT intron 4 splicing but also broadly induces exon inclusion and intron splicing, affecting decoy exon usage. While this study offers interesting insights into intron retention regulation and O-GlcNAc signaling, the RNA-Sequencing experiments lack essential controls needed to provide full confidence to the authors' conclusions.

      Strengths:

      (1) This study presents an elegant genetic screening approach to identify regulators of intron retention, uncovering core spliceosome genes as unexpected positive regulators of intron retention.

      (2) The work proposes a novel functional role for SFSWAP in splicing regulation, suggesting that it acts as a negative regulator of splicing and cassette exon inclusion, which contrasts with expected SR-related protein functions.

      (3) The authors suggest an intriguing model where SFSWAP, along with other spliceosome proteins, promotes intron retention by associating with decoy exons.

      Weaknesses:

      (1) The conclusions regarding SFSWAP's impact on alternative splicing rely on cells treated with a single pool of two siRNAs for five days. The absence of independent siRNA treatments raises concerns about potential off-target effects, which may reduce confidence in the observed SFSWAP-dependent splicing changes. Rescue experiments or using additional independent siRNA treatments would strengthen the conclusions.

      (2) The mechanistic role of SFSWAP in splicing would benefit from further exploration, though this may be more appropriate for future studies.

      Comments on revisions:

      The authors have addressed all my previous recommendations.

    3. Reviewer #2 (Public review):

      Summary:

      The paper describes an effort to identify the factors responsible for intron retention and alternate exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly.

      Strengths:

      - Exhaustive analysis of potential splicing factors in an unbiased screen.
      - Extensive genome wide bioinformatic analysis.
      - Thoughtful discussion and literature survey.

      Weaknesses:

      - No firm evidence linking SFSWAP to an O-GlcNAc specific mechanism.
      - Resulting model leaves many unanswered questions.

      Comments on revisions:

      I think the authors have adequately dealt with the reviewers' overall comments.

    4. Reviewer #3 (Public review):

      Summary:

      The major novel finding in this study is that SFSWAP, a splicing factor containing an RS domain but no canonical RNA binding domain, functions as a negative regulator of splicing. More specifically, it promotes retention of specific introns in a wide variety of transcripts including transcripts from the OGT gene previously studied by the Conrad lab. The balance between OGT intron retention and OGT complete splicing is an important regulator of O-GlcNAc expression levels in cells.

      Strengths:

      An elegant CRISPR knockout screen employed a GFP reporter, in which GFP is efficiently expressed only when the OGT retained intron is removed (so that the transcript will be exported from the nucleus to allow for translation of GFP). Factors whose CRISPR knockdown causes decreased intron retention therefore increase GFP, and these can be identified by sequencing RNA of GFP-sorted cells. SFSWAP was thus convincingly identified as a negative regulator of OGT retained intron splicing. More focused studies of OGT intron retention indicate that it may function by regulating a decoy exon previously identified in the intron, and that this may extend to other transcripts with decoy exons.

      Weaknesses:

      The mechanism by which SFSWAP represses retained introns is unclear, although some data suggests it can operate (in OGT) at the level of a recently reported decoy exon within that intron. Interesting/appropriate speculation about possible mechanisms is provided and will likely be the subject of future studies.

      Overall the study is well done and carefully described.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Govindan and Conrad use a genome-wide CRISPR screen to identify genes regulating retention of intron 4 in OGT, leveraging an intron retention reporter system previously described (PMID: 35895270). Their OGT intron 4 reporter reliably responds to O-GlcNAc levels, mirroring the endogenous splicing event. Through a genome-wide CRISPR knockout library, they uncover a range of splicing-related genes, including multiple core spliceosome components, acting as negative regulators of OGT intron 4 retention. They choose to follow up on SFSWAP, a largely understudied splicing regulator shown to undergo rapid phosphorylation in response to O-GlcNAc level changes (PMID: 32329777). RNA-sequencing reveals that SFSWAP depletion not only promotes OGT intron 4 splicing but also broadly induces exon inclusion and intron splicing, affecting decoy exon usage. While this study offers interesting insights into intron retention and O-GlcNAc signaling regulation, the RNA sequencing experiments lack the essential controls needed to provide full confidence to the authors' conclusions. 

      Strengths: 

      (1) This study presents an elegant genetic screening approach to identify regulators of intron retention, uncovering core spliceosome genes as unexpected positive regulators of intron retention. 

      (2) The work proposes a novel functional role for SFSWAP in splicing regulation, suggesting that it acts as a negative regulator of splicing and cassette exon inclusion, which contrasts with expected SR-related protein functions. 

      (3) The authors suggest an intriguing model where SFSWAP, along with other spliceosome proteins, promotes intron retention by associating with decoy exons. 

      We thank the reviewer for recognizing and detailing the strengths of our manuscript. 

      Weaknesses: 

      (1) The conclusions on SFSWAP's impact on alternative splicing are based on cells treated with two pooled siRNAs for five days. This extended incubation time without independent siRNA treatments raises concerns about off-target effects and indirect effects from secondary gene expression changes, potentially limiting confidence in direct SFSWAP-dependent splicing regulation. Rescue experiments and shorter siRNA-treatment incubation times could address these issues. 

      We repeated our SFSWAP knockdown analysis and analyzed both OGT e4-e5 junction splicing and SFSWAP transcript levels by RT-qPCR (now included in Sup. Fig. S4) from day 2 to day 5 post siRNA treatment. We observed that the time point at which OGT intron 4 removal increases (day 2) coincides with the time at which SFSWAP transcript levels start to decrease, consistent with a direct effect of SFSWAP knockdown on OGT intron 4 splicing. Moreover, the effect of SFSWAP knockdown on OGT intron 4 splicing peaks between days 4 and 5, supporting our use of these longer time points to cast a wide net for SFSWAP targets.

      (2) The mechanistic role of SFSWAP in splicing would benefit from further exploration. Key questions remain, such as whether SFSWAP directly binds RNA, specifically the introns and exons (including the decoy exons) it appears to regulate. Furthermore, given that SFSWAP phosphorylation is influenced by changes in O-GlcNAc signaling, it would be interesting to investigate this relationship further. While generating specific phosphomutants may not yield definitive insights due to redundancy and may also be beyond the scope of the study, the authors could examine whether distinct SFSWAP domains, such as the SR and SURP domains, which likely overlap with phosphorylation sites, are necessary for regulating OGT intron 4 splicing. 

      We absolutely agree with the reviewer that the current work stops short of a detailed mechanistic study, and we have made every attempt to be circumspect in our interpretations to reflect that limitation. In addition, we are very interested in delving more deeply into the mechanistic aspects of this regulation. In fact, we have initiated many of the experiments suggested by the reviewer (and more), but in each case, rigorous interpretable results will require a minimum of another year. 

      For example, we have used crosslinking and biotin labeling techniques (using previously available reagents from Eclipsebio) to test whether SFSWAP binds RNA. The results were negative, but the lack of strong SFSWAP antibodies required that we use a transiently expressed myc-tagged SFSWAP. Therefore, this negative result could be an artifact of the exogenous expression and/or tagging. Given the difficulties of “proving the negative”, considerably more work will be required to substantiate this finding. As another example, we intend to develop a complementation assay as suggested. For an essential gene, the ideal complementation system employs a degron system, and we have spent months attempting to generate a homozygous AID-tagged SFSWAP. Unfortunately, we so far have only found heterozygotes. Of course, this could be because the tag interferes with function, the insert was not efficiently incorporated by homologous repair, or that we simply haven’t yet screened a sufficient number of clones. We’re confident that these technical issues can be addressed, but they will take a significant amount of time to resolve. While we would ideally define a mechanism, we think that the data reported here outlining functions for SFSWAP in splicing represent a body of work sufficient for publication. 

      (3) Data presentation could be improved (specific suggestions are included in the recommendations section). Furthermore, Excel tables with gene expression and splicing analysis results should be provided as supplementary datasheets. Finally, a more detailed explanation of statistical analyses is necessary in certain sections. 

      We have addressed all specific suggestions as detailed in the recommendations below.

      Reviewer #2 (Public review): 

      Summary: 

      The paper describes an effort to identify the factors responsible for intron retention and alternate exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly. 

      Strengths: 

      (1) Exhaustive analysis of potential splicing factors in an unbiased screen. 

      (2) Extensive genome wide bioinformatic analysis. 

      (3) Thoughtful discussion and literature survey. 

      We thank the reviewer for recognizing and detailing the strengths of our manuscript. 

      Weaknesses: 

      (1) No firm evidence linking SFSWAP to an O-GlcNAc specific mechanism. 

      We couldn’t agree more with this critique. Indeed, our intention at the outset for the screen was to find an O-GlcNAc sensor linking OGT splicing with O-GlcNAc levels. As often occurs with high-throughput screens, we didn’t find exactly what we were looking for, but the screen nonetheless pointed us to interesting biology. Prompted by our screen, we describe new insights into the function of SFSWAP, a relatively uncharacterized essential gene. Currently, we are testing other candidates from our screen, and we are performing additional studies to identify potential O-GlcNAc sensors.  

      (2) Resulting model leaves many unanswered questions. 

      We agree (see Reviewer 1, point 2 response).  

      Reviewer #3 (Public review): 

      Summary: 

      The major novel finding in this study is that SFSWAP, a splicing factor containing an RS domain but no canonical RNA binding domain, functions as a negative regulator of splicing. More specifically, it promotes retention of specific introns in a wide variety of transcripts including transcripts from the OGT gene previously studied by the Conrad lab. The balance between OGT intron retention and OGT complete splicing is an important regulator of O-GlcNAc expression levels in cells. 

      Strengths: 

      An elegant CRISPR knockout screen employed a GFP reporter, in which GFP is efficiently expressed only when the OGT retained intron is removed (so that the transcript will be exported from the nucleus to allow for translation of GFP). Factors whose CRISPR knockdown causes decreased intron retention therefore increase GFP, and can be identified by sequencing RNA of GFP-sorted cells. SFSWAP was thus convincingly identified as a negative regulator of OGT retained intron splicing. More focused studies of OGT intron retention indicate that it may function by regulating a decoy exon previously identified in the intron, and that this may extend to other transcripts with decoy exons. 

      We thank the reviewer for recognizing the strengths of our manuscript. 

      Weaknesses: 

      The mechanism by which SFSWAP represses retained introns is unclear, although some data suggests it can operate (in OGT) at the level of a recently reported decoy exon within that intron.

      Interesting/appropriate speculation about possible mechanisms is provided and will likely be the subject of future studies. 

      We completely agree that this is a limitation of the current study (see above). Now that we have a better understanding of SFSWAP functions, we will continue to explore SFSWAP mechanisms as suggested. 

      Overall the study is well done and carefully described but some figures and some experiments should be described in more detail. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Clarify and add missing statistical details across the figures. For example, Figure S2 lacks statistical comparisons, and in Figures 4A and 4C the tests applied should be specified in the legend. 

      We have added appropriate statistical analysis wherever missing and edited figure legends to specify the tests used.

      (2) The authors are strongly encouraged to provide detailed tables of gene expression and alternative splicing analyses from RNA-Seq experiments (e.g., edgeR, rMATS, Whippet, and MAJIQ), as this would enhance transparency and facilitate data interpretation. 

      We have added tables for gene expression and alternate splicing analysis as suggested (Suppl. Tables 3-6).

      (3) Although the legend sometimes indicates differently (e.g., Figures 3b, 5a, 5c, etc.), the volcano plots showing the splicing changes do not contain a cutoff for marginally differential percent spliced in or intron retention values. 

      The legends have been edited to reflect the correct statistical and/or PSI cutoffs.

      (4) For consistency, use the same volcano plot format across all relevant figures (Figures 3b, 5a-c, S3, S4, S7, and S8), including cutoffs for differential splicing and the total count of up- and down-regulated events. 

      Due to different statistical frameworks and calculations employed by different alternate splicing pipelines, we could not use the same cutoffs for different pipelines.  However, we have now indicated the number of up- and down-regulated events for consistency among the volcano plots.

      (5) What is the overlap of differentially regulated events between the different analytical methodologies applied? 

      We analyzed the degree of overlap between the three pipelines used in the paper using a Venn diagram (added to Suppl. Fig. S7). However, as widely reported in the literature (e.g., Olofsson et al., 2023; Biochem Biophys Res Commun. 2023; doi: 10.1016/j.bbrc.2023.02.053.), the degree of overlap between pipelines is quite low.

      (6) To further substantiate your conclusions, additional validations of RNA-Seq splicing data, ideally visualized on an agarose gel, would be valuable, especially for exons and introns regulated by SFSWAP, and particularly for OGT decoy exons in Figure 4c. 

      We have not included these experiments as we focused on other critiques for this resubmission. Because the RNA-seq, RT-PCR and RT-qPCR data all align, we are confident that the products we are seeing are correctly identified and orthogonally validated (Figs 2d, 4a, 4b, and 4c).  

      (7) It would be more informative if the CRISPR screen data were presented in a format where both the adjusted p-value and LFC values of the hits are presented. Perhaps a volcano plot? 

      We have now included these graphs in revised Supplementary Figure S2. 

      (8) In Figure 2d, a cartoon showing primer binding sites for each panel could aid interpretation, particularly in explaining the unexpected simultaneous increase in OGT mRNA and intron retention upon SFSWAP knockdown. 

      We have added a cartoon showing primer binding sites similar to that shown in Fig. 4a.

      (9) Page 9, line 1, states that SFSWAP autoregulates its expression by controlling intron retention. Including a Sashimi plot would provide visual support for this claim. 

      The data suggesting that SFSWAP autoregulates its own transcript abundance were reported in Zachar et al. (1994), not from our own studies. Validation of those data with our RNA-seq data is confounded by the fact that we are using siRNAs to knock down the SFSWAP RNA at the transcript level (Fig. S15). 

      (10) In the legend of Figure S2 the authors state that negative results are inconclusive because RNA knockdowns are not verified by western blotting or qRT-PCR. This is correct, but the reviewer would also argue that the positive results are also inconclusive as they are not supported by a rescue experiment to confirm that the effect is not due to off-target effects. 

      This is a fair point with respect to the siRNA experiments on their own. However, the CRISPR screen was performed with sgRNAs, and MAGeCK RRA scores are high only for genes targeted by multiple sgRNAs that up-regulate GFP. Examination of the SFSWAP sgRNAs individually shows that three of four SFSWAP sgRNAs had false discovery rates ≤10⁻⁴² for GFP upregulation. The siRNAs therefore provide an additional orthogonal approach. It seems unlikely that the siRNAs and three independent sgRNAs would all have the same off-target effects. Together, these observations support the conclusion that SFSWAP loss leads to decreased OGT intron retention.  

      (11) For clarity in Figure 3a, consider using differential % spliced in or intron retention bar plots with directionality (positive and negative axis) and labeling siSFSWAP as the primary condition. 

      (12) Consider presenting Figure 5D as a box plot with a Wilcoxon test for statistical comparison. 

      For both points 11 and 12, we have tried the graphs as the reviewer suggested. While these were good suggestions, in both cases we felt that the original plots presented the data more clearly (see Author response image 1).

      Author response image 1.

      (13) Please expand the Methods section to detail the Whippet and MAJIQ analyses. 

      We have expanded the methods section to include additional details of the alternate splicing analysis.

      (14) Include coordinates for the four possible OGT decoy exon combinations analyzed in the Methods section. 

      We have added the coordinates of all four decoy forms in the methods section.  

      (15) A section on SFSWAP mass spectrometry is listed in Methods but is missing from the manuscript. 

      This section has now been removed.

      Reviewer #2 (Recommendations for the authors): 

      This is an excellent contribution. The paper describes an effort to identify the factors responsible for intron retention and alternate exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly. 

      Some specific recommendations. 

      (1) The plots in Figure 3 describing SI and ES events are confusing to this reader. Perhaps the violin plot is not the best way to visualize these events. The same holds true for the histograms in the lower panel of Figure 3. Not sure what to make of these plots. 

      For Figure 3b, we include both scatter and violin plots to represent the same data in two distinct ways. For Figure 3d, we agree that these are not the simplest plots to understand, and we have spent significant time trying to come up with a better way of displaying these trends in GC content as they relate to SE and RI events. Unfortunately, we were unable to identify a clearer way to present these data. 

      (2) The model (Figure 6) is very useful but confusing. The legend and the Figure itself are somewhat inconsistent. The bottom line of the figure is apparent but I fear that the authors are trying to convey a more complete model than is apparent from this figure. Please revise. 

      We have simplified the figure from the previous submission. As mentioned above, we admit that mechanistic details remain unknown. However, we have tried to generate a model that reflects our data, adds some speculative elements to be tested in the future, but remains as simple as possible. We are not quite sure what the reviewer was referring to as “somewhat inconsistent”, but we have attempted to clarify the model in the revised Discussion and Figure legend.  

      (3) It is unclear how normalization of the RNA seq experiments was performed (eg. Figure S5 and 6).  

      The normalization differences in Fig. S5 and S6 (now Fig S8 and S9) were due to scaling differences during the use of rmats2sashimiplot software. We have now replaced Fig. S5 to reflect correctly scaled images.

      I am enthusiastic about the manuscript and feel that with some clarification it will be an important contribution. 

      Thank you for these positive comments about our study!

      Reviewer #3 (Recommendations for the authors): 

      (1) In Figure 1f, it is clear that siRNA-mediated knockdown of OGT greatly increases spliced RNA as the cells attempt to compensate by more efficient intron removal (three left lanes). However, there is no discussion of the various treatments with TG or OSMI. Might quantitation of these lanes not also show the desired effects of TG and OSMI on spliced transcript levels? 

      The strong effect of OGT knockdown masks the (comparatively modest) effects of subsequent inhibitor treatments on the reporter RNA. We have edited the results section to clarify this.

      (2) In Figure 2c, why is the size difference between spliced RNA and intron-retained RNA so different in the GFP-probed gel (right) compared with the OGT-probed gel (left)? Even recognizing that the GFP probe is directed against reporter transcripts, and the OGT probe (I think) is directed against endogenous OGT transcripts, shouldn't the difference between spliced and unspliced bands be the same, i.e., +/- the intron 4 sequence. Also, why does the GFP probe detect the unspliced transcript so poorly? 

      The fully spliced endogenous OGT mRNA is ~5.5 kb while the fully spliced reporter is only ~1.6 kb, so the difference in size (the apparent shift relative to the mRNA) is quite different. Moreover, the two panels in Fig 2c are not precisely scaled to one another, so direct comparisons cannot be made. 

      The intron retained isoform does not accumulate to high levels in this reporter, a phenotype that we also observed with our GFP reporter designed to probe the regulation of the MAT2A retained intron (Scarborough et al., 2021). We are not certain about the reason for these observations, but suspect that the reporter RNA’s retained intron isoforms are less stable in the nucleus than their endogenous counterparts. Alternatively, the lack of splicing may affect 3´ processing of the transcripts so that they do not accumulate to the high levels observed for the wild-type genes. 

      (3) Please provide more information about the RNA-seq experiments. How many replicates were performed under each of the various conditions? The methods section says three replicates were performed for the UPF1/TG experiments; was this also true for the SFSWAP experiments?  

      All RNA-seq experiments were performed in biological triplicates. We have edited the methods section to clarify this.

      (4) Relatedly, the several IGV screenshots shown in Figure 3C presumably represent the triplicate RNA seq experiments. In part D, how many experiments does the data represent? Is it a compilation of three experiments? 

      Fig. 3d is derived from alternate splicing analysis performed on three biological replicates. We have added the number of replicates (n=3) on the figure to clarify this. We have also noted that the three IGV tracks represent biological replicates in the Figure legend for 3c.  

      (5) Please provide more details regarding the qRT-PCR experiments. 

      We have provided the positions of primer sets used for RT-qPCR analysis and cartoon depictions of target sites below the data wherever appropriate.

      (6) In the discussion of decoy exon function (in the Discussion section), several relevant observations are cited to support a model in which decoy exons promote assembly of splicing factors. One might also cite the finding that eCLIP profiling has found enriched binding of U2AF1 and U2AF2 at the 5' splice site region of decoy exons (reference 16). 

      Excellent point. This has now been added to the Discussion. 

      Minor corrections / clarifications: 

      (1) In the Figure 2A legend, CRISPR is misspelled. 

      Corrected.

      (2) In the discussion, the phrase "indirectly inhibits splicing of exons 4 and 5, but promoting stable unproductive assembly of the spliceosome", the word "but" should probably be "by". 

      Corrected.

    1. eLife Assessment

      The role of ACVR2A is potentially of importance to both the biology of trophoblast cells and to the pathogenesis of preeclampsia. In this manuscript, the authors have taken a useful first step towards better understanding this protein using a loss of function model in trophoblast cell lines and then examining invasion, proliferation, and transcription in these cells. The study is solid and further in vivo evidence on how target factors participate in the occurrence of placental structural disorders and diseases through potential downstream pathways will be invaluable in the future.