  1. Apr 2026
    1. Reviewer #2 (Public review):

      Summary:

      The authors are trying to broaden the understanding of SARS-CoV-2 Nsp13 activity to show that a single viral protein can accomplish multiple functions. Additionally, they try to show that helicase function is not limited to ATP-driven, unidirectional unwinding.

      Strengths:

      The consistent application of statistics to triplicate experiments is a strength of the manuscript. The ToPif1 control in Figure S12 is a good control.

      Weaknesses:

      (1) All the experiments except the one in Figure S2 use N-terminally His-tagged Nsp13. Because the N-terminal tag is known to have large effects on Nsp13 activity, this calls into question virtually all of the results in this manuscript.

      (2) The ATP-independent, bidirectional duplex unwinding shown for short duplex substrates is reminiscent of the trapping of thermal fraying intermediates that have been reported for other helicases. Because they are only observed on short duplexes, do not require ATP, and are bidirectional, this does not suggest strand displacement as suggested in the manuscript. Instead, it suggests trapping of partially melted intermediates.

      (3) Results that may be artifacts of unusual in vitro conditions are interpreted as if similar results will occur in the cell, where ATP is likely always present. Along those same lines, SARS-CoV-2 replicates in compartments of the endoplasmic reticulum, which would limit the ability of Nsp13 to access DNA substrates.

      (4) There is no evidence to support the conclusion that "Duplex DNA supports bidirectional remodeling via both ATP-dependent and ATP-independent mechanisms." 3'-5' duplex melting is limited to short duplexes and is ATP-independent, suggesting it may be due to trapping of thermal fraying intermediates by the ssDNA binding Nsp13. The ATP-dependent and ATP-independent melting on the substrates with the 3'-overhang are the same, suggesting that ATP-dependent melting does not occur on this substrate, which would indicate that bidirectional ATP-dependent translocation does not occur.

      (5) The description of ATP-independent unwinding as having "limited processivity" is likely not accurate. These experiments were multi-turnover reactions with very high Nsp13 concentrations and no protein trap to ensure single-turnover conditions. Because the reactions were multi-turnover, no information about the processivity of Nsp13 can be obtained. On the contrary, it seems likely that the product formed over the 30-minute reaction with a vast excess of Nsp13 is due to binding and dissociation of multiple Nsp13 molecules instead of processive translocation by a single enzyme.

      (6) G4s are much more stable at cellular K+ concentrations than they are at 20 mM K+. As such, Nsp13's ability to unfold a G4 in the absence of ATP may be diminished or eliminated at a physiological K+ concentration.

      Although the authors show that His-tagged Nsp13 can melt DNA and RNA duplexes and G-quadruplexes in an ATP-dependent and independent manner, in addition to annealing single-stranded nucleic acids into duplexes, the use of His-tagged Nsp13, which is known to cause artifacts, makes their results difficult to draw conclusions from. As such, in the opinion of this reviewer, this manuscript is likely to have little impact on the field.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In the manuscript by Li et al., the authors perform a comprehensive study on the template and cofactor determinants of the SARS-CoV-2 nsp13 protein. They find that, alongside the classical processive unwinding ability of helicases driven by ATP consumption, other chaperone-like and ATP-independent functions exist for this enzyme. By testing DNA and RNA oligos in several conformations, the authors show that these functions are highly dependent on template identity, but also on the ratio of ATP to divalent cations. Ultimately, it is suggested that these distinct mechanisms of action are employed by nsp13 to orchestrate viral replication.

      Overall, this study provides some novel insights into the functionality of a central and conserved enzyme of a relevant human pathogenic virus. While the approach is important and adds to the field, particularly by characterizing the chaperoning activities and adding G-quadruplexes as templates, previous studies have already identified several determinants of nsp13 template binding and processing in vitro (Sommers et al., 2023, JBC; Park et al., 2025, JBC). In addition, some issues regarding experimental design need to be addressed to increase the cogency and biological relevance of the study.

      We thank the reviewer for recognizing the novelty of our work, particularly the ATP-independent chaperone-like activities and G-quadruplex remodeling. We also appreciate the opportunity to clarify the conceptual distinction between our study and the prior work by Sommers et al. (2023) and Park et al. (2025). We fully agree that those studies systematically defined the canonical ATP-driven motor mechanism of Nsp13. Our results on 5′→3′ polarity, DNA preference, and tail/ATP/Mg<sup>2+</sup> dependence align with these benchmarks, confirming the reliability of our platform.

      However, the core novelty of our work lies in revealing that Nsp13 functions as a multifaceted nucleic acid remodeler, integrating motor and non-motor activities within a single protein, a functional regime absent from the JBC papers. Specifically, we uncover three novel layers: 1. Mg<sup>2+</sup>-activated, ATP-independent remodeling of short duplexes and G-quadruplexes. 2. Bidirectional remodeling on duplexes in the Mg<sup>2+</sup>-primed state. 3. Intrinsic chaperone functions including strand annealing and stem-loop restructuring.

      Thus, our work fundamentally expands the biochemical model of Nsp13 from a simple ATP-driven motor to a multifunctional, mode-switchable remodeler. We will highlight these distinctions in the revised Discussion. Below, we respond point-by-point to the specific experimental design issues.

      (1) Generally, low concentrations of monovalent cations (20 mM), as used throughout this study, may influence helicase activity and artificially enhance protein binding/oligomerization, which could favor the observed chaperoning activity (Venus et al., 2022, Methods). In contrast, some helicases, such as HCV NS3, are inhibited by higher K+ concentrations (Gwack et al., 2004, FEBS). Thus, the influence of higher concentrations of monovalent cations should be tested in relevant assays, as intracellular K+ levels are usually >100 mM. Additionally, this could significantly affect template stability. For instance, in some G4 assays, the addition of the trap already leads to observable duplex formation (Figure 5), which may be due to low K+ conditions.

      We thank the reviewer for this critical comment regarding the ionic environment. We agree that monovalent cation concentrations are pivotal for both helicase activity and the structural stability of templates like G4s.

      First, we wish to clarify that the final NaCl concentration in our reaction is not 20 mM, as this refers only to the unwinding buffer. Our protein dilution buffer contains 200 mM NaCl, and each 10 μL reaction includes 2 μL of protein, contributing ~40 mM NaCl. With 20 mM from the reaction buffer, the final concentration reaches ~60 mM. We will clarify this in the Methods.
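      The dilution arithmetic behind the ~60 mM figure can be written out as a minimal sketch. The function and variable names are illustrative and not taken from the manuscript, and the sketch assumes, as the text above states, that the 20 mM from the reaction buffer is already a final concentration in the 10 μL reaction.

```python
def diluted_concentration_mM(stock_mM: float, stock_volume_uL: float, total_volume_uL: float) -> float:
    """Concentration contributed by a stock after dilution: C_stock * V_stock / V_total."""
    return stock_mM * stock_volume_uL / total_volume_uL


# Values stated in the response: 200 mM NaCl dilution buffer, 2 uL protein, 10 uL reaction.
protein_buffer_contribution_mM = diluted_concentration_mM(200.0, 2.0, 10.0)

# Assumption: the reaction buffer's 20 mM NaCl is quoted as a final concentration.
reaction_buffer_contribution_mM = 20.0

final_nacl_mM = protein_buffer_contribution_mM + reaction_buffer_contribution_mM
print(f"Protein buffer contributes {protein_buffer_contribution_mM:.0f} mM NaCl")
print(f"Final NaCl concentration: {final_nacl_mM:.0f} mM")
```

      The same helper could be reused to estimate the final salt concentration if the protein volume or dilution buffer composition differs between assays.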

      Second, our choice of ionic strength is guided by established literature. A survey of 27 published nsp13 studies (Author response table 1) shows that the majority use 20–50 mM monovalent cations, with 20 mM being most common. Mickolajczyk et al. (2021) showed that nsp13 activity is highest at low salt and declines at higher concentrations. Thus, low salt conditions are routinely used to capture nsp13’s intrinsic catalytic activity. The intracellular environment is far more complex, with crowding and interacting proteins that likely modulate helicase behavior. The low-salt conditions are therefore a deliberate simplification to isolate and define enzyme function.

      Planned experiments: We fully agree that higher salt concentrations should be tested. In the revision, we will perform key assays such as ATP-independent duplex unwinding and G4 unfolding at ≥100 mM NaCl or KCl to verify that the observed activities persist under more physiological ionic conditions.

      (2) As in most publications that focus strictly on helicase (or other enzymatic) functions, the activity of the isolated protein is examined. However, particularly in the case of nsp13, core functions rely on other factors, such as nsp7/8 and other components of the replication-transcription complex (RTC). The overall structure and oligomerization state of nsp13 are altered within the complex (Chen et al., 2022, NSMB). The inclusion of such factors in key experiments would greatly improve the biological relevance of the findings.

      We agree that examining Nsp13 within the context of the RTC is essential for establishing the biological relevance of our findings. The structural reorganization of Nsp13 upon binding to Nsp12 and Nsp7/8 (Chen et al., 2022) suggests that its enzymatic "mode" may be regulated by its protein partners.

      Planned experiments: To address this, we will include the following biochemical characterizations:

      (1) Nsp13/12 and Nsp13/7/8 sub-complexes will be examined to dissect the individual contributions of the polymerase and the primase-like factors to Nsp13’s multifaceted activities.

      (2) The core RTC (Nsp13/12/7/8) will be used to evaluate how the full assembly modulates the functions of Nsp13, particularly on complex templates like G4s and pseudoknots.

      (3) In Figure 4, the authors claim that Mg2+ concentration inhibits RNA unwinding. While this is likely considering previous findings, it must be validated that duplex stabilization is not the primary cause for the observed lower dissociation rates. As the template is only 12 bp long with extensive overhangs, higher ion concentrations may significantly stabilize base pairing by reducing fraying effects. Similarly, in Figure 6, template-dependent effects of Mg2+/ATP should be ruled out.

      We thank the reviewer for this insightful suggestion. We agree that it is critical to distinguish whether the observed inhibition of RNA unwinding at higher Mg<sup>2+</sup> concentrations is due to the physical stabilization of the RNA duplex.

      Planned experiments: To address this, we will perform the following characterizations:

      (1) We will measure the Tm of the RNA duplex used in Figure 4 across a range of Mg<sup>2+</sup> concentrations (0, 0.5, and 1.0 mM). This will allow us to quantify the extent to which divalent cations stabilize the duplex RNA. These data will provide a more rigorous interpretation of the Mg<sup>2+</sup>-dependent unwinding in Figure 4.

      (2) Similarly, we will perform thermal melting analyses for the various DNA and RNA templates used in Figure 6 under different Mg<sup>2+</sup>/ATP conditions to rule out the template-dependent effects of Mg<sup>2+</sup>/ATP.

      (4) It is not entirely clear to me by which principle the templates were chosen. In my opinion, it would improve the overall comparability of the experimental results if, for instance, the blunt-ended duplex had the same sequence as the oligos with overhangs, since factors such as length, G/C content, Tm, etc., may play a significant role in binding and unwinding. Similarly, the oligos for binding and unwinding should be kept somewhat comparable, e.g., the G4 for the binding assay has 3 stacks, whereas RG1 has only 2. This discrepancy could make a significant difference. Thus, key experiments should be repeated using comparable sequence pairs.

      We fully agree with the reviewer that maintaining sequence consistency across different assays is essential for a rigorous comparison of nsp13 activities. We apologize for the ambiguity in the initial presentation of our sequences in Table S1.

      Planned revisions and experiments:

      (1) We wish to clarify that several key substrates were sequence-matched. For unwinding assays, the 12-bp 3′-overhang DNA and blunt-ended DNA share the identical duplex sequence, and the 16-bp 5′-overhang and 3′-overhang DNA substrates are also sequence-matched. For annealing assays, the duplex regions for all DNA substrates (3′, 5′, blunt, and fork) are identical, and the same internal consistency was maintained for all RNA annealing substrates. To make this clear, we will reorganize Table S1 to explicitly group these sequence-paired substrates.

      (2) The reviewer also notes discrepancies between binding and unwinding substrates (e.g., the difference in G4 stacks). To ensure direct comparability, we will perform additional experiments: complete binding assays for RG-1 (the 2-stack G4 used in unwinding) to match the functional data, and systematically measure binding affinities for all key unwinding substrates, including 3′-overhang, 5′-overhang, blunt-ended DNA, and the RNA fork.

      (5) Moreover, in the initial characterization of the binding abilities (Figure 1), the authors should include blunt-ended controls (duplex/hairpin) and, importantly, a pseudoknot (PK), as these structures are crucial for multiple steps in the viral life cycle (frameshifting, replication). Specifically, the PK in the 3'UTR (Sola et al., 2011, RNA Biology) may be an interesting target structure for unwinding assays, as it recruits the RTC, and, to my knowledge, no studies are available regarding nsp13 function at a PK. This would be particularly interesting in combination with nsp7/8 (Ohyama et al., 2024, JACS Au).

      We thank the reviewer for this insightful and inspiring suggestion. Incorporating pseudoknot (PK) structures into our analysis, particularly the well-characterized PK in the 3'UTR (Sola et al., 2011), represents a significant opportunity to bridge our biochemical findings with the viral life cycle. To address this, we have designed a 3'UTR PK substrate based on recently reported scaffolds (Ohyama et al., 2024).

      Planned experiments:

      (1) We will expand our initial binding assays (Figure 1) to include blunt-ended duplexes, hairpins, and the 3'UTR PK. This will establish a baseline for how Nsp13 recognizes these structurally distinct and physiologically critical templates.

      (2) We will perform unwinding assays to determine whether Nsp13, in its isolated state, possesses the mechanical capability to resolve the complex tertiary interactions within a pseudoknot.

      (3) Following the reviewer's insight, we will examine whether the addition of nsp7/8 is required to facilitate the unfolding of the 3'UTR PK.

      Together, these experiments will allow us to assess whether Nsp13 is capable of managing one of the most challenging structural obstacles in the SARS-CoV-2 genome.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to broaden the understanding of SARS-CoV-2 Nsp13 activity to show that a single viral protein can accomplish multiple functions. Additionally, they try to show that helicase function is not limited to ATP-driven, unidirectional unwinding.

      Strengths: The consistent application of statistics to triplicate experiments is a strength of the manuscript. The ToPif1 control in Figure S12 is a good control.

      We thank the reviewer for the insightful assessment and for highlighting the rigor of our experimental design, particularly our reliance on triplicate data with robust statistical validation and the inclusion of the ToPif1 control.

      We are especially grateful for the detailed comments provided by the reviewer. We fully recognize that addressing these specific points is essential for strengthening the cogency of our conclusions and improving the overall rigor of the manuscript. These suggestions have provided us with a clear roadmap for further refining our experimental evidence and clarifying our mechanistic interpretations. Below, we respond point-by-point to the specific issues.

      Weaknesses:

      (1) All the experiments except the one in Figure S2 use N-terminally His-tagged Nsp13. Because the N-terminal tag is known to have large effects on Nsp13 activity, this calls into question virtually all of the results in this manuscript.

      We thank the reviewer for raising this important concern regarding the potential influence of the N-terminal His tag on nsp13 activity. We have carefully considered this issue and provide the following lines of evidence to address it.

      (1) We have generated a tag-free nsp13 variant and our preliminary characterization (Author response image 1) shows that it retains all key activities: ATP hydrolysis (comparable to His-tagged nsp13), both ATP-independent (Mg<sup>2+</sup>-activated) and ATP-dependent unwinding, as well as chaperone activity to remodel stem-loops. These results demonstrate that while the His tag may modulate enzymatic efficiency, it does not create or abolish any specific biochemical function.

      (2) We conducted a systematic survey of 27 published studies on SARS-CoV/SARS-CoV-2 nsp13 (Author response table 1). The results show that 17 out of 27 studies (63%) used affinity-tagged nsp13 without tag removal, including His, MBP, GST, and Strep tags.

      (3) The only study that systematically compared different affinity tags (Adedeji et al., 2012) reported that GST-tagged nsp13 exhibited ~520-fold higher ATPase activity than His-tagged nsp13, demonstrating that the choice of affinity tag can affect enzymatic efficiency. However, both tagged versions retained all core enzymatic activities, including ATP hydrolysis and duplex unwinding. Importantly, no study has compared the full functional spectrum between His-tagged and tag-free nsp13. Our preliminary data suggest that the His tag may affect efficiency but does not alter the presence or absence of any specific activity.

      Planned experiments:

      We fully agree with the reviewer that a more systematic comparison would strengthen the conclusions. In the revision, we will include additional characterization of tag-free nsp13: (i) quantitative nucleic acid binding affinity, (ii) G4 unfolding efficiency, (iii) strand annealing activity. These experiments are currently underway.

      In summary, while we acknowledge that the His tag may influence enzymatic efficiency, our key conclusions are supported by experiments with tag-free nsp13. We will add a discussion of these points and include additional tag-free nsp13 data in the revised manuscript.

      (2) The ATP-independent, bidirectional duplex unwinding shown for short duplex substrates is reminiscent of the trapping of thermal fraying intermediates that have been reported for other helicases. Because they are only observed on short duplexes, do not require ATP, and are bidirectional, this does not suggest strand displacement as suggested in the manuscript. Instead, it suggests trapping of partially melted intermediates.

      We thank the reviewer for this insightful perspective. While the passive trapping of thermal fraying intermediates is a well-established model for non-catalytic protein-nucleic acid interactions, several lines of evidence suggest that nsp13 employs a more active, allosteric mechanism for ATP-independent remodeling.

      (1) If nsp13 were merely a passive trap, increasing duplex stability should decrease unwinding. However, as shown in Figure S3, raising Mg<sup>2+</sup> from 0 to 5 mM increases the DNA duplex Tm by ~10°C, yet nsp13’s remodeling activity is markedly enhanced under the same conditions (Figure 2). This positive correlation between cation-induced substrate stabilization and protein activation supports an active, protein-centered mechanism that overcomes the increased energetic barrier.

      (2) The observed bidirectionality in ATP-independent remodeling does not simply imply a lack of polarity; rather, it can reflect nsp13’s intrinsic chaperone function. In the absence of ATP, nsp13 binds the ss/ds junction (Figure 2F) and, in a Mg<sup>2+</sup>-dependent manner, may use its binding energy to actively intercalate into the duplex. This mechanism is inherently symmetric for 3′ and 5′ overhangs, explaining bidirectional remodeling, while the absence of activity on blunt-ended substrates confirms the requirement for a pre-existing junction.

      (3) The lack of activity on 24-bp substrates does not negate this remodeling mode but defines its energetic boundary. The binding energy released upon nsp13-nucleic acid interaction is sufficient to overcome the lower unwinding barrier of 12-16 bp duplexes, but insufficient to counteract the high stability and rapid re-annealing of a 24-bp duplex without the continuous mechanical power of ATP hydrolysis.

      Planned Revision:

      We thank the reviewer for prompting us to refine our mechanistic model. In the revision, we will add a dedicated discussion explicitly comparing the model of allosterically activated, binding-driven strand intrusion with the passive trapping model, incorporating the Tm data to strengthen our conclusions.

      (3) Results that may be artifacts of unusual in vitro conditions are interpreted as if similar results will occur in the cell, where ATP is likely always present. Along those same lines, SARS-CoV-2 replicates in compartments of the endoplasmic reticulum, which would limit the ability of Nsp13 to access DNA substrates.

      We thank the reviewer for raising this important concern regarding the physiological relevance. We fully agree that in vitro conditions do not entirely recapitulate the complex intracellular environment, and we have been careful not to over-interpret our findings. Below we address the two specific issues raised:

      (1) Regarding the ATP-independent activity, we acknowledge that ATP is abundant in healthy, actively replicating cells. However, during rapid viral replication, local ATP concentrations can fluctuate: the RTC has a high energy demand because the template contains extensive secondary structures, and this demand may lead to transient ATP depletion. Under such energy-limited conditions, Yu et al. (2025) demonstrated that ADP-bound nsp13 exhibits chaperone activity that destabilizes nucleic acid structures without ATP hydrolysis, and Dumm et al. (2025) reported that SARS-CoV-2 nsp13 resolves RNA stem-loops in an ATP-independent manner.

      Even when ATP is abundant, the ATP-independent mode may enable rapid, local structural adjustments that bypass the kinetic delay of ATP binding and hydrolysis. As shown in Figure 1D, nsp13 exhibits high binding affinity for structured nucleic acids. In this scenario, nsp13 functions not as a processive motor but through a binding-driven mechanism, using the free energy of protein-nucleic acid interaction to transiently destabilize short duplexes or resolve local secondary structures such as G4s and stem-loops in an energy-efficient manner.

      (2) Regarding DNA substrates, we fully agree that RNA is the physiological substrate for nsp13. However, DNA is a validated and widely accepted surrogate for mechanistic studies because it is more stable and easier to manipulate than RNA. A systematic survey of 27 published nsp13 studies (Author response table 1) shows that 20 out of 27 (74%) used DNA substrates for at least some of their experiments. In our study, we used DNA primarily as a mechanistic probe and a stable control, and we validated all key conclusions on physiological RNA substrates, as shown in Figures 4, 5, 6, S7, S8, S10, S11 and S12.
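      As a quick arithmetic check, the survey proportions quoted in this response (17 of 27 studies using affinity-tagged nsp13, and 20 of 27 using DNA substrates) can be recomputed as below. This is only an illustrative sketch: the dictionary keys are hypothetical labels, and the counts are copied from the text rather than from Author response table 1 itself.

```python
TOTAL_STUDIES = 27

# Counts as quoted in the response text (labels are illustrative).
survey_counts = {
    "tagged nsp13 without tag removal": 17,
    "DNA substrates in at least some experiments": 20,
}

# Convert each count to a whole-number percentage of the 27 surveyed studies.
survey_percentages = {
    label: round(100 * count / TOTAL_STUDIES)
    for label, count in survey_counts.items()
}

for label, percent in survey_percentages.items():
    print(f"{label}: {survey_counts[label]}/{TOTAL_STUDIES} = {percent}%")
```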

      Planned revisions: To address the reviewer’s concerns more directly, we will revise the manuscript to include a discussion paragraph explicitly stating that the ATP-independent activity was observed under optimized in vitro conditions and may represent a latent remodeling capability that could be relevant under energy-limited conditions such as local ATP depletion during rapid replication. We will also clarify that DNA substrates were used as mechanistic probes and controls, and that all key findings were validated on physiological RNA substrates. We thank the reviewer for prompting us to strengthen the discussion of these important points.

      (4) There is no evidence to support the conclusion that "Duplex DNA supports bidirectional remodeling via both ATP-dependent and ATP-independent mechanisms." 3'-5' duplex melting is limited to short duplexes and is ATP-independent, suggesting it may be due to trapping of thermal fraying intermediates by the ssDNA binding Nsp13. The ATP-dependent and ATP-independent melting on the substrates with the 3'-overhang are the same, suggesting that ATP-dependent melting does not occur on this substrate, which would indicate that bidirectional ATP-dependent translocation does not occur.

      We are grateful to the reviewer for this critical evaluation of our mechanistic claims. We agree that our initial statement regarding bidirectional ATP-dependent remodeling was imprecise and not fully supported by the data. As the reviewer correctly notes, the similar unwinding efficiency on 3′-overhang substrates regardless of ATP presence indicates that ATP hydrolysis does not drive 3′→5′ translocation, which is consistent with nsp13’s known 5′→3′ motor polarity. The observed 3′→5′ activity is therefore more accurately described as an ATP-independent remodeling event, not ATP-dependent unwinding.

      We will revise the Discussion and relevant Results sections to clarify the nature of this bidirectional activity. Specifically, the sentence:

      "Duplex DNA supports bidirectional remodeling via both ATP-dependent and ATP-independent mechanisms..." will be corrected to: "Duplex DNA supports bidirectional remodeling via ATP-independent mechanisms."

      We will also explicitly state that while nsp13 requires ATP for long-range, processive 5'→3' helicase activity, its remodeling/chaperone function is inherently bidirectional and powered by the free energy of binding to the ss/ds junction, rather than by ATP-driven mechanical work.

      (5) The description of ATP-independent unwinding as having "limited processivity" is likely not accurate. These experiments were multi-turnover reactions with very high Nsp13 concentrations and no protein trap to ensure single-turnover conditions. Because the reactions were multi-turnover, no information about the processivity of Nsp13 can be obtained. On the contrary, it seems likely that the product formed over the 30-minute reaction with a vast excess of Nsp13 is due to binding and dissociation of multiple Nsp13 molecules instead of processive translocation by a single enzyme.

      We thank the reviewer for this important correction. We fully agree that our use of the term "processivity" was technically imprecise. Processivity strictly defines the distance a single enzyme translocates during one binding event, which our multi-turnover assays (with high nsp13 concentrations and no protein trap) were not designed to measure. Our results specifically demonstrate that the ATP-independent remodeling mode is highly sensitive to duplex length, with efficiency declining sharply as the duplex lengthens. To reflect the experimental data more faithfully, we have replaced "processivity" with more accurate descriptors throughout the manuscript.

      Planned revisions:

      (1) Original: "The ATP-independent unwinding mode, however, has limited processivity." Revised: "The ATP-independent unwinding mode, however, exhibits a steep decline in efficiency as the duplex length increases."

      (2) Original: "...an ATP-independent, cation-activated mode with limited processivity." Revised: "...an ATP-independent, cation-activated mode specialized for localized structural remodeling."

      (3) Original: "...primes Nsp13 for basal strand remodeling but supports only limited processivity." Revised: "...primes Nsp13 for basal strand remodeling but is insufficient for the sustained unwinding of extended duplexes."

      (4) Original: "...primes Nsp13 for low-processivity strand displacement." Revised: "...primes Nsp13 for short-range strand displacement rather than long-range processive unwinding."

      We believe these changes clarify that the ATP-independent mode acts as a molecular chaperone for local obstacles (like G4 or short stems) rather than a motor for long-range translocation. We thank the reviewer for helping us improve the precision of our description.

      (6) G4s are much more stable at cellular K+ concentrations than they are at 20 mM K+. As such, Nsp13's ability to unfold a G4 in the absence of ATP may be diminished or eliminated at a physiological K+ concentration.

      We thank the reviewer for this critical point regarding physiological ion concentrations. We agree that K<sup>+</sup> significantly stabilizes G4 structures, which may raise the energy barrier for ATP-independent remodeling.

      Planned experiments:

      To address this, we will perform salt titration assays (up to 150 mM KCl) to evaluate the robustness of nsp13’s G4 unfolding activity under more physiological ionic conditions. We will also measure the melting temperature of our G4 substrates across this K<sup>+</sup> range to correlate structural stability with enzymatic efficiency.

      Author response image 1.

      Preliminary characterization of tag-free Nsp13 enzymatic activities. (A) Comparison of ATPase activity between His-tagged and tag-free Nsp13 in the presence of ssRNA or RNA G4. (B) Raw fluorescence data from stopped-flow FRET analysis of ATP-dependent unwinding (16-bp fork DNA, 2 mM Mg<sup>2+</sup>, 2 mM ATP). F/F<sub>0</sub> represents FAM fluorescence normalized to initial DNA intensity. (C) ATP-independent DNA duplex remodeling (data reproduced from Figure S2). (D) Chaperone activity of tag-free Nsp13 on DNA and RNA stem-loops.

      Author response table 1.

      Summary of affinity tags, monovalent salt concentrations, and substrate types used in 27 published SARS-CoV/SARS-CoV-2 nsp13 studies

      References:

      (1) Ivanov KA, Thiel V, Dobbe JC, van der Meer Y, Snijder EJ, Ziebuhr J. Multiple enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase. J Virol. 2004 Jun;78(11):5619-32.

      (2) Lee NR, Kwon HM, Park K, Oh S, Jeong YJ, Kim DE. Cooperative translocation enhances the unwinding of duplex DNA by SARS coronavirus helicase nsP13. Nucleic Acids Res. 2010 Nov;38(21):7626-36.

      (3) Adedeji AO, Marchand B, Te Velthuis AJ, Snijder EJ, Weiss S, Eoff RL, Singh K, Sarafianos SG. Mechanism of nucleic acid unwinding by SARS-CoV helicase. PLoS One. 2012;7(5):e36521. doi: 10.1371/journal.pone.0036521.

      (4) Adedeji AO, Lazarus H. Biochemical Characterization of Middle East Respiratory Syndrome Coronavirus Helicase. mSphere. 2016 Sep 7;1(5):e00235-16.

      (5) Jia Z, Yan L, Ren Z, Wu L, Wang J, Guo J, Zheng L, Ming Z, Zhang L, Lou Z, Rao Z. Delicate structural coordination of the Severe Acute Respiratory Syndrome coronavirus Nsp13 upon ATP hydrolysis. Nucleic Acids Res. 2019 Jul 9;47(12):6538-6550.

      (6) Jang KJ, Jeong S, Kang DY, Sp N, Yang YM, Kim DE. A high ATP concentration enhances the cooperative translocation of the SARS coronavirus helicase nsP13 in the unwinding of duplex RNA. Sci Rep. 2020 Mar 11;10(1):4481.

      (7) Shu T, Huang M, Wu D, Ren Y, Zhang X, Han Y, Mu J, Wang R, Qiu Y, Zhang DY, Zhou X. SARS-Coronavirus-2 Nsp13 Possesses NTPase and RNA Helicase Activities That Can Be Inhibited by Bismuth Salts. Virol Sin. 2020 Jun;35(3):321-329.

      (8) Mickolajczyk KJ, Shelton PMM, Grasso M, Cao X, Warrington SE, Aher A, Liu S, Kapoor TM. Force-dependent stimulation of RNA unwinding by SARS-CoV-2 nsp13 helicase. Biophys J. 2021 Mar 16;120(6):1020-1030.

      (9) Chen J, Wang Q, Malone B, Llewellyn E, Pechersky Y, Maruthi K, Eng ET, Perry JK, Campbell EA, Shaw DE, Darst SA. Ensemble cryo-EM reveals conformational states of the nsp13 helicase in the SARS-CoV-2 helicase replication-transcription complex. Nat Struct Mol Biol. 2022 Mar;29(3):250-260.

      (10) Yazdi AK, Pakarian P, Perveen S, Hajian T, Santhakumar V, Bolotokova A, Li F, Vedadi M. Kinetic Characterization of SARS-CoV-2 nsp13 ATPase Activity and Discovery of Small-Molecule Inhibitors. ACS Infect Dis. 2022 Aug 12;8(8):1533-1542.

      (11) Corona A, Wycisk K, Talarico C, Manelfi C, Milia J, Cannalire R, Esposito F, Gribbon P, Zaliani A, Iaconis D, Beccari AR, Summa V, Nowotny M, Tramontano E. Natural Compounds Inhibit SARS-CoV-2 nsp13 Unwinding and ATPase Enzyme Activities. ACS Pharmacol Transl Sci. 2022 Apr 1;5(4):226-239.

      (12) Lu L, Peng Y, Yao H, Wang Y, Li J, Yang Y, Lin Z. Punicalagin as an allosteric NSP13 helicase inhibitor potently suppresses SARS-CoV-2 replication in vitro. Antiviral Res. 2022 Oct;206:105389.

      (13) Yue K, Yao B, Shi Y, Yang Y, Qian Z, Ci Y, Shi L. The stalk domain of SARS-CoV-2 NSP13 is essential for its helicase activity. Biochem Biophys Res Commun. 2022 Apr 23;601:129-136.

      (14) Grimes SL, Choi YJ, Banerjee A, Small G, Anderson-Daniels J, Gribble J, Pruijssers AJ, Agostini ML, Abu-Shmais A, Lu X, Darst SA, Campbell E, Denison MR. A mutation in the coronavirus nsp13-helicase impairs enzymatic activity and confers partial remdesivir resistance. mBio. 2023 Aug 31;14(4):e0106023.

      (15) Yu J, Im H, Lee G. Unwinding mechanism of SARS-CoV helicase (nsp13) in the presence of Ca2+, elucidated by biochemical and single-molecular studies. Biochem Biophys Res Commun. 2023 Aug 6;668:35-41.

      (16) Sommers JA, Loftus LN, Jones MP 3rd, Lee RA, Haren CE, Dumm AJ, Brosh RM Jr. Biochemical analysis of SARS-CoV-2 Nsp13 helicase implicated in COVID-19 and factors that regulate its catalytic functions. J Biol Chem. 2023 Mar;299(3):102980.

      (17) Maio N, Raza MK, Li Y, Zhang DL, Bollinger JM Jr, Krebs C, Rouault TA. An iron-sulfur cluster in the zinc-binding domain of the SARS-CoV-2 helicase modulates its RNA-binding and -unwinding activities. Proc Natl Acad Sci U S A. 2023 Aug 15;120(33):e2303860120.

      (18) Marx SK, Mickolajczyk KJ, Craig JM, Thomas CA, Pfeffer AM, Abell SJ, Carrasco JD, Franzi MC, Huang JR, Kim HC, Brinkerhoff H, Kapoor TM, Gundlach JH, Laszlo AH. Observing inhibition of the SARS-CoV-2 helicase at single-nucleotide resolution. Nucleic Acids Res. 2023 Sep 22;51(17):9266-9278.

      (19) Inniss NL, Rzhetskaya M, Ling-Hu T, Lorenzo-Redondo R, Bachta KE, Satchell KJF, Hultquist JF. Activity and inhibition of the SARS-CoV-2 Omicron nsp13 R392C variant using RNA duplex unwinding assays. SLAS Discov. 2024 Apr;29(3):100145.

      (20) Sales AH, Fu I, Durandin A, Ciervo S, Lupoli TJ, Shafirovich V, Broyde S, Geacintov NE. Variable Inhibition of DNA Unwinding Rates Catalyzed by the SARS-CoV-2 Helicase Nsp13 by Structurally Distinct Single DNA Lesions. Int J Mol Sci. 2024 Jul 19;25(14):7930.

      (21) Soper N, Yardumian I, Chen E, Yang C, Ciervo S, Oom AL, Desvignes L, Mulligan MJ, Zhang Y, Lupoli TJ. A Repurposed Drug Interferes with Nucleic Acid to Inhibit the Dual Activities of Coronavirus Nsp13. ACS Chem Biol. 2024 Jul 19;19(7):1593-1603.

      (22) Hao W, Hu X, Chen Q, Qin B, Tian Z, Li Z, Hou P, Zhao R, Balci H, Cui S, Diao J. Duplex Unwinding Mechanism of Coronavirus MERS-CoV nsp13 Helicase. Chem Biomed Imaging. 2024 Dec 19;3(2):111-122.

      (23) Park J, Jeong YJ, Chauhan K, Koh HR, Kim DE. ATPase-dependent duplex nucleic acid unwinding by SARS-CoV-2 nsP13 relies on facile binding and translocation along single-stranded nucleic acid. J Biol Chem. 2025 Jul;301(7):110373.

      (24) Yu J, Im H, Cho H, Jeon Y, Lee JB, Lee G. A novel ADP-directed chaperone function facilitates the ATP-driven motor activity of SARS-CoV helicase. Nucleic Acids Res. 2025 Jan 24;53(3):gkaf034.

      (25) Dumm AJ, Zheng AY, Butler TJ, Kulikowicz T, George JC, Bombard PT, Sommers JA, Ding J, Brosh RM Jr. SARS-CoV-2 point mutations are over-represented in terminal loops of RNA stem-loop structures that can be resolved by Nsp13 helicase in a unique manner with respect to nucleotide dependence. Nucleic Acids Res. 2025 May 22;53(10):gkaf447.

      (26) Castro JM, Slack RL, Ong YT, Zhang H, Gifford LB, Courouble VV, Aiken RM, Shankar V, O'Leary TR, Griffin PR, Lan S, Du Y, Fu H, Sarafianos SG. Stalling the Enemy: Targeting Nsp13 for Next-Generation SARS-CoV-2 Antivirals. Int J Mol Sci. 2026 Mar 11;27(6):2587.

      (27) Mingroni MA, Enney BM, Malsick LE, Geiss BJ. Motif V is an allosteric couple between the SARS-CoV-2 nsp13 nucleotide triphosphatase and helicase active sites. J Biol Chem. 2026 Mar;302(3):111198.

    1. eLife Assessment

      This useful study presents an improved protocol for long-term in vitro culture of Schistosoma mansoni that enables progression toward sexually dimorphic stages, representing a meaningful advance for studying parasite development and reducing reliance on animal models. The findings show that host-specific culture conditions support essential developmental and metabolic functions required for parasite maturation, although development remains delayed compared to in vivo conditions. The evidence is solid overall, but limited pairing efficiency and the absence of egg production indicate that the system does not yet fully recapitulate complete reproductive development.

    2. Reviewer #1 (Public review):

      Pichon, Rémi et al. describe an in vitro method for transforming Schistosoma cercariae into mature adult worms. The authors show that human serum (HS) supports parasite growth and differentiation more effectively than fetal bovine serum (FBS). They also observed differences in parasite growth and activity, with worms cultured in HS efficiently digesting human red blood cells (hRBC). Cultured worms were able to pair with ex vivo adult worms and produce eggs, indicating functional maturation suitable for downstream applications such as drug screening. While the experimental approach is comprehensive and supports the advantages of HS culture conditions, the pairing efficiency was low (≈7%) and required long culture periods (70-80 days), highlighting limitations that may affect reproducibility.

      A major strength of the study, in particular, is that the authors clearly differentiate the effects of FBS versus HS on developmental progression. The conversion rate observed in HS cultures is significant and consistent with previously published data.

      While the study has several strengths, some aspects of the work are not fully explored. In particular, the role of hRBC supplementation requires further clarification. Although HS-cultured worms were shown to digest hRBC more readily, the implications of this observation remain unclear. Specifically, it would be useful to understand whether hRBC supplementation influences (1) long-term culture stability, (2) molecular pathways associated with development and differentiation, or (3) the pairing capacity of the worms. While addressing these questions may not be the main objective of the study, further discussion of these points would strengthen the manuscript.

      The manuscript is clearly written and represents a valuable contribution to the field. Overall, the experimental approach is sound, and the results support a useful methodological framework for the in vitro culture of Schistosoma worms and the attainment of sexual maturity, particularly for adult male worms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors perform confirmation studies of Paul Basch's seminal schistosome work from 1981, demonstrating the development of transformed schistosomules into sexually dimorphic adult parasites, albeit without successful egg production. In addition to the findings from Basch's earlier work, the authors add some new molecular data in the form of an analysis of proliferative cells in in-vitro-derived animals.

      Strengths:

      The authors successfully confirm experimental results from earlier schistosome researchers, providing a potential new tool for studying schistosome biology without the need for vertebrate hosts.

      Weaknesses:

      The authors' presentation of the data is sometimes difficult to follow, and it is not always clear where the reported values come from. For example:

      (1) Line 136: The authors claim that parasites in HS and FBS conditions have substantially different mortality rates (11.3 +/- 2.7 vs 5 +/- 2.3) but a quite high p-value (0.8). Analyzing the raw data myself, I obtained a mean of 8.2 +/- 1.7% vs 4.8% +/- 4.3% with a p-value of 0.15. Either the data are not clearly presented, and I did not follow them, or the data presented in the text do not match the raw data in the supplemental files.

      (2) Line 187/Figure 4: Though it is not clearly stated, it appears that the authors treat their EdU counts as an ordinal data set of 61 steps (from 0 to >60) rather than a continuous measure of EdU+ cells per animal. In this reviewer's opinion, the graph strongly suggests a continuous data set, and the fact that this reviewer had to dig through poorly labeled raw data to discover the nature of the data is problematic. The authors should either switch to a continuous data set or make it explicit that the data shown are ordinal. If counting EdU+ cells is too arduous, the authors could consider comparing the amount of EdU+ area to the amount of DAPI+ area in maximum intensity projections of their confocal images, as this would roughly approximate the amount of proliferative cells in the animals.

      There are some minor issues as well:

      (1) Line 122: It is perhaps incorrect to refer to humans as "the" definitive host of schistosomes, as S. japonicum is primarily considered a zoonotic infection with water buffalo/cows being the primary definitive host.

      (2) Line 185/298: The authors refer to EdU pulse-chase experiments, but the experiments described here are EdU pulse experiments.

    4. Reviewer #3 (Public review):

      Summary:

      This study is significant as it established a protocol for the long-term culture of Schistosoma mansoni newly transformed cercariae, which developed in vitro into sexually dimorphic forms. The impact of two different sera, Fetal Bovine Serum (FBS) and Human Serum (HS), added to the culture medium supplemented with human red blood cells was evaluated. The authors demonstrated that HS-cultured parasites were able to digest red blood cells, a critical step for long-term parasite development. Furthermore, while most FBS-cultured parasites did not progress beyond an early liver stage, sexual dimorphism was clearly evident in the HS-cultured worms, albeit delayed compared to in vivo development.

      Strengths:

      This study could contribute to further in vitro studies for a better understanding of the unique sexual biology of Schistosoma mansoni and for screening novel schistosomicidal compounds. By increasing parasite development in in vitro studies, this protocol could have a positive impact on the principles of the 3Rs (Replacement, Reduction and Refinement) for animal research.

      Weaknesses:

      As the authors mentioned, "pairing between male and female parasites was rare. Pairing was observed in approximately ~7% of the experiments, usually after day ~ 80 in culture." Egg production was also not achieved with this protocol.

    5. Author response:

      eLife Assessment

      This useful study presents an improved protocol for long-term in vitro culture of Schistosoma mansoni that enables progression toward sexually dimorphic stages, representing a meaningful advance for studying parasite development and reducing reliance on animal models. The findings show that host-specific culture conditions support essential developmental and metabolic functions required for parasite maturation, although development remains delayed compared to in vivo conditions. The evidence is solid overall, but limited pairing efficiency and the absence of egg production indicate that the system does not yet fully recapitulate complete reproductive development.

      On behalf of the co-authors, we thank the three reviewers and the editors for their complimentary remarks as well as their major and minor comments and concerns. Addressing these concerns has led to revisions that improved the manuscript. In particular, further analyses have generated updated Figures 3 and 4 and Supplementary Tables S1 and S4-S6.

      Public Reviews:

      Reviewer #1 (Public review):

      Pichon, Rémi et al. describe an in vitro method for transforming Schistosoma cercariae into mature adult worms. The authors show that human serum (HS) supports parasite growth and differentiation more effectively than fetal bovine serum (FBS). They also observed differences in parasite growth and activity, with worms cultured in HS efficiently digesting human red blood cells (hRBC). Cultured worms were able to pair with ex vivo adult worms and produce eggs, indicating functional maturation suitable for downstream applications such as drug screening. While the experimental approach is comprehensive and supports the advantage of HS culture conditions, the pairing efficiency was low (≈7%) and required long culture periods (70-80 days), highlighting limitations that may affect reproducibility.

      We thank the reviewer for the positive comments. Regarding the low in vitro pairing efficiency, we have now edited the manuscript to clarify a misleading statement related to the 7% value. This value corresponds to the percentage of experiments in which couples were observed; we decided to remove it because it does not accurately represent the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

      We also agree with the reviewer that the extended culture periods required to obtain fully sexually dimorphic parasites remain a limitation. As elaborated in the Discussion (see below), key factors, probably derived from the host, are missing from the in vitro system, which would explain both the slow in vitro development and the low rate of spontaneous pairing between in vitro developed, sexually dimorphic male and female worms. This was discussed as follows (lines 340-343): “That said, while our system was highly efficient in producing sexually dimorphic worms, spontaneous pairing between male and female parasites was extremely rare, mainly in aged in vitro cultures (from 80 to 100 days in culture) indicating that other factors, e.g., cholesterol, may be missing[35].”

      A major strength of the study, in particular, is that the authors clearly differentiate the effects of FBS versus HS on developmental progression. The conversion rate observed in HS cultures is significant and consistent with previously published data.

      While the study has several strengths, some aspects of the work are not fully explored. In particular, the role of hRBC supplementation requires further clarification. Although HS-cultured worms were shown to digest hRBC more readily, the implications of this observation remain unclear. Specifically, it would be useful to understand whether hRBC supplementation influences (1) long-term culture stability, (2) molecular pathways associated with development and differentiation, or (3) the pairing capacity of the worms. While addressing these questions may not be the main objective of the study, further discussion of these points would strengthen the manuscript.

      We agree that deciphering the role of human Red Blood Cell (hRBC) supplementation is critical. Regarding the influence of hRBCs on long-term culture stability and parasite development, it has been well established for more than four decades that schistosomes need red blood cells to grow in culture [Basch, P. F. Cultivation of Schistosoma mansoni in vitro. II. production of infertile eggs by worm pairs cultured from cercariae. J Parasitol 67, 186-190 (1981); Basch, P. F. Cultivation of Schistosoma mansoni in vitro. I. Establishment of cultures from cercariae and development until pairing. J. Parasitol. 67, 179-185 (1981)]. The molecular pathways that underlie development, sexual differentiation and pairing, and that are modulated by hRBCs in culture, are currently being investigated by our team. We decided not to include these data and analyses in the current manuscript, as they fall outside its scope.

      The manuscript is clearly written and represents a valuable contribution to the field. Overall, the experimental approach is sound, and the results support a useful methodological framework for the in vitro culture of Schistosoma worms and the attainment of sexual maturity, particularly for adult male worms.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Reviewer #2 (Public review):

      Summary:

      The authors perform confirmation studies of Paul Basch's seminal schistosome work from 1981, demonstrating the development of transformed schistosomules into sexually dimorphic adult parasites, albeit without successful egg production. In addition to the findings from Basch's earlier work, the authors add some new molecular data in the form of an analysis of proliferative cells in in-vitro-derived animals.

      Strengths:

      The authors successfully confirm experimental results from earlier schistosome researchers, providing a potential new tool for studying schistosome biology without the need for vertebrate hosts.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      The authors' presentation of the data is sometimes difficult to follow, and it is not always clear where the reported values come from. For example:

      (1) Line 136: The authors claim that parasites in HS and FBS conditions have substantially different mortality rates (11.3 +/- 2.7 vs 5 +/- 2.3) but a quite high p-value (0.8). Analyzing the raw data myself, I obtained a mean of 8.2 +/- 1.7% vs 4.8% +/- 4.3% with a p-value of 0.15. Either the data are not clearly presented, and I did not follow them, or the data presented in the text do not match the raw data in the supplemental files.

      We thank the reviewer for pointing this out; for the sake of clarity, we have now converted Supplementary Tables S1 and S6 to long format. Accordingly, the Results and Methods sections and the indicated supplementary tables were edited as follows:

      Results, lines 142 ff.:

      “No morphological differences were observed between parasites cultured either in FBS or HS within the first week in culture; in both conditions most parasites were classified as early schistosomula [category 1: 76% ± 30 (average ± SD) in FBS and 73% ± 29 (average ± SD) in HS] with few lung (category 2) and early liver schistosomula (category 3) (Figure 1B, week 1; Supplementary Figure S1). The mean mortality (category 0) at week 1 was slightly higher, but not statistically significant (P= 0.42), in worms cultured in HS [9.75% ± 2.76 (average ± SD)] compared to the mortality registered in FBS-cultured parasites [5.52% ± 5.18 (average ± SD), Supplementary Table S6], consistent with previous findings[39].”

      Methods, lines 463-465:

      “To evaluate differences in mortality between HS- and FBS-cultured parasites, data from 5 experiments were combined and analysed using a Shapiro-Wilk normality test to test normality of the data and a non-parametric Wilcoxon rank sum exact test (Supplementary Tables S1 and S6).”

      Supplementary Tables:

      Supplementary Table S1. “Raw counts of parasites within each developmental stage category. Each row corresponds to a picture of parasites in culture medium containing FBS or HS. Each column corresponds to the raw parasite counts at indicated stage development (categories 0 to 5), time in culture (Time in days - D), and experimental condition.”

      Supplementary Table S6. “Summary of all statistical tests employed in this study. 1. Statistical tests of parasite mortality and the raw data table used for this test. 2. Statistical tests for worm size comparisons (correspond to Figure 2). 3. Statistical tests for worm black gut comparisons (correspond to Figure 3). BG: Black gut. 4. Statistical tests for EdU positive cells comparisons (correspond to Figure 4). Replicate code: E, M and L correspond to day 2, 8 and 15 respectively; R and W correspond to the presence (R) or absence (W) of RBCs added 13 days after transformation.”

      For clarity, in Author response image 1 we provide the R script used to perform the statistical tests on the data shown in Supplementary Table S6 (column “Raw count of parasite developmental category per image and experiment”).

      Author response image 1.
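
      The R script itself is provided as an image and is not reproduced in this text. As an illustration only, the following Python sketch shows how an exact two-sided Wilcoxon rank-sum test can be computed by enumeration for small samples. It is not the authors' script: the function names and the example values are hypothetical placeholders, not data from Supplementary Table S6, and the Shapiro-Wilk step described in the Methods is omitted.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Rank sum of x and its exact two-sided p-value by full enumeration.

    The p-value is the fraction of all ways of assigning len(x) of the
    pooled ranks to the first group whose rank sum lies at least as far
    from its null expectation as the observed rank sum.
    """
    ranks = midranks(list(x) + list(y))
    n = len(x)
    observed = sum(ranks[:n])
    expected = n * (len(ranks) + 1) / 2
    deviation = abs(observed - expected)
    total = 0
    extreme = 0
    for combo in combinations(ranks, n):
        total += 1
        if abs(sum(combo) - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total


# Hypothetical week-1 mortality percentages, one value per experiment.
hs_placeholder = [9.0, 12.5, 7.0, 11.0, 8.5]
fbs_placeholder = [4.0, 10.0, 2.5, 6.0, 5.0]
w, p = rank_sum_exact(hs_placeholder, fbs_placeholder)
print(f"rank sum = {w}, exact two-sided p = {p:.3f}")
```

      Enumeration is practical only for small groups; for example, five values per condition give 252 possible rank assignments. With tied values, statistical packages may switch to a normal approximation, so their p-values can differ slightly from this permutation-based figure.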

      (2) Line 187/Figure 4: Though it is not clearly stated, it appears that the authors treat their EdU counts as an ordinal data set of 61 steps (from 0 to >60) rather than a continuous measure of EdU+ cells per animal. In this reviewer's opinion, the graph strongly suggests a continuous data set, and the fact that this reviewer had to dig through poorly labeled raw data to discover the nature of the data is problematic. The authors should either switch to a continuous data set or make it explicit that the data shown are ordinal. If counting EdU+ cells is too arduous, the authors could consider comparing the amount of EdU+ area to the amount of DAPI+ area in maximum intensity projections of their confocal images, as this would roughly approximate the amount of proliferative cells in the animals.

      As the reviewer correctly pointed out, the data were treated as ordinal because counting cells in worms with more than 60 EdU+ cells became extremely difficult and highly inaccurate. Therefore, we decided to group all worms showing more than 60 EdU+ cells in a single category, “60 EdU+ cells”. We have now updated Figure 4, where medians are shown instead of mean values; Supplementary Table S5, to provide more comprehensive access to the raw counts; and Supplementary Table S6, to indicate that the data for EdU+ cells per worm were considered ordinal. Accordingly, we have revised the corresponding sections as follows:

      Results, lines 211 ff.:

      “HS-cultured schistosomula showed higher numbers of proliferating stem cells, with a median of >48 and >60 EdU+ cells per worm at days 8 and 15, respectively (Figure 4). On the other hand, most FBS-cultured parasites displayed no more than an average of 20 EdU+ cells per worm (Figure 4).”

      Methods, lines 520 ff.:

      “EdU+ cells per parasite were counted for an average of 100 parasites across three independent experiments (Supplementary Table S5). Worms were grouped based on the number of cells per individual, but all those showing ≥ 60 EdU+ cells were counted in the same group named ‘60 EdU+ cells’. Therefore, the data were considered ordinal data. Statistical analysis was performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 considered significant (Supplementary Table S6).”

      Figure 4 legend, lines 830 ff.:

      “A. Violin plots showing the number of EdU+ cells per worm at indicated time points (2, 8, and 15 days post cercarial transformation) in parasites cultured either in Foetal Bovine Serum (FBS, blue) or Human Serum (HS, light brown). Human Red Blood Cells (hRBCs) were added in the culture at day 13 post cercarial transformation. The small black dots indicate individual worms, and the big black point indicates the median of EdU+ cells per worm. All worms showing ≥ 60 EdU+ cells were counted and clustered together in the group named ‘60 EdU+ cells’. Hence, the data were treated as ordinal and statistical analysis performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 (*) considered significant (Supplementary Tables S5 and S6).”
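
      To illustrate why a capped count is better summarized by a median than by a mean, the short Python sketch below applies a ceiling of 60 to a set of counts. The numbers and names are hypothetical and are not taken from Supplementary Table S5.

```python
from statistics import mean, median

CAP = 60  # counts at or above this value fall into the top category


def cap_counts(counts, cap=CAP):
    """Record every count at or above the cap as the cap itself."""
    return [min(count, cap) for count in counts]


# Hypothetical true EdU+ cell counts for five worms.
true_counts = [12, 35, 58, 75, 140]
capped = cap_counts(true_counts)

# The mean is pulled down by the cap; the median is unchanged here
# because fewer than half of the worms reach the top category.
print("capped counts:", capped)
print("mean, true vs capped:", mean(true_counts), mean(capped))
print("median, true vs capped:", median(true_counts), median(capped))
```

      When more than half of the worms fall into the top category, the median itself sits at the cap and is only a lower bound, which is how a reported median of ">60" should be read.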

      We thank the reviewer for the very interesting suggestion to quantify cell proliferation by calculating the ratio of EdU+ area to DAPI+ area in maximum intensity projection images. Measuring the fluorescence area for each worm in a maximum projection is an excellent idea; however, given the number of EdU+ cells present in some samples, we think this technique would not provide additional information or more detailed data than our analysis when the number of EdU+ cells exceeds 60 per worm. We will certainly consider this approach for future studies.
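
      For readers unfamiliar with the reviewer's suggestion, the area ratio could be computed roughly as in the Python sketch below. This is an assumption-laden illustration, not an analysis performed in the study: the function names, the toy z-stacks, and the fixed intensity thresholds are all hypothetical, and a real pipeline would need a justified thresholding method.

```python
def max_projection(stack):
    """Collapse a z-stack (list of 2D planes) into per-pixel maxima."""
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*stack)]


def positive_area(image, threshold):
    """Count pixels whose intensity exceeds the threshold."""
    return sum(1 for row in image for pixel in row if pixel > threshold)


def edu_to_dapi_ratio(edu_stack, dapi_stack, edu_threshold, dapi_threshold):
    """Ratio of EdU-positive area to DAPI-positive area in the projections."""
    edu_area = positive_area(max_projection(edu_stack), edu_threshold)
    dapi_area = positive_area(max_projection(dapi_stack), dapi_threshold)
    if dapi_area == 0:
        raise ValueError("no DAPI-positive pixels above threshold")
    return edu_area / dapi_area


# Toy stacks: two z-planes, each 2 rows by 3 columns (hypothetical values).
edu_stack = [
    [[0, 9, 0], [0, 0, 0]],
    [[0, 0, 0], [8, 0, 0]],
]
dapi_stack = [
    [[7, 7, 0], [0, 7, 0]],
    [[0, 7, 7], [7, 0, 0]],
]
ratio = edu_to_dapi_ratio(edu_stack, dapi_stack, edu_threshold=5, dapi_threshold=5)
print("EdU+/DAPI+ area ratio:", ratio)
```

      Because overlapping nuclei merge in a projection, such a ratio saturates in densely labeled worms, which is consistent with the authors' concern about samples exceeding 60 EdU+ cells.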

      There are some minor issues as well:

      (1) Line 122: It is perhaps incorrect to refer to humans as "the" definitive host of schistosomes, as S. japonicum is primarily considered a zoonotic infection with water buffalo/cows being the primary definitive host.

      We thank the reviewer for pointing this out; we have now replaced “schistosomes” with “Schistosoma mansoni” (current line 131).

      (2) Line 185/298: The authors refer to EdU pulse-chase experiments, but the experiments described here are EdU pulse experiments.

      This is a very good point. We thank the reviewer for raising it and have accordingly replaced “EdU pulse-chase” with “EdU pulse” experiments in lines 37, 204, and 321.

      Reviewer #3 (Public review):

      Summary:

      This study is significant as it established a protocol for the long-term culture of Schistosoma mansoni newly transformed cercariae, which developed in vitro into sexually dimorphic forms. The impact of two different sera, Fetal Bovine Serum (FBS) and Human Serum (HS), added to the culture medium supplemented with human red blood cells was evaluated. The authors demonstrated that HS-cultured parasites were able to digest red blood cells, a critical step for long-term parasite development. Furthermore, while most FBS-cultured parasites did not progress beyond an early liver stage, sexual dimorphism was clearly evident in the HS-cultured worms, albeit delayed compared to in vivo development.

      Strengths:

      This study could contribute to further in vitro studies for a better understanding of the unique sexual biology of Schistosoma mansoni and for screening novel schistosomicidal compounds. By increasing parasite development in in vitro studies, this protocol could have a positive impact on the principles of the 3Rs (Replacement, Reduction and Refinement) for animal research.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      As the authors mentioned, "pairing between male and female parasites was rare. Pairing was observed in approximately ~7% of the experiments, usually after day ~ 80 in culture." Egg production was also not achieved with this protocol.

      Following the reviewer’s comment, we have decided to remove the value of 7%, which corresponds to the percentage of experiments in which couples were observed. This value does not accurately reflect the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

    1. eLife Assessment

      In this work, the authors demonstrated that blue-light-mediated mitochondrial contacts attenuated blue-light-induced mitochondrial dysfunction, and validated this in human cells and C. elegans. This valuable work has the potential to provide novel perspectives in the field of mitochondrial biology, but the supporting data are incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Blue light exposure has been shown to induce mitochondrial dysfunction, including reduced mitochondrial membrane potential (MMP). In the present study, the authors present a protein-based optogenetic system capable of inducing mito-contacts upon blue LED illumination, and show that this technical platform attenuated blue-light-induced mitochondrial dysfunction and cytotoxicity via restoring mitochondrial membrane potential.

      Strengths:

      The overall study design is well organized, and the data appear to support the conclusions. Additionally, demonstrating effects in human retinal cells and C. elegans enhances the perceived robustness and translational potential of the findings.

      Weaknesses:

      (1) Quantification of MMP at contact sites: The use of Rhodamine 123 (Rh123) for MMP measurement can be problematic, as it is not ratiometric; its signals depend on loading conditions, cell size, mitochondrial mass, and focal thickness, rather than solely on ΔΨm. If mitochondrial content changes (e.g., via biogenesis or mitophagy), Rh123 readings can be misleading. This is particularly relevant here, as the mito-contact-induced MMP changes appear to be localized events. The authors should include controls for at least one experiment using FCCP/CCCP (to collapse ΔΨm) and oligomycin (to induce hyperpolarization in many cell types) to confirm the dynamic range of the assay. Where possible, Rh123 fluorescence intensity should be normalized to mitochondrial mass (e.g., using a mass marker or mitochondrial protein). Moreover, MMP changes should be validated using an alternative indicator, such as JC-1 or a genetically encoded probe, as this is foundational to the study.

      (2) Mechanisms of mito-contact-induced MMP hyperpolarization: Building on the above, what is the mechanism by which mito-contacts induce MMP hyperpolarization? Does this involve fusion of the outer or inner mitochondrial membranes? MMP hyperpolarization typically reflects an increase in protons in the intermembrane space relative to the matrix. Where do these protons originate? The kinetics of mito-contact-induced MMP changes should also be investigated in more detail.

      (3) Building on the above, what is the ratio of contact area to the overall mitochondrial surface area? If MMP increases only at relatively small contact sites, how does this translate to an overall increase in MMP and energy production?

      (4) Blue light causes mitochondrial damage via increased reactive oxygen species (ROS), and MMP hyperpolarization can itself lead to excessive oxidative stress. The authors should measure ROS levels and discuss their potential impact on the observed effects.

      (5) Although the main focus is on blue LED-mediated injury, the protective effects of the optogenetic system against other stressors (e.g., ischemia-reperfusion, H₂O₂, or FCCP exposure) should be examined. This would help exclude confounds related to blue light, which is central to both the manipulation and the damage model in the current study, and increase the overall impact of the findings.

    3. Reviewer #2 (Public review):

      Summary:

      This paper describes a novel tool (CRYO2PHR-MiroTM), which aims to create contact sites between mitochondria. One elegant aspect of the technique is that it is controlled by the exposure of cells to blue light and is reversible when cells are put back in the dark. Through an unknown and unexplored mechanism, the mitochondrial membrane potential is raised at the mitochondrial contact sites. The oligomerization of CRYOPHR-MiroTM is protective against the toxic effect of prolonged blue light exposure in cells and nematodes.

      Strengths:

      This work might open novel perspectives in the fundamental study of mitochondria.

      (1) CRYO2PHR-MiroTM represents an interesting tool to manipulate mitochondria interaction/proximity/distribution without playing with the classical components of the mitochondrial fusion and fission machinery.

      (2) This work suggests that, without the need for fusion, the relative proximity of mitochondria might influence their activity, opening novel fields of investigation in mitochondrial biology.

      (3) Finally, targeting CRYO2PHR not only to mitochondria but also to their partner organelles (ER, LD, peroxisomes...) could provide a tool to reversibly manipulate the interaction of mitochondria with the rest of the organelle community.

      Weaknesses:

      As detailed below, the authors' claim that CRYOPHR induces mitochondrial contact sites is not fully convincing at this stage. The method used to define and analyse contact sites is not clear enough, and the images presented in the manuscript do not convincingly illustrate contact sites between mitochondria. Finally, the evidence that CRYOPHR does not trigger mitochondrial fusion should be strengthened.

      Comments on the results:

      (1) The quantification of mitochondrial contacts is a crucial point of this study. At this stage, the data are not sufficient to demonstrate that CRYOPHR-MiroTM oligomerisation tethers mitochondria. CRYOPHR-MiroTM can oligomerise in trans, leading to mitochondrial tethering, but it can also oligomerise in cis. In the latter case, one could hypothesise that the massive aggregation of CRYOPHR-MiroTM at the mitochondrial outer membrane could locally push lipids away and/or create membrane curvature. The images and quantification provided by the authors make it difficult to decide whether CRYOPHR-MiroTM tethers mitochondria or pinches their membranes. Below are detailed comments on these aspects:

      a) It is claimed that "the proportion of mitochondria having one or more mito-contacts increased by nearly 50% following optogenetic stimulation". However, it is unclear how the authors have calculated this parameter. In the methods for contact ratio calculation, it is written that "the contacted area of CRY2PHR puncta was calculated", but I do not understand what this means or how it relates to the contact ratio calculation. The authors then write, "Based on the area or distance (between mitochondria), the mitochondria were classified as either non-contact or contact". It is not clear to which parameter the term "area" refers: the area of mito-contacts based on MitoTracker or the area of CRY2PHR puncta. Nor is it clear how the authors integrate the two parameters "area" and "distance" to decide whether two mitochondria are in contact.

      b) The method states that "Contact ratio refers to the number of contact mitochondria by the total number of mitochondria". What does "number of contact mitochondria" mean? The number of contacts between mitochondria? The number of mitochondria in contact? What is the distance range between two mitochondria, taking into account optical resolution, for which the authors consider that two mitochondria are "in contact"?

      c) The quantification of the contact ratio made on the TEM picture should be explained.

      d) The following data should be added, as contact site formation is a critical point. In cells treated or not with blue light, the authors should systematically measure the distance from each mitochondrion to its nearest neighbour. The distribution of these distances should be shown and analysed to determine whether there are more mitochondria at short distances upon blue light induction of CRYOPHR oligomerization. In addition, the authors should determine the number of CRYO2PHR puncta that are simply lying on a mitochondrion and the number of CRYO2PHR puncta that are bridging two clearly distinct mitochondria.

      e) Based on the images provided in Figure 1, there is no convincing evidence of mitochondrial contacts. In image 1g, the CRYO2PHR puncta seem to be lying on mitochondrial tubules. Sometimes, it appears that CRYO2PHR puncta decorate mitochondrial constriction sites, suggesting that CRYOPHR might pinch membranes. The authors claim that they "found various types of mitochondrial contacts (Figure 1f, 1g), such as head-to-head, side-by-side, and head-to-side", but this is not clearly visible in the images. One problem is that the authors show the merge of MTDR and CRYOPHR-mCherry staining, in which the mitochondrial contacts are hidden by very bright CRYOPHR-mCherry aggregates. The authors should provide high magnification images (like in 1g) showing not only the merge of mitochondria and CRYOPHR-mCherry but also the mitochondrial staining on its own. The authors should mark "head-to-head, side-by-side, and head-to-side contacts" with arrows.

      f) Continuing on Figure 1f and 1g, it does not seem optimal to use CRYOPHR-mCherry in combination with MTDR (MitoTracker Deep Red) to precisely delineate subtle membrane contact sites between mitochondria, because the emission and excitation spectra of these two fluorochromes partially overlap. A better alternative could be to use MTG (MitoTracker Green), as in Figure 1a. However, MitoTracker stains the mitochondrial matrix, which is delimited by the mitochondrial inner membrane and can be discontinuous within a given mitochondrion. To formally visualise mitochondrial contact sites and demonstrate that CRYOPHR tethers mitochondria, the authors should instead mark the mitochondrial outer membrane (with TOM20::GFP and anti-TOM20, for instance).

      g) Figure S2 presents snapshots of a movie clearly showing the rapid aggregation of CRYOPHR into distinct puncta upon blue light exposure. The authors should perform the same experiment on cells in which mitochondria are stained with a fluorophore that allows live imaging (MTG or TOM20::GFP, for instance). This would allow mitochondria and CRYOPHR puncta to be tracked at the same time. High magnification views should then capture events where CRYOPHR puncta formation coincides with mitochondrial tethering if the authors' claims are correct, or with, for instance, membrane pinching if they are wrong.

      h) If CRYOPHR-MiroTM brings mitochondrial membranes closer together, it would be surprising if it did not increase the probability of Mitofusin-dependent fusion events. The authors should analyse the mitochondrial network in cells exposed to the conditions shown in Figure 1. Rather than relying only on the aspect ratio (as shown in Figure 2 in cells stressed by prolonged blue light exposure), the authors should also analyse the total mitochondrial branch length (the sum of the lengths of all branches of a mitochondrion) and the number of branches on each mitochondrion.

      i) Ideally, the authors should not rely only on the analysis of mitochondrial architecture, which only partially informs on the mitochondrial fusion rate. Fragmented mitochondria can indeed fuse efficiently via kiss-and-run events, for instance. To formally demonstrate that there is neither permanent nor transient fusion at the mitochondrial contact sites induced by CRYOPHR, the most powerful method would be to analyse the diffusion of fluorescent matrix probes. This can be conducted using photoconvertible probes (mt-dendra2) (Pham et al., 2012) or a PEG-induced cell fusion assay (Detmer et al., 2007).

      (2) Regarding the quantification of local MMP at mitochondrial contacts, it would be important to better explain how the authors have set up their microscope to avoid technical issues that could lead to fluorescence artifacts at CRYOPHR puncta. Because the emission of Rhodamine 123 overlaps the excitation of mCherry, the methods should explain how the detection of Rhodamine 123 has been filtered to exclude the red light coming from mCherry in CRYOPHR puncta. This is critical, as fluorescent protein aggregates can be very bright.
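      The nearest-neighbour analysis requested in comment (1d) could be sketched as follows. This is a simplified illustration under stated assumptions: each mitochondrion is reduced to a centroid coordinate, whereas surface-to-surface distances from segmented outlines would be more appropriate for elongated mitochondria, and the function names and threshold are hypothetical rather than taken from the manuscript.

```python
import math


def nearest_neighbour_distances(centroids):
    """Return, for each centroid, the distance to its closest other centroid.

    `centroids` is a list of (x, y) tuples in the same length unit
    (hypothetical input; real data would come from segmentation).
    """
    if len(centroids) < 2:
        raise ValueError("at least two mitochondria are required")
    distances = []
    for i, p in enumerate(centroids):
        distances.append(
            min(math.dist(p, q) for j, q in enumerate(centroids) if j != i)
        )
    return distances


def fraction_within(distances, threshold):
    """Fraction of mitochondria whose nearest neighbour lies within `threshold`."""
    return sum(d <= threshold for d in distances) / len(distances)


# Made-up coordinates, for illustration only.
example = nearest_neighbour_distances([(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)])
```

      Comparing the full distance distributions between dark and blue-light conditions, rather than a single thresholded ratio, would show whether mitochondria are shifted towards shorter distances. Any threshold used to call a "contact" should be stated and justified relative to the optical resolution.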

      Comments on the introduction and discussion

      (1) In the results section, the authors state that they were "Inspired by previous studies indicating that nanoscale proximity of a charged membrane or protein condensate to a membrane amplifies the local membrane potential". It would be useful for readers to have some background on these observations (references 55 and 56) to better understand the rationale behind the authors' strategy. The discussion should then address in more detail the possible mechanisms that could explain why bringing mitochondrial membranes together without fusing them influences the mitochondrial membrane potential.

      (2) I would suggest finding a simple name for the CRYOPHR-MiroTM tool that could evoke more clearly that it is an optogenetic tool designed to tether mitochondria with blue light.

    1. eLife Assessment

      This study provides potentially important insights by establishing a human disease model and exploring therapeutic approaches. The evidence is generally convincing for descriptive and comparative findings. The authors present solid data, but evidence for proposed biological mechanisms and functional outcomes remains limited.

    2. Reviewer #1 (Public review):

      In this study, the authors set out to develop a human disease model using stem cell-derived systems and to use this platform to investigate disease biology and evaluate potential therapeutic approaches. Their goal is to provide a tractable experimental system that captures key features of the disease and enables testing of candidate interventions.

      The work has several important strengths. The authors present a carefully constructed model with improved genetic replication and clearer reporting of biological replicates, which enhances confidence in the reproducibility of the findings. The longitudinal design, spanning early developmental stages to later disease-relevant phenotypes, provides a useful framework for distinguishing temporal aspects of the disease process. The study also includes a comparative evaluation of multiple therapeutic strategies, adding practical value to the field. In addition, statistical reporting and transparency have been strengthened, and key limitations of the model (such as the absence of certain cell types) are now clearly acknowledged.

      At the same time, notable weaknesses temper the strength of the conclusions. Several central biological claims, particularly those related to specific signaling pathways, are supported primarily by transcriptomic and protein-level observations without direct functional validation. Similarly, measures used to interpret cellular processes do not fully distinguish between alternative biological explanations, leaving some mechanistic interpretations unresolved. The therapeutic findings are supported by biochemical changes, but evidence for functional recovery at the cellular level is limited. These gaps mean that some of the broader conclusions should be interpreted with caution.

      Overall, the authors have largely achieved their aim of establishing a useful experimental model and demonstrating its potential for studying disease-related changes and testing interventions. The evidence is convincing for the descriptive and comparative aspects of the work, but more limited for mechanistic and functional claims.

      The study is likely to have a meaningful impact by providing a platform that others in the field can build upon. The methods and datasets will be useful to researchers interested in disease modeling and therapeutic development. At the same time, the work is best viewed as an important foundation, with key mechanistic and functional questions remaining to be addressed in future studies.

    3. Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted, and these have been adequately addressed in the revision.

      Comments on revisions:

      I have no further recommendations. The revised manuscript addresses the few questions and concerns that I had initially shared.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments on revisions:

      I have no further concerns regarding this manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNciI). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNciI) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNciI) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      We thank the reviewer for highlighting the importance of genetic and biological replication. An additional patient-derived iPSC line was included in the manuscript. Our study therefore includes two independent nGD patient-derived iPSC lines, GD2-1260 (GBA1<sup>L444P/P415R</sup>) and GD2-10-257 (GBA1<sup>L444P/RecNciI</sup>), both of which carry severe mutations associated with nGD. These two lines represent distinct genetic backgrounds and were used to demonstrate the consistency of key disease phenotypes (reduced GCase activity, elevated substrate, impaired dopaminergic neuron differentiation, etc.) across MLOs from different patients. Major experiments (e.g., GCase activity assays, substrate measurements, immunoblotting for the DA marker TH, and therapeutic testing with SapC-DOPS-fGCase and AAV9-GBA1) were performed using both patient lines, with results showing consistent phenotypes and therapeutic responses (see Figs. 2-6 and Supplementary Figs. 4-5). To ensure clarity and transparency, a new Supplementary Table 2 summarizes the characterization of both the GD2-1260 and GD2-10-257 lines.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      Biological replication was ensured in our study by conducting experiments in at least 3 independent differentiations per line, and technical replicates (multiple organoids/fields per batch) were averaged accordingly. We have clarified biological replicates and differentiation in the figure legends.

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      In the revision, we have clarified biological replicates and differentiation in the figure legend in Fig.1E; Fig.2B,2G; Fig.3F, 3G; Fig.4B-C,E,H-J, M-N; Fig.6D; and Fig.7A-C, I.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was not feasible because GBA1 overlaps with a highly homologous pseudogene (PGBA), which makes precise editing technically challenging. Consequently, only the L444P mutation was successfully corrected, and the resulting isogenic line retains the P415R mutation in a heterozygous state. Because Gaucher disease is an autosomal recessive disorder, individuals carrying a single GBA1 mutation (heterozygous carriers) do not develop clinical symptoms. Therefore, the partially corrected isogenic line, which retains only the P415R allele, represents a clinically relevant carrier model. Consistent with this, our results show that GCase activity was restored to approximately 50% of wild-type levels (Fig.4B-C), supporting the expected heterozygous state. These findings also make it unlikely that the remaining differences observed are due to clonal variation or epigenetic effects.

      (e) The authors tested week 3, 4, 8, 15, and 28 old organoids in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNAseq.

      We agree that a systematic analysis of maturation stages is essential for validating the MLO model. Our data integrate a longitudinal comparison across multiple developmental windows (Weeks 3 to 28) to characterize the transition from progenitors to mature/functional states for nGD phenotyping and evaluation of therapeutic modalities: 1) DA differentiation (Wks 3 and 8 in Fig. 3): qPCR analysis demonstrated the progression of DA-specific programs. We observed a steady increase in the mature DA neuron marker TH and in ASCL1. This was accompanied by a gradual decrease in the early floor plate/progenitor markers FOXA2 and PLZF, indicating a successful differentiation path from progenitors to differentiated/mature DA neurons. 2) Glycosphingolipid substrate accumulation (Wks 15 and 28 in Fig. 2): To assess late-stage nGD phenotypes, we compared GluCer and GluSph at Week 15 and Week 28. This comparison highlights the progressive accumulation of substrates in nGD MLOs, reflecting the metabolic consequences of the disease at different stages of maturation. 3) Organoid growth dynamics (Wks 4, 8, and 15 in new Fig. 4): The new Fig. 4 tracks physical maturation through organoid size and growth rates across three key time points, providing a macro-scale verification of consistent development between WT and nGD groups. By comparing these early (Wk 3-8) and late (Wk 15-28) stages, we confirmed that our MLOs transition from a proliferative state to a post-mitotic, specialized neuronal state, satisfying the requirement for comparing distinct maturation stages.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work.

      (g) Suggested fixes / experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers).

      MLOs derived from an additional iPSC line, GD2-10-257, were included in the manuscript. This was addressed above [see response to Weaknesses (1)-a].

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was unsuccessful because the GBA1 gene overlaps with a pseudogene (PGBA), located 16 kb downstream of GBA1, which shares 96–98% sequence similarity with GBA1 (Ref. #1, #2); this complicates precise editing. GBA1 is shorter (~5.7 kb) than PGBA (~7.6 kb). The primary exonic difference between GBA1 and PGBA is a 55-bp deletion in exon 9 of the pseudogene. As a result, the isogenic line we obtained carries only the P415R mutation, and L444P was corrected to the normal sequence. We have included this limitation in the Methods as “This gene editing strategy is expected to also target the GBA1 pseudogene due to the identical target sequence, which limits the gene correction on certain mutations (e.g., P415R)”.

      References:

      (1) Horowitz M., Wilder S., Horowitz Z., Reiner O., Gelbart T., Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics (1989). 4, 87–96. doi:10.1016/0888-7543(89)90319-4

      (2) Woo EG, Tayebi N, Sidransky E. Next-Generation Sequencing Analysis of GBA1: The Challenge of Detecting Complex Recombinant Alleles. Front Genet. (2021). 12:684067. doi: 10.3389/fgene.2021.684067. PMCID: PMC8255797.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      This was addressed above [see response to Weaknesses (1)-b, (1)-c].
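      As an illustration of the per-differentiation reporting requested above, the sketch below averages technical replicates (organoids or fields of view) within each independent differentiation and then summarizes across differentiations, so that N counts biological replicates only. The function name, the input format, and the example values are hypothetical and are not drawn from the manuscript; this is a minimal sketch, not the authors' analysis.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_differentiation(measurements):
    """Collapse technical replicates to one mean per differentiation.

    `measurements` is an iterable of (differentiation_id, value) pairs
    (hypothetical input format). Returns the per-differentiation means,
    the number of differentiations, and the mean and SD across them.
    """
    by_batch = defaultdict(list)
    for batch, value in measurements:
        by_batch[batch].append(value)
    batch_means = {batch: mean(values) for batch, values in by_batch.items()}
    per_batch = list(batch_means.values())
    n = len(per_batch)
    overall = mean(per_batch)
    # SD is undefined for a single differentiation.
    spread = stdev(per_batch) if n > 1 else float("nan")
    return batch_means, n, overall, spread


# Made-up values, for illustration only: two organoids from d1, one each from d2 and d3.
example = summarize_by_differentiation(
    [("d1", 1.0), ("d1", 3.0), ("d2", 4.0), ("d3", 6.0)]
)
```

      Statistical tests would then be run on the per-differentiation means rather than on individual organoids or fields, which avoids inflating N with technical replicates.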

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes / experiments - Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      We thank the reviewer for these valuable suggestions. We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work. Importantly, the primary conclusions of our manuscript, that GBA1 mutations in nGD MLOs resulted in nGD pathologies such as diminished enzymatic function, accumulation of lipid substrates, widespread transcriptomic changes, and impaired dopaminergic neuron differentiation, which can be corrected by several therapeutic strategies in this study, are supported by the evidence presented. The suggested experiments represent an important direction for future research using brain organoids.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      We agree with the reviewer. Because this is a proof-of-principle study, the treatment was designed within a short time window. Long-term studies with more comprehensive outcome assessments will be conducted in future work.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      We appreciate the reviewer’s concerns. This study was intended to demonstrate the feasibility and initial response of MLOs to AAV therapy. A comprehensive evaluation of AAV biodistribution will be considered in future studies.

      The penetration and distribution of SapC-DOPS have been extensively characterized in prior studies. The in vivo biodistribution of SapC-DOPS coupled to CellVue Maroon, a fluorescent cargo, was examined in mice bearing human tumor xenografts using real-time fluorescence imaging, where CellVue Maroon fluorescence in the tumor persisted for 48 hours (Ref. #3: Fig. 4B, mouse 1), 100 hours (Ref. #4: Fig. 5), and up to 216 hours (Ref. #5: Fig. 3). Uptake kinetics were also demonstrated in cells, with flow cytometry quantification showing that SapC-DOPS nanovesicles coupled to fluorescent cargo were incorporated into human brain tumor cell membranes within minutes and remained stably incorporated for up to one hour (Ref. #6: Fig. 1a and Fig. 1b). Building on these findings, the present study focuses on evaluating the restoration of GCase function rather than reexamining biodistribution and uptake kinetics.

      References:

      (3) X. Qi, Z. Chu, Y.Y. Mahller, K.F. Stringer, D.P. Witte, T.P. Cripe. Cancer-selective targeting and cytotoxicity by liposomal-coupled lysosomal saposin C protein. Clin. Cancer Res. (2009) 15, 5840-5851. PMID: 19737950.

      (4) Z. Chu, S. Abu-Baker, M.B. Palascak, S.A. Ahmad, R.S. Franco, and X. Qi. Targeting and cytotoxicity of SapC-DOPS nanovesicles in pancreatic cancer. PLOS ONE (2013) 8, e75507. PMID: 24124494.

      (5) Z. Chu, K. LaSance, V.M. Blanco, C-H. Kwon, B. Kaur, M. Frederick, S. Thornton, L. Lemen, and X. Qi. Multi-angle rotational optical imaging of brain tumors and arthritis using fluorescent SapC-DOPS nanovesicles. J. Vis. Exp. (2014) 87, e51187, 1-7. PMID: 24837630.

      (6) J. Wojton, Z. Chu, C-H. Kwon, L.M.L. Chow, M. Palascak, R. Franco, T. Bourdeau, S. Thornton, B. Kaur, and X. Qi. Systemic delivery of SapC-DOPS has antiangiogenic and antitumor effects against glioblastoma. Mol. Ther. (2013) 21, 1517-1525. PMID: 23732993.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      Including inactive fGCase would confound the assessment of fGCase in MLOs by immunoblot and immunofluorescence; therefore, saposin C–DOPS was used as the control instead.

      We agree that assessment of off-target expression and potential cytotoxicity for AAV is important; this will be included in future studies.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      To address this comment, we have added a new table (Supplementary Table 2) comparing the four therapeutic modalities and summarizing their respective outcomes. While this study focused on short-term responses as a proof-of-principle, future work will explore long-term therapeutic effects.

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      We appreciate the reviewer’s suggestions. The therapeutic testing in patient-derived MLOs was designed as a proof-of-principle study to demonstrate feasibility and the primary response (rescue of GCase function) to the treatment. A comprehensive, long-term therapeutic evaluation of AAV and SapC-DOPS-fGCase is indeed important for a complete assessment; however, this represents a separate therapeutic study and is beyond the scope of the current work.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      For the AAV-treated experiments, we agree that measuring AAV copy number and GFP expression would provide additional information. However, the primary goal of this study was to demonstrate the key therapeutic outcome, rescue of GCase function by AAV-delivered normal GCase, which is directly relevant to the treatment objective.

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      As noted above [see response to Weakness (3)-c], using inert GCase would confound the assessment of fGCase uptake in MLOs; therefore, it was not suitable for this study. See response above for the distribution and uptake kinetics of SapC-DOPS [see response to Weaknesses (3)-b].

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      We have added a new table (Supplementary Table 2) providing a head-to-head comparison of the treatment effects.

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      We agree that the absence of microglia and vasculature in midbrain-like organoids represents a limitation, as we have discussed in the manuscript. In this revision, we highlighted this limitation in the Discussion section and clarified that it may contribute to incomplete phenotyping and phenotypic rescue observed in our therapeutic experiments. Additionally, we have outlined future directions to incorporate microglia and vascularization into the organoid system to better recapitulate the in vivo environment and improve translational relevance (see 7th paragraph in the Discussion).

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that certain abnormalities, such as patterning defects observed during early differentiation, likely reflect developmental consequences of GBA1 mutations rather than degenerative processes. Conversely, phenotypes such as substrate accumulation, lysosomal dysfunction, and impaired dopaminergic maturation at later stages are interpreted as degenerative features. We have updated the Results and Discussion sections to avoid conflating developmental defects with neurodegenerative mechanisms.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      The manuscript has been revised to avoid overstatements.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      The manuscript now includes further plans to incorporate microglia and vascularization, described in the last two paragraphs of the Discussion. A pilot study of microglia incorporation will be reported when it is completed.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      We have clarified biological replicates and differentiation in the figure legend [see response to Weaknesses (1)-b, (1)-c].

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated Statistical analysis in methods as described below:

      For comparisons between two groups, data were analyzed using unpaired two-tailed Student’s t-tests when the sample size was ≥6 per group and normality was confirmed by the Shapiro-Wilk test. When the normality assumption was not met or when sample sizes were small (n < 6), the non-parametric Mann-Whitney U test was used instead. For comparisons involving three or more groups, one-way ANOVA followed by Tukey’s multiple comparison test was applied when data were normally distributed; otherwise, the nonparametric Dunn’s multiple comparison test was used. Exclusion of outliers was made based on cutoffs of the mean ±2 standard deviations. All statistical analyses were performed using GraphPad Prism 10 software. Exact p-values are reported throughout the manuscript and figures where feasible. A p-value < 0.05 was considered statistically significant.
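
      The decision rules in the statistical methods above can be sketched as a small helper. This is only an illustration: the function names `choose_test` and `exclude_outliers` are hypothetical, the normality result is passed in as a boolean rather than computed (the analyses themselves were run in GraphPad Prism), and the use of the sample standard deviation for the ±2 SD cutoff is an assumption, since the text does not specify it.

```python
import statistics


def choose_test(n_groups: int, min_n_per_group: int, all_normal: bool) -> str:
    """Return the test named in the methods text for a given design.

    all_normal: assumed outcome of a Shapiro-Wilk check on every group
    (computed elsewhere; this sketch does not run the check itself).
    """
    if n_groups < 2:
        raise ValueError("need at least two groups")
    if n_groups == 2:
        # t-test only when n >= 6 per group AND normality holds
        if min_n_per_group >= 6 and all_normal:
            return "unpaired two-tailed Student's t-test"
        return "Mann-Whitney U test"
    # three or more groups
    if all_normal:
        return "one-way ANOVA with Tukey's multiple comparison test"
    return "Dunn's multiple comparison test"


def exclude_outliers(values: list[float]) -> list[float]:
    """Keep values within mean +/- 2 SD (sample SD is an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= 2 * sd]
```

      For example, two groups with n = 4 each would be routed to the Mann-Whitney U test regardless of the normality result.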

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      In this work, quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM. This multilevel averaging approach minimizes bias from regional heterogeneity within organoids and accounts for variability across differentiations. Representative confocal images shown in the figures were selected to accurately reflect the quantified data. We believe this standardized quantification strategy ensures robust and reproducible results while appropriately representing the 3D architecture of the organoids.

      In the revision, we have clarified the method used for image analysis of sectioned MLOs as below:

      Quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed using ImageJ (NIH) on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM.
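
      The multilevel averaging described above (FOVs averaged within each organoid, then organoids within each differentiation, then differentiations into a group mean ± SEM) can be sketched as follows. The nested-dictionary layout, the function name `group_mean_sem`, and the example counts are all hypothetical; the step that averages organoid means within a differentiation is an assumption about how the two stated levels are connected.

```python
import math
import statistics


def group_mean_sem(data: dict[str, dict[str, list[float]]]) -> tuple[float, float]:
    """data maps differentiation -> organoid -> per-FOV counts.

    Returns (group mean, SEM), where the independent differentiations
    are treated as the biological replicates.
    """
    diff_means = []
    for organoids in data.values():
        # Level 1: average FOVs within each organoid
        organoid_means = [statistics.mean(fovs) for fovs in organoids.values()]
        # Level 2: average organoids within one differentiation (assumption)
        diff_means.append(statistics.mean(organoid_means))
    # Level 3: mean and SEM across differentiations
    mean = statistics.mean(diff_means)
    sem = statistics.stdev(diff_means) / math.sqrt(len(diff_means))
    return mean, sem


# Hypothetical counts: 3 differentiations x 2 organoids x 3 FOVs
example = {
    "diff1": {"org1": [10, 12, 14], "org2": [16, 18, 20]},
    "diff2": {"org1": [18, 20, 22], "org2": [20, 20, 20]},
    "diff3": {"org1": [24, 26, 28], "org2": [22, 24, 26]},
}
mean, sem = group_mean_sem(example)
```

      Treating the differentiation, rather than the individual FOV, as the unit of replication keeps n equal to the number of independent differentiations when the SEM is computed.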

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      The RNA-seq data are from the same batch. The mapping rate is >90%. The GEO accession will be active upon publication. These details were included in the Methods.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      We have revised the figure legends to include replicates for each figure and statistical tests [see response in weaknesses (1)-b, (1)-c].

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      Statistical analysis method is provided in the revision [see response in Weaknesses (5)-b].

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      We validated MLO identity using (1) FOXG1 and (2) EN1. FOXG1 was barely detectable in Wk8 75.1_MLO but abundant in the ‘age-matched’ cerebral organoid (CO), suggesting that our culture method is oriented toward the midbrain region. In nGD MLOs, FOXG1 expression was significantly higher than in 75.1_MLO, indicating aberrant anterior-posterior brain specification, consistent with the transcriptomic dysregulation observed in our RNA-seq data.

      To further confirm midbrain identity, we examined the expression of EN1, an established midbrain-specific marker. Quantitative RT-PCR analysis demonstrated that EN1 expression increased progressively during differentiation in both WT-75.1 and nGD2-1260 MLOs at weeks 3 and 8 (Author response image 1). EN1 reached 34-fold and 373-fold higher levels than in WT-75.1 iPSCs at weeks 3 and 8, respectively, in WT-75.1 MLOs. In nGD MLOs, although EN1 expression showed a modest reduction at week 8, the levels were not significantly different from those observed in age-matched WT-75.1 MLOs (p > 0.05, ns).

      Author response image 1.

      qRT-PCR quantification of midbrain progenitor marker EN1 expression in WT-75.1 and GD2-1260 MLOs at Wk3 and Wk8. Data were normalized to WT-75.1 hiPSCs and are presented as mean ± SEM (n = 3-4 MLOs per group). ns, not significant.

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      We quantified TH expression at both the mRNA level (Fig. 3F) and the protein level (Fig. 3G/H) from whole-organoid lysates, which provides a more consistent and integrative measure across samples. These TH expression levels correlated well with the corresponding extracellular (medium) dopamine concentrations for each genotype. In contrast, TH<sup>+</sup> neuron counts may not reliably reflect total cellular dopamine levels because the number of cells captured on each organoid section varies substantially, making normalization difficult. Measuring intracellular dopamine is an alternative approach that will be considered in future studies.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      Off-target effects were analyzed during gene editing, and the chance of off-target editing is low, given the low off-target scores ranked by MIT Specificity Score analysis. The related method was also updated as stated below:

      “The chance to target other off-targets is low due to low off-target scores ranked based on the MIT Specificity Score analysis (Hsu, P., Scott, D., Weinstein, J. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013). https://doi.org/10.1038/nbt.2647).”

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      The normalization was to the protein of organoid lysate. This was clarified in the Methods section in the revision as stated below:

      “The GluCer and GluSph levels in MLO were normalized to total MLO protein (mg) that were used for glycosphingolipids analyses. Protein mass was determined by BCA assay and glycosphingolipid was expressed as pmol/mg protein. Additionally, GluSph levels in the culture medium were quantified and normalized to the medium volume (pmol/mL).”

      Representative LC-MS chromatograms for both normal and GD MLOs have been included in a new figure, Supplementary Figure 2.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      This was addressed above [see response to Weaknesses (1)-b and (5)-b].

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      The title was revised to: Patient-Specific Midbrain Organoids with CRISPR Correction Recapitulate Neuronopathic Gaucher Disease Phenotypes and Enable Evaluation of Novel Therapies

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      This was addressed above [see response to Weaknesses (1)-a, (1)-b, (1)-c].

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      As addressed above [see response to Weaknesses (2)], the suggested experiments in b) and c) would provide additional insights into this study and we will consider them in future work.

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      This was addressed above [see response to Weaknesses (3)-a to e].

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

      This was addressed above [see response to Weaknesses (1)-b, (5)-b].

      Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

      We thank the reviewer for the supportive remarks.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      We observed reduced dopaminergic neuron marker TH expression in GBA1 L444P/RecNciI (GD2-10-257) MLOs, suggesting that this line also exhibits defects in dopaminergic neuron differentiation. These data are provided in a new Supplementary Fig. 4E, and are summarized in new Supplementary Table 2 in the revision.

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH<sup>+</sup> neurons, GFAP<sup>+</sup> glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      All cell types in wild-type MLOs are expected to express GBA1, as it is a housekeeping gene broadly expressed across neurons, astrocytes, and other brain cell types. Its lysosomal function is essential for cellular homeostasis and is therefore not restricted to any specific lineage. (https://www.proteinatlas.org/ENSG00000177628GBA1/brain/midbrain).

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

      We appreciate the reviewer’s suggestion; however, we respectfully prefer to retain the current order of Figures 2 and 3, as we believe this structure provides the clearest narrative flow. Figure 2 establishes the core biochemical hallmarks: reduced GCase activity, substrate accumulation, and global transcriptomic dysregulation (1,429 DEGs enriched in neural development, WNT signaling, and lysosomal pathways), which together provide essential molecular context for studying the specific cellular differentiation defects presented in Figure 3. Presenting the broader disease landscape first creates a coherent mechanistic link to the subsequent analyses of midbrain patterning and dopaminergic neuron impairment.

      To enhance readability, we have added a brief transitional sentence at the start of the Figure 3 paragraph: “Building on the molecular and transcriptomic hallmarks of GCase deficiency observed in nGD MLOs (Figure 2), we next investigated the impact on midbrain patterning and dopaminergic neuron differentiation (Figure 3).”

      Recommendations for the authors:

      Reviewing Editor Comments:

      Your paper has been reviewed by three expert reviewers in the GBA field. Although they appreciate the work and its novelty, they raise several concerns. We suggest that you address these concerns in the next version.

      Reviewer #1 (Recommendations for the authors):

      Statistical and presentation issues

      (1) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (1)-b].

      (2) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated methods to describe the Statistical analysis details [see response to Reviewer 1 Weaknesses (5)-b].

      (3) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (5)-c].

      (4) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      Our RNA-seq data were generated from a single batch of MLOs, with mapping rates exceeding 90%. The GEO accession will be made publicly available upon publication.

      Reviewer #2 (Recommendations for the authors):

      Please consider the following suggestions for revisions:

      (1) Line 86: A bit more explanation/justification for the focus on midbrain-like organoids would be helpful, including introducing the nature of the midbrain pathology to better put some of the MLO findings in context. Is the nGD pathology for the midbrain significantly different / out of proportion to other affected brain regions?

      nGD patients often display impaired vertical gaze and movement disorders. These symptoms correlate with midbrain involvement due to the sensitivity of this region to neuroinflammatory and degenerative processes (Ref #7, #8). Both human and mouse studies indicate that the midbrain exhibits prominent substrate accumulation compared to other brain regions, suggesting a predisposition for greater pathological involvement in GD midbrain (Ref #8, #9, #10, #11). This rationale was added to Line 86 in the revision.

      References:

      (7) Goker-Alpan O, Ivanova MM. Neuronopathic Gaucher disease: Rare in the West, common in the East. J Inherit Metab Dis. (2024) 47(5):917-934. PMID: 38768609.

      (8) Burrow TA, Sun Y, Prada CE, Bailey L, Zhang W, Brewer A, Wu SW, Setchell KDR, Witte D, Cohen MB, Grabowski GA. CNS, lung, and lymph node involvement in Gaucher disease type 3 after 11 years of therapy: clinical, histopathologic, and biochemical findings. Mol Genet Metab. (2015) 114(2):233-241. PMID: 25219293.

      (9) Tamar Farfel-Becker, Einat B. Vitner, Samuel L. Kelly, Jessica R. Bame, Jingjing Duan, Vera Shinder, Alfred H. Merrill, Kostantin Dobrenis, Anthony H. Futerman. Neuronal accumulation of glucosylceramide in a mouse model of neuronopathic Gaucher disease leads to neurodegeneration, Human Molecular Genetics, (2014). Volume 23, Issue 4, Pages 843–854.

      (10) E. Ellen Jones, Wujuan Zhang, Xueheng Zhao, Cristine Quiason, Stephanie Dale, Sheerin Shahidi-Latham, Gregory A. Grabowski, Kenneth D. R. Setchell, Richard R. Drake, and Ying Sun. High-Resolution MALDI Imaging Mass Spectrometry. SLAS Discovery (2017). Vol. 22(10) 1218–1228.

      (11) Xu YH, Xu K, Sun Y, Liou B, Quinn B, Li RH, Xue L, Zhang W, Setchell KD, Witte D, Grabowski GA. Multiple pathogenic proteins implicated in neuronopathic Gaucher disease mice. Hum Mol Genet. (2014) 23(15):3943-57. PMID: 24599400.

      (2) Lines 359-360: Please specify the carbon-chain length of the sphingoid base of the GluCer species analyzed. Also, is there a citation for the statement that 18:0 and 16:0 are "brain-enriched species"?

      The carbon-chain length analyzed ranges from 14:0 to 24:0. The sphingoid base for all GluCer species analyzed is d18:1. For example, the species referred to as GluCer 18:0 corresponds to GluCer(d18:1/18:0). Although both 16:0 and 18:0 are enriched in the brain, 18:0 is the most abundant species in the brain (Ref #12, #13). We revised “brain-enriched species” to “brain-predominant species (18:0)”.

      References:

      (12) Nilsson, O., and Svennerholm, L. Accumulation of Glucosylceramide and Glucosylsphingosine (Psychosine) in Cerebrum and Cerebellum in Infantile and Juvenile Gaucher Disease. Journal of Neurochemistry (1982) 39, 709–718.

      (13) Sun, Y., Zhang, W., Xu, Y.H., Quinn, B., Dasgupta, N., Liou, B., Setchell, K.D., and Grabowski, G.A. Substrate compositional variation with tissue/region and Gba1 mutations in mouse models--implications for Gaucher disease. PLoS One (2013). 8, e57560. doi:10.1371/journal.pone.0057560.

      (3) Figure 2: It would be interesting to compare the MLO findings to prior gene expression data. Are there previously published transcriptome analyses from nGD brain tissue (or other tissues) that the transcriptome data obtained from MLOs may be compared with? What about transcriptome analyses of mouse GD models?

      We thank the reviewer for this valuable suggestion. To strengthen the biological context of our transcriptomic findings, we have added a new comparative table (new Supplementary Table 3) in the revised manuscript that summarizes key dysregulated pathways in our human nGD MLOs alongside previously published data from nGD mouse midbrain (Ref#14). The table highlights substantial overlap, including axon guidance, neuron differentiation, dopaminergic/glutamatergic/GABAergic synaptic signaling, lipid metabolism, apoptosis/cell death, and nervous system development, emphasizing the translational relevance of our model. We also note that our dataset uniquely reveals pronounced dysregulation of WNT signaling and anterior-posterior patterning (Fig. 2L and 2M), potentially reflecting human-specific early midbrain defects.

      We added the following sentence to Discussion: “Comparative analysis with prior transcriptomic data from nGD mouse midbrain showed consistent dysregulation in axon guidance, synaptic signaling, lipid metabolism, and nervous system development (new Supplementary Table 3), supporting the fidelity of our human MLO model.”

      Reference:

      (14) Dasgupta N, Xu YH, Li R, Peng Y, Pandey MK, Tinch SL, Liou B, Inskeep V, Zhang W, Setchell KD, Keddache M, Grabowski GA, Sun Y. Neuronopathic Gaucher disease: dysregulated mRNAs and miRNAs in brain pathogenesis and effects of pharmacologic chaperone treatment in a mouse model. Hum Mol Genet. (2015) 24(24):7031-48. PMID: 26420838.

      (4) Lines 402-405 & Figure 3D: Is it possible to include a merged image to better visualize the TH and FOXA2 co-staining / potential colocalization?

      The merged images of TH (red) and FOXA2 (green) are shown in Fig. 3E. Yellow arrows indicate TH and FOXA2 co-stained cells, which appear yellow in the merged images. The results demonstrate that the number of co-stained cells is reduced in GD2-1260 MLOs compared with WT-75.1 MLOs at both week 6 and week 8.

      (5) Lines 447-448 & Figure 4F, G, J: It would be helpful to provide a direct analysis/visualization of MLO size between the WT-75.1, GD2-1260, and iso-GD2-1260 genotypes (allowing direct comparison of WT and iso). Similarly, the same 3-way analysis would be valuable for assessing dopamine levels.

      We have included WT-75.1 in Fig. 4F, G, and J in the revision, so that all three genotypes (WT-75.1, GD2-1260, and iso-GD2-1260) are presented and compared with WT-75.1. In new Fig. 4F, MLO growth is shown by representative images taken under wide-field microscopy at day 2, Wk4, and Wk8 of differentiation. In new Fig. 4G, MLO size was analyzed with NIS-Elements and is presented as the area (µm<sup>2</sup>) of the MLO in the image (mean ± SEM); N ≥ 10 MLOs were analyzed for each genotype. In new Fig. 4J, dopamine levels were analyzed in culture medium from WT-75.1, GD2-1260, and iso-GD2-1260 MLOs at Wk12 cultured in 3 mL BGM medium for 72 hours. Data are presented as mean ± SEM (n = 5 per group). The statistical analysis applied is described in the legend.

      (6) Figure 4: What is the explanation/interpretation of the residual autophagy pathway dysfunction in CRISPR-corrected MLOs? nGD requires near-complete loss of GCase activity, so it is a bit curious that autophagic dysfunction would be observed with only ~50% GCase reduction? There is some discussion, but it doesn't fully capture the unexpected nature and implications of this result.

      This phenomenon may be explained by a threshold effect in lysosomal function. Gaucher disease is an autosomal recessive disorder. Carriers of a heterozygous GBA1 mutation, who retain approximately 50% of normal GCase activity, do not develop disease. This suggests that even partial restoration of GCase activity can reduce glucosylceramide accumulation below a pathological threshold, thereby restoring lysosomal integrity and autophagic flux. In addition, improved GCase activity may help normalize the lipid composition of lysosomal membranes, facilitating the fusion events required for effective autophagy.

      (7) Lines 512-516 & Figure 5J: The data shown are inconclusive. Can these Western blot data be quantified, noting the number of replicates for each measurement? Without quantification and statistics, it is difficult to assess the claim that levels of LAMP1, LC3-I, LC3-II, 4E-BP1, and p-4E-BP1 in GD2-1260 treated with SapC-DOPS-fGCase are more similar to GD2-1260 treated following SapC-DOPS than to WT-75.1.

      We performed quantitative analysis with comparison to WT-75.1 and included the data in new Fig. 5J. The result was revised as:

      Analysis of protein levels showed that decreased LAMP1 expression in GD2-1260 MLOs was not altered following SapC-DOPS-fGCase treatment (Figure 5J). The elevated LC3-II levels, an indicator of impaired autophagic flux, were reduced upon treatment, suggesting enhanced autophagic activity (Figure 5J). Moreover, phosphorylated 4E-BP1 (Thr37/46), but not total 4E-BP1, was improved in SapC-DOPS-fGCase–treated MLOs, reflecting a decrease in mTOR hyperactivation (Figure 5J). We anticipate that a longer duration of SapC-DOPS-fGCase exposure in nGD MLOs may produce a more robust therapeutic effect in rescuing nGD-associated phenotypes, which will be evaluated in future studies.

      (8) Lines 518-520: The presented data support "effective restoration of GCase activity," but clarification is needed regarding "correction of GD-related disease phenotypes." Perhaps "selected molecular and biochemical phenotypes" would be more accurate. Data are not shown for several other phenotypes, including TH, FOXA2, and dopamine levels.

      This was revised to “selected molecular and biochemical phenotypes”.

      (9) Figure 5D-J: Please clarify whether all experiments were conducted 48 hours after treatment, as indicated for Figure 5C. If so, does this suggest that SapC-DOPS treatment exhibits only short-term effects? Were any data collected to evaluate the persistence of the treatment effect?

      The treatment duration is specified in the Fig. 5 legend. Fig. 5D–J represent experiments conducted after two weeks of treatment, whereas Fig. 5C reflects a 48-hour treatment. In both Gaucher disease lines, two-week treatment restored GCase activity to wild-type levels and reduced GluSph substrate accumulation. These findings were intended as proof-of-principle to demonstrate therapeutic feasibility; evaluation of treatment persistence beyond two weeks was beyond the scope of this study.

      Minor suggestions

      (1) Line 80: "A brain organoid derived from hiPSCs of a healthy individual with GBA1 knockout and α-synuclein overexpression exhibited some PD features23." I would suggest enumerating what "PD features" are to distinguish from "clinical features", which I don't think is the intended meaning.

      This was revised as “exhibited characteristic PD markers”.

      (2) Figure 2I: The reported number of downregulated DEGs is incorrect. It should be 765, not 1429.

      This was corrected in Figure 2I.

      (3) Line 359: change "enrich" to "enriched".

      This word was corrected.

    1. eLife Assessment

      This study presents a valuable methodological contribution exploiting the DEER background decay to quantify supramolecular packing in amyloid fibrils. The evidence is incomplete: the observation of D < 1 is inconsistent with the theoretical lower bound of the model, and it remains unclear whether this reflects a genuine systematic limitation or falls within experimental uncertainty.

    2. Reviewer #1 (Public review):

      Summary:

      Protein misfolding into amyloid fibrils is a hallmark of neurodegenerative disorders. Tau fibrils, in particular, exhibit subtle structural variations that distinguish different pathologies. Understanding the mechanism of amyloid formation requires structural characterization, usually done by NMR or cryo-EM, but insights into fibril packing order and homogeneity remain limited.

      Here, the authors exploit DEER echo decays of singly spin-labeled proteins to quantify packing order. While DEER is most often used to measure intramolecular distances between two spin labels within a single protein, it also provides access to intermolecular distance distributions through the so-called background decay. This background decay has been theoretically described and can be used to characterize the spatial distribution of spins in terms of local spin concentration and the dimensionality of their arrangement. In the case of singly labeled proteins, the DEER signal contains only this intermolecular information. The authors propose using the extracted dimensionality as a reporter of packing disorder along the fibril axis and demonstrate this approach on the tau protein.

      The background decay follows an exponential form with a time constant proportional to alphaD, where D is the dimensionality of the spin distribution and ranges from 1 to 3. For a homogeneous frozen solution of singly spin-labeled proteins, D = 3, and alpha is proportional to the product of pb and CL, where pb is the probability of changing the orientation of the spins excited by the DEER pump pulse, and CL is the local spin concentration. In a homogeneous system, CL equals the bulk spin concentration. The parameter pb is instrument-dependent and can be experimentally determined. When D < 3, alpha takes a more complex form (given by Eq. 3), but remains linear in C with a pre-factor that depends on pb and a defined function of D. For known C and pb, a plot of alpha vs C yields a straight line, the slope of which can be used to determine D.
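
      As a rough illustration of how a dimensionality can be read off a background trace, the sketch below assumes the stretched-exponential form commonly used in the DEER literature, B(t) = exp(-alpha * t^(D/3)). Whether this matches the manuscript's Eq. 3 term by term is an assumption, the function names are hypothetical, and the log-log regression shown here is not the alpha-versus-C slope analysis described above. The trace is synthetic and noise-free, with made-up parameters.

```python
import numpy as np

def background(t, alpha, D):
    """Assumed stretched-exponential background B(t) = exp(-alpha * t**(D/3))."""
    return np.exp(-alpha * np.power(t, D / 3.0))

def fit_dimensionality(t, B):
    """Estimate (alpha, D) from a background trace by linear regression.

    Taking logarithms twice gives ln(-ln B) = ln(alpha) + (D/3) * ln(t),
    so the slope of ln(-ln B) versus ln(t) equals D/3.
    """
    t = np.asarray(t, dtype=float)
    B = np.asarray(B, dtype=float)
    mask = (t > 0) & (B > 0) & (B < 1)  # keep points where both logs are defined
    slope, intercept = np.polyfit(np.log(t[mask]), np.log(-np.log(B[mask])), 1)
    return np.exp(intercept), 3.0 * slope

# Synthetic example with invented parameters (not data from the study).
t = np.linspace(0.05, 4.0, 200)  # time axis, arbitrary units
alpha_fit, D_fit = fit_dimensionality(t, background(t, alpha=0.3, D=1.5))
```

      Nothing in this unconstrained fit restricts D to the range 1 to 3, so noisy or non-ideal traces could in principle return values outside it.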

      This approach was applied to the tau fragment tau187, labeled with a nitroxide spin label at positions 272C, 313C, 322C, and 404C. DEER measurements were performed on mixtures of labeled and unlabeled protein at varying ratios to determine D. Fibril formation was induced by heparin, and the resulting decrease in D was monitored over time, reaching a final value of ~1.5. The authors find that the final dimensionality (D) is reached within 12 minutes and is independent of concentration. Consistent values of D ≈ 1.5 are observed for residues 272C, 313C, and 322C located in the protein core, whereas residue 404C, positioned in the C-terminal "fuzzy" region, yields a higher value of D ≈ 2.

      Comparisons across tau variants show that heparin-induced fibrils of longer constructs are mispacked, whereas shorter tau fragments form well-ordered, seeding-competent fibrils with lower conformational variability. Seeded aggregation further improves templating and packing, as indicated by reduced dimensionality. Finally, the authors demonstrate that the local spin density derived from the α parameter can be used to estimate the number of protofilaments.

      With the method now established, its application to other amyloid systems may reveal correlations between fibril packing order and disease-related properties.

      Strengths:

      This study presents an original, conceptually clear method for quantifying fibril packing using a single parameter (dimensionality). The approach is experimentally accessible and straightforward to analyze, making it broadly applicable with standard pulse EPR instrumentation.

      Weaknesses:

      A discussion of the meaning of D < 1 is missing. In addition, the treatment of multi-protofilament fibrils is limited. In particular, it remains unclear how increases in dimensionality arising from multiple protofilaments affect D and how they can be distinguished from packing disorder.

    3. Reviewer #2 (Public review):

      This manuscript by Tsay et al. reports an EPR (electron paramagnetic resonance) approach based on double electron-electron resonance (DEER) spectroscopy to characterize the supramolecular packing of amyloid fibrils. The authors claim that this approach can "deliver an apparent dimensionality of the supramolecular organization of tau fibrils", "assess the amyloid core location and packing order, and track time-resolved formation of aggregation intermediates".

      Specifically, the authors used the electron spin echo (ESE) decay to report the arrangement of spin labels in the amyloid fibrils. When the spin labels are arranged in a straight line, a planar surface, or a 3D space, the dimensionality of the ESE decay would be 1, 2, and 3, respectively. To demonstrate their methods, the authors used a singly spin-labeled tau protein, which is involved in several amyloid diseases, including Alzheimer's and other tauopathies. For the truncated 0N4R tau (residues 244-441, named tau187), four labeling sites were studied (272, 313, 322, and 404). Residues 272, 313, and 322 gave a dimensionality of ~1.5, while residue 404 gave a dimensionality of ~2.0. The authors explained that residues 272, 313, and 322 are expected to be part of the amyloid core, while 404 is part of the so-called fuzzy coat. However, the authors then explained that all three amyloid core sites are misaligned because their dimensionality is ~1.5 instead of 1. Using a short tau fragment of 16 amino acids (residues 295-313), the authors show that this peptide formed fibrils with a dimensionality of 0.8. Using the short tau fragment fibrils as seeds, the authors obtained tau187 fibrils with a dimensionality of 1.3. Furthermore, the α parameter (a fitting parameter used to obtain the dimensionality) was used to interpret the protofilament composition.

      While this approach has great potential in providing structural insights into amyloid fibrils, there are several critical flaws in experimental design, data analysis, and interpretation in the current version.

      (1) The authors didn't rigorously establish the central premise of the DEER approach to characterize the supramolecular structure of amyloid fibrils. The parallel in-register β-sheet structure of amyloid fibrils is supposed to give a dimensionality of 1 in the ESE decay analysis. For tau187 fibrils, the authors obtained 1.5. For tau16 fibrils, the authors obtained 0.8. Because the theoretical lower limit of dimensionality is 1, tau16 fibrils do not serve as evidence that this approach can identify perfectly aligned parallel in-register β-sheets. A 20% deviation from the theoretical value suggests low accuracy when using ESE decay to report amyloid core structures. High-resolution structures of tau fibrils have been widely reported using cryo-EM methods; it shouldn't be difficult for the authors to identify a good protein candidate to obtain a dimensionality of 1 to establish their methods. With a good protein candidate, rigorous data analysis should be presented to show how reliably a core site can be distinguished from a supposedly disordered site.

      (2) Regarding the claim of probing protofilament composition using the α parameter, the authors should prepare fibrils with defined protofilament composition. A number of amyloid fibril structures have been solved to show different numbers of protofilaments.

      (3) Regarding the claim of tracking "time-resolved formation of aggregation intermediates", the authors need to show more than a couple of data points, and the real-time aggregation needs to be accompanied by characterizations with complementary methods such as TEM.

      (4) The authors largely ignored progress made in previous spin-labeling studies of amyloid fibrils. Many of the claimed capabilities, such as identifying the amyloid core, following real-time aggregation, and assessing the effects of seeding on structure, have been characterized extensively using continuous-wave EPR. It would benefit readers to show what additional value this approach provides over existing methods.

    4. Reviewer #3 (Public review):

      In this work, Tsay et al. examine the challenge of inferring the ordering of amyloid fibrils. There is a clear need for such methodology. They computationally analyze the expected decay of the DEER signal for spins randomly distributed in one, two, and three dimensions and show that (not unexpectedly) the decay is sensitive to dimensionality for a range of spin label concentrations. More intriguingly, they measure the dimensionality of tau amyloid labeled at several positions and show uniform (but unexpected) dimensionality when the label is in the fibril core. Through further simulations, they show that this anomalous dimensionality cannot arise from label attraction or repulsion (which can lead to deviations from random positions). Instead, this dimensionality is interpreted (again using compelling simulations) to arise from mis-registration due to changes in packing. Taken together, this paper convincingly shows that the DEER signal can be used to get site-specific information on amyloid dimensionality and can discriminate between regions of fibril core vs the "fuzzy coat". Overall, this paper moves the methodology forward and opens up the technique to attractive applications in the area of amyloid formation. More substantively, the field of DEER has been fixated on the dipolar modulation, and it is only once in a while now that one comes across a paper that is a breath of fresh air - this paper certainly is!

    1. eLife Assessment

      In this valuable study, the authors develop new approaches to investigate mRNA imprinting, a phenomenon in which RNA-protein complexes form in the nucleus to influence the fate of transcripts in the cytoplasm. They propose that the Pol II subunit Rpb4 serves as a key node in this pathway, recruiting proteins involved in cytoplasmic processes. Notably, some of the candidates identified in this study were previously thought to function exclusively in the cytoplasm. However, the evidence remains incomplete, as key controls are lacking and alternative explanations have not been fully addressed; additional validation would help strengthen the authors' conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      To understand the process of mRNA imprinting, the authors develop a series of unbiased methods to identify and follow proteins that associate with transcripts co-transcriptionally. The methods rely on RNA polymerase II pull-downs or proximity biotinylation to do so, and from these experiments, the authors identify some interesting candidate proteins, including Rpg1 / eIF3a, Ssa1/2, and Spt6. The authors characterize some of these proteins in follow-up experiments and show that Spt6 recruitment depends on Rpb4.

      Strengths:

      (1) The methods described in this study will be useful for the community beyond their immediate application.

      (2) The topic of mRNA imprinting remains an open area in the field, and this paper provides hypothesis-generating datasets that may be of use.

      (3) If correct, the idea that eIF3a binds co-transcriptionally would be of interest to the transcription and translation fields.

      (4) The data showing the importance of Rpb4 for Spt6 binding are some of the strongest.

      Weaknesses:

      (1) Two main methods (PROFIT and BioPROFIT) are introduced in this study, both of which make use of a combination of tags, especially on RNA polymerase II subunits, to identify and track proteins that are potentially recruited co-transcriptionally. However, a more thorough characterization is needed to gain a sense of the false negatives and false positives. For instance, there are no direct experiments testing the requirement for transcription for the hits. This is a key experiment.

      (2) Alternatives are also not robustly considered. For example, what is the evidence that the proteins remain bound to an RNA through its life cycle, as opposed to rebinding in the cytoplasm? For proteins with known cytoplasmic functions, like Rpg1/eIF3a, this conclusion needs more supporting evidence. This caveat is especially important to consider given the typical or known off-rates of many of these proteins.

      (3) Showing direct evidence that biotinylated "target" proteins (like eIF3a) accumulate in the nucleus during short labeling or if nuclear export is blocked is an important control, as is an experiment inhibiting transcription and demonstrating that the signal decreases.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have provided valuable and solid evidence for the hypothesis, of which Choder is an early advocate, that transcription facilitates the assembly of an mRNA-protein complex that can affect the expression of mRNA (e.g., translation or degradation) in the cytoplasm.

      Strengths:

      In this work, the authors have used two orthogonal approaches. The first is an IP of FLAG-tagged Pol II followed by RNase digestion to release nascent-chain-associated proteins and mass spectrometry to identify co-transcriptionally associated proteins. The second verifies this association with the transcriptional apparatus by proximity labeling, using biotinylation of a specific sequence (Avi-tag) by the bacterial enzyme BirA fused to a subunit of Pol II. Many of the proteins identified are thought to be exclusively cytoplasmic, for instance, those important for translation, such as the components of initiation factor eIF3. The work represents a significant advance in support of the model where specific mRNAs can assemble proteins needed for their function in the cytoplasm during their transcription.

      They also discover that a mutant Pol II subunit, Rpb4, which does not bind certain Avi-tagged proteins, does not facilitate their biotinylation. These results lend credible support to the hypothesis.

      Weaknesses:

      While the proximity labeling provides strong evidence that is consistent with the hypothesis, a proof is still lacking because it is inferred that the enzymatic labeling occurs at the site of transcription (a reasonable assumption). More definitive evidence could be provided by imaging the presence of the cytoplasmic proteins at the transcription site, although this may not be within the expertise of the investigator, so it would require a collaboration.

      While not necessarily a significant weakness, it is worth considering the remote possibility that the cytoplasmic proteins discovered in this way were not tagged with biotin in the nucleus, but rather in the cytoplasm, where the Pol II complex, either FLAG- or BirA-tagged, may come in contact with the substrate before it is imported to the nucleus. The authors presumably rule out that the tagging could occur during translation of the Avi-tag on polysomes by inhibiting translation and showing that the tagging of the target protein is not inhibited (the data here are not totally convincing). Whether the Pol II-(BirA or FLAG) could react with Avi-tagged proteins, even while briefly in the cytoplasm before nuclear import, is not completely resolved by these experiments since the Avi-tagged proteins could reside in the cytoplasm, not associated with polysomes, but complexed with Pol II subunits. The mutant Rpb4 does not rule out this possibility since it would not bind its substrate in the cytoplasm. In order to get into the nucleus in the first place, the cytoplasmic proteins would need to be transported there by a complex, possibly involving Pol II subunits, Rpbs. Perhaps the authors could address this possibility in the text.

      One confusing issue in the protocol is the efficacy of the biotin-depleted media in which the cells are grown. Biotin is an essential cofactor for many reactions, so endogenous biotin and biotin ligase must still be present and may add a background level of promiscuous biotinylation of some cytoplasmic proteins, for instance, those containing a universal biotin binding site.

    4. Reviewer #3 (Public review):

      Summary:

      Various groups over the last several decades have provided many examples of proteins associating with nascent mRNA co-transcriptionally to influence gene expression at subsequent stages, including in the cytoplasm. In this and previously published works, the Choder group has described these events as "mRNA imprinting", which, as a field, we understand to reflect the differential association of proteins with mRNAs in a gene-specific or environmentally induced fashion to regulate gene expression.

      In this study, the authors use a proteomics-based approach termed PROFIT to identify factors associated with RNA Pol II in an RNA-dependent manner. The identified interactors have the potential to be part of mRNA-protein complexes (mRNPs) being formed co-transcriptionally with an "mRNA imprinting" function. PROFIT employs a pulldown of RNA Pol II via a tagged Rpb3 subunit, followed by RNase I-mediated elution to isolate proteins associated in an RNA-dependent manner. Proteomics analyses identified known mRNA-associated proteins that have previously been reported as imprinting factors, as well as other proteins involved in gene expression, including factors functioning in the cytoplasm. The authors suggest, based on the RNA-dependence and assumed formation of these interactions with RNA Pol II co-transcriptionally, that these novel hits could be mRNA imprinting factors. For most of these factors, however, it has not been determined whether they associate with RNA Pol II in the context of transcription with nascent transcripts to contribute to the downstream regulation of these transcripts.

      Strengths:

      PROFIT successfully identified nuclear factors known to engage mRNA co-transcriptionally. This suggests that the method has the potential to detect imprinting factors. By employing a proximity-labeling technique, termed BioPROFIT, further evidence is provided for some of the novel interactors being in proximity to RNA Pol II. The authors further demonstrate that one of the factors, the eIF3 component Rpg1, exists in two fractions, with a soluble fraction that matures into a ribosome fraction, which is suggestive of Rpg1 traveling along the gene expression pathway with an mRNP to be engaged in translation. In addition, the authors showed that PROFIT detects changes in RNA Pol II associated factors in response to heat shock, consistent with gene expression reprogramming during stress. As such, these methods and proteomics data provide a starting point for a more detailed characterization of mRNP compositions formed in the nucleus and their impact on gene expression at later stages.

      Weaknesses:

      The authors interpret the interaction data from PROFIT and BioPROFIT under the assumption that this reflects interactions happening co-transcriptionally. There is no discussion of other ways these data may result, or more importantly, controls to prove these assumptions are true. Overall, these assays lack important controls and experimental validations by independent methods to demonstrate that the identified interactions occur co-transcriptionally within the nucleus and do not represent interactions occurring in the cytoplasm or artifacts related to experimental design. For example, the authors focus on Rpg1 as a potential imprinting factor, which would require this protein to shuttle and be localized at transcribing genes. Yet no evidence is presented that Rpg1 enters the nucleus or can be found in association with a transcribed gene, which leaves open the possibility that this interaction is occurring in the cytoplasm or forming post-lysis.

      Regarding the possibility of in vitro interactions, in the PROFIT assay, yeast collected from a 3L culture is cryo-ground and resuspended in 7 mL of lysis buffer. This ratio of cell material to buffer will create a highly concentrated cell lysate that is subsequently used over ~6.5 hours, which is the time for centrifugation, DNase I digestion, and immunoprecipitation. These conditions have a very high probability of promoting new interactions between RNA, RNA Pol II, other proteins, and/or RNA Pol II-associated nascent RNA complexes in vitro. Notably, the PROFIT assay detects many highly expressed proteins but does not identify many of the factors known to be loaded into nuclear mRNPs (e.g., Yra1, THO complex, Sub2, or Nab2). The BioPROFIT assay is used to try to address this issue, but biotinylation may occur post-lysis because the desalting process to remove biotin is performed just before the immunoprecipitation, providing ~2 hours for the reaction to happen in vitro. In addition, even if the biotinylation occurs in cells, nothing about this assay indicates this is occurring in the context of transcribing RNA Pol II or nascent transcripts. To address this major issue, the authors should add a mixing control to show that the detected interactions between RNA Pol II and the identified factors are produced in cells, not in the cell lysate. Specifically, cell grindates from two independent yeast strains (e.g., an RPB3-FLAG strain mixed with a TIF4631-HA strain) could be mixed and the resulting lysate used in the PROFIT assay, with western blotting as the readout. In this case, if the interaction is detected, the interaction is produced in the cell lysate. To verify PROFIT hits associated with transcribing RNA Pol II and nascent transcripts, BioPROFIT should be performed in cells treated with a transcription inhibitor (e.g., thiolutin) or in mutants blocking transcription by Pol II. These types of verifications should be performed for the multiple novel hits reported in the manuscript.

      Another in vitro issue must also be addressed. In the PROFIT assay, elution of RNA-associated factors from the immunoprecipitated material is performed by RNase I digestion, but the reaction time is very long (3 hours) at room temperature. During such a long incubation time and at higher temperature (i.e., above 4 °C), it is possible that non-RNA-mediated interactors dissociate from the beads and/or protein binding partners. This possibility is made more problematic by the fact that the authors define interactors using fold change over an Rpb3 no-tag sample, where the sample does not contain isolated RNA Pol II complexes and their associated protein-binding partners. As such, even a small amount of non-RNA-mediated RNA Pol II interactors that elute would appear significantly enriched. For this point, a comparison of +/- RNase I elution in the Rpb3-FLAG pulldown sample should be performed using PROFIT.

      Other points to address:

      (1) The cartoon in Figure 1A should be corrected to present the PROFIT experiment as described in the text. Specifically, in the cartoon, UV is shown to be applied to cells, but this is done with cell grindate.

      (2) The cartoon in Figure 2A should be corrected. In the cartoon, it shows the biotin ligase biotinylating proximal proteins during DNase digestion as well as on the Sepharose beads, but in theory, the majority of the biotinylation reaction occurs in cells. In addition, the cartoon depicts biotinylation of proximal proteins, but the system described uses wild-type BirA to specifically biotinylate an Avi-tag. To perform non-specific labeling of proximal proteins, BirA* would need to be used. Finally, the cartoon indicates mass spectrometry analysis of labeled proteins, but this is not done in the manuscript.

      (3) In the text, the sentence "However, no bio-Spt6-Avi was released from the complexes containing Pol II mutants (Fig. 5C)" appears to have two errors. "Pol II mutants" should likely be "rpb4 mutant" and "Fig. 5C" is probably "Fig. 6C".

      (4) In the Figure 6 legend, the sentence "The bulk Spt6 was detected by anti-HIS Abs that bound to (HIS)x6, which was placed upstream of the FLAG" suggests that "FLAG" should be "Avi-tag." Please correct it if necessary and accurately describe it in the strain list.

      (5) On page 18, Npl3 is listed and discussed, but it is not mentioned anywhere earlier in the paper. For example, the paragraph states "...our observation that it binds nascent RNA in an Rpb4-dependent manner...", but Npl3 is not listed in Supplemental Table 4, which lists PROFIT hits affected by rpb4∆. If Npl3 is to be discussed, the associated data need to be properly presented.

    1. eLife Assessment

      This valuable study by Zhu et al. offers a high-resolution evolutionary framework for spider silk proteins (spidroins) through long-read transcriptomics across a broad phylogenetic range, with theoretical implications for protein family evolution, biomaterials, and silk biology. By identifying putative ancestral spidroin templates in early-diverging spiders, the authors make a significant contribution to understanding genetic innovations underlying silk diversification. The long-read sequencing approach is well-suited to these highly repetitive genes. However, the support is incomplete: key claims regarding direct ancestry between silk protein families, the independent origin of certain silk types, and the co-option of flagelliform spidroins in non-web-building spiders rely on absence-based inferences and indirect phylogenetic reasoning that the data cannot yet fully substantiate, and some gene family assignments overreach the available molecular evidence.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Zhu et al. address spider silk spidroin evolution using long-read transcriptomics across 12 spider species. The study provides a novel evolutionary framework for spidroin diversification, proposing the existence of two ancient ancestral templates, i.e., AS and GS, and tracing how these templates diversified into major spidroin classes observed in radiated spiders. The manuscript further focuses on the evolutionary history of multiple known spidroin proteins and revises some previous hypotheses.

      Strengths:

      A major challenge in silk biology, the highly repetitive content, was well addressed in this study by full-length transcriptome sequencing. Also, the authors performed very detailed analyses on sequence features across a wide range of species. I therefore think the study is supported by sound levels of sampling, technology, and analysis.

      Weaknesses:

      The manuscript presents a lot of detail regarding various sequence features and derived claims, but these features are sometimes not friendly to an audience not working with spider silks. Also, the current figures are not very helpful for understanding those described patterns. I found many colorful, trivial elements in almost every figure, but how their organization supported the corresponding statement was often unclear to me. I recommend that the authors further improve the figure design, including presenting a schematic evolutionary history for those spider silk proteins.

    3. Reviewer #2 (Public review):

      Summary:

      This paper utilizes long-read transcriptomics across 12 representative spider species to propose a new evolutionary framework for spider silk proteins (spidroins). By identifying ancestral templates in the most basal spider lineages, the authors trace how simple genetic materials diversified into the high-performance fibers used by modern spiders.

      Strengths:

      (1) The authors utilized PacBio ISO-Seq (long-read transcriptomics), which is essential for resolving the massive, highly repetitive sequences of spidroin genes that often cause gaps in traditional short-read assemblies.

      (2) The researchers sampled 12 species representing the major nodes of spider evolution, including the basal Mesothelae, the Mygalomorphae (tarantulas), and the highly diverse Araneomorphae.

      (3) The study successfully identified two distinct primordial spidroins in basal spiders: the AS-type (alanine-serine-rich) and the GS-type (glycine-serine-rich) proteins.

      Weaknesses:

      (1) The GS-Type "Base Gene" Paradox

      The paper proposes that the GS-type gene (Liphistius sp._5400) in Liphistius (the most ancient spider lineage) is the prototype for all modern dragline silk. However, the data presented significantly undermines this conclusion.

      Every functional spider silk protein requires N-terminal and C-terminal domains to control fiber assembly. The authors admit that neither the N- nor the C-terminal of this GS-type protein shows homology to any known spidroins. Because it lacks these domains, the authors explicitly state that it "may not assemble into typical silk fibers". The authors are identifying this as a "base gene" solely because it contains poly-GS motifs. Their logic is that because GS motifs are found in modern silk and other silk-producing insects, this must be the ancestor.

      In the same spider, the AS-type gene (Liphistius sp._6705) does have recognizable C-terminal sequences and motifs similar to modern eggcase silk. This proves that "real" spidroins existed in Liphistius, making the claim that the non-homologous GS-type is a "spidroin ancestor" look like a misidentification of a general repetitive protein.

      (2) Overstated Classification of FLAG in RTA Spiders

      The authors identified a transcript in the RTA spider Heteropoda davidbowie (H.dav_6495) and labeled it a "Flag-like spidroin". This label is based on the repetitive internal motifs, which contain "GPGGX" and "GPG", the classic building blocks of flagelliform capture silk. However, both the N- and C-termini of this gene are highly homologous to ampullate spidroins (MaSp), not typical Flag proteins. By calling it a "Flag-like spidroin" rather than a "MaSp with GPG motifs," the authors are forcing an evolutionary narrative. It is equally possible that this is simply a divergent Major Ampullate spidroin that evolved capture-like motifs, rather than a capture silk gene that "moved" into the ampullate gland.

      The authors explicitly state, "Its origin could not be traced through sequence analysis". This admission directly contradicts the confidence with which they propose a "revised evolutionary trajectory".

      Appraisal and Impact

      This study provides a high-resolution map of spider silk evolution by utilizing long-read transcriptomics to bridge the gap between basal and derived lineages. By identifying the earliest known genetic templates for silk, the paper offers a significant leap forward in understanding how complex biological materials originate, though it raises critical questions about the functional definition of a "spidroin".

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Zhu et al. use long-read transcriptomes, with correction using short-read RNA-seq, from 12 spider species that span the major evolutionary lineages to investigate the diversification of spider silk proteins (spidroins). Here, they identify 60 spidroin sequences and propose that two highly divergent sequences found in the basal Liphistius sp., where one is an alanine-serine-rich (AS-type), and one is a glycine-serine-rich (GS-type), represent ancestral templates from which all major spidroin families diversified. Using separate phylogenetic analyses for N-terminal domains, C-terminal domains, and repetitive domains, the authors argue that the AS-type lineage remained relatively conserved and gave rise to tubuliform spidroins (TuSp) used in eggcase silk, while the GS-type lineage evolved into minor ampullate spidroins (MiSp) and may have provided the substrate for major ampullate spidroins (MaSp). In addition, they describe a specific flagelliform-like (flag) transcript in a basal clade spider, with MaSp-like terminal domains, and propose that Flag was co-opted into ampullate silk glands before being progressively lost in more derived retrolateral tibial apophysis (RTA) lineages.

      Strengths:

      The taxon sampling is a strength of this study, covering representative species at key nodes across spider evolution, from the earliest-diverging Mesothelae through Mygalomorphae and into the most derived Araneomorphae lineages, which enables the authors to make comparative inferences about ancestral states. Also, the use of long-read sequencing is well-suited to the problem since spidroin genes contain highly repetitive coding sequences that would be very hard to resolve by short-read assembly alone. Thus, retrieving 30 full-length sequences in this context is notable, and the assembly quality appears reasonable for transcriptomic resources, with BUSCO completeness values reported between 85% and 93% across species.

      The decision to analyse N-terminal, C-terminal, and repetitive domains in separate phylogenetic trees is methodologically sound and yields a biologically interesting result: terminal domains show greater diversification in basal lineages than repetitive regions, suggesting that specialisation of silk gland microenvironments preceded compositional innovation in the repetitive sequences.

      Weaknesses:

      While the paper has strengths in providing a useful comparative resource and generating interesting hypotheses, several of the central evolutionary conclusions are not directly supported by the current data. There are three main elements that require further attention:

      (1) The GS-type Liphistius sequence (Liphistius sp._5400) is central to the manuscript's model for the origin of GA-rich ampullate spidroins, but the authors describe it as a spidroin-like transcript whose N- and C-terminal regions lack homology to known spidroins and may not support typical silk-fiber assembly. Since its terminal domains are excluded from the phylogenetic analyses, the proposed scenario, GS-type to MiSp to MaSp, rests largely on repeat-region similarity. Supplementary materials provided in this study further indicate no predicted signal peptide, although this feature alone is not unique among the annotated silk proteins. The manuscript should therefore either provide a stronger justification for treating Liphistius sp._5400 as an ancestral spidroin or more consistently frame it as a spidroin-like, repeat-based intermediate. The distinction between repeat-region clustering and full functional homology should also be made explicit.

      (2) The whole-body transcriptome approach is an important sampling limitation that is acknowledged here, where the authors note that they were unable to recover complete spidroin repertoires for each species. Because the newly generated data are not silk-gland-specific, the absence of a transcript in a given species should be interpreted with caution and not equated directly with gene absence. This is particularly relevant to the manuscript's proposed loss of Flag during RTA evolution. In the focal taxa, the inference combines one positive transcript in H. davidbowie with non-detection in H. diardi, while broader support comes from limited synteny-based absence in a small number of external genomes. Therefore, while the Flag-loss scenario could be plausible, it remains suggestive rather than conclusive without more targeted silk-gland sampling or broader genomic validation.

      (3) The Flag co-option model is interesting, but as presented now, it is based on limited evidence: a single Flag-like transcript in H. davidbowie, the absence of detection in H. diardi, restricted synteny comparisons, and terminal-domain similarity to ampullate spidroins. The manuscript does not present proteomic evidence that this Flag-like protein is incorporated into ampullate silk fibers, nor does it show a series of pseudogenized or truncated Flag loci across derived RTA lineages. This is a plausible and interesting scenario, but it should be framed more consistently as a testable hypothesis rather than as an established evolutionary pathway.

    1. eLife Assessment

      This important study investigated whether the nuclear receptor Nur77 is regulated by a non-canonical mechanism of ligand-induced disruption of its interaction with RXRg, similar to the family member Nurr1. The overall evidence is compelling. This manuscript will be of interest to scientists focusing on mechanisms of transcriptional regulation.

    2. Reviewer #1 (Public review):

      Summary:

      This foundational study builds on prior work from this group to reveal the complexities underlying ligand-dependent RXRγ-Nur77 heterodimer formation, offering a compelling re-evaluation of their earlier conclusions. The authors examine how a library of RXR ligands influences the biophysical, structural, and functional properties of Nur77. They find that although the Nur77-RXRγ heterodimer shares notable functional similarities with the Nurr1-RXRα complex, it also exhibits unique features - notably, both dimer dissociation and classical agonist-driven activities. This work advances our understanding of the nuanced behaviors of nuclear receptor heterodimers, which have important implications for health and disease.

      Strengths:

      (1) Builds on previous work by providing a comprehensive analysis that examines whether Nur77-RXRγ heterodimer formation parallels that of the Nurr1-RXRα complex.

      (2) Systematic evaluation of a library of RXR ligands provides a broad survey of functional outputs.

      (3) Careful reanalysis of previous work sheds new light on how NR4A heterodimers function.

      Weaknesses:

      (1) Some conclusions appear overstated or are not well substantiated by the work presented. It's unclear how the data support a non-classical mode of agonism, for example, based on the data shown.

      (2) Some assays have relatively few replicates, with only two in some cases.

      Comments on revisions:

      I'm satisfied with the revised version.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores the mechanisms by which binding of the nuclear receptor RXRg regulates its heterodimeric partner Nur77. Previously, this group made the interesting discovery that ligand-dependent activation of RXRg bound to a related partner, Nurr1, does not occur through a classical pharmacological mechanism but through agonist-dependent dissociation of the complex through disruption of their ligand binding domain (LBD) interactions. Here, they revisit this paradigm with Nur77. In contrast to Nurr1, the authors do not have the reagents to clearly support a role for LBD dissociation. Following from the model of partial ligand-dependent dissociation of the LBD heterodimer, the experimental data (NMR, ITC, SEC) are interesting and quite complex.

      Strengths:

      The authors do a rigorous job of describing the data and providing possible interpretations and caveats. Revisiting the analysis of Nurr1, they identify the crucial role that selective Nurr1-RXRg agonists played in supporting the LBD dissociation model; without analogous compounds for the Nur77-RXRg complex, it is difficult to invoke this mechanism. Interestingly, treatment with the Nurr1-RXRg selective agonist HX600 suggests it can induce some LBD dissociation. Therefore, there may be some similarities between regulation of Nurr1 and Nur77 by RXRg.

      Weaknesses:

      Despite evidence supporting a partial role for RXRg LBD dissociation as a mechanism to activate Nur77, other data demonstrate that a fundamentally different regulatory mechanism likely exists in the Nur77-RXRg complex that involves the RXRg disordered NTD. The decision to describe further study of this as outside the scope of this work is unfortunate, as it closed off an avenue that could have provided fruitful data informing the apparently distinct regulatory mechanisms of the Nur77-RXRg complex. Given the uncertainty in the importance of the partial roles of the pharmacological mechanism, LBD dissociation, and the RXRg NTD, this study may have limited impact on the field.

      Comments on revisions:

      I'm satisfied with the revision.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This foundational study builds on prior work from this group to reveal the complexities underlying ligand-dependent RXRγ-Nur77 heterodimer formation, offering a compelling re-evaluation of their earlier conclusions. The authors examine how a library of RXR ligands influences the biophysical, structural, and functional properties of Nur77. They find that although the Nur77-RXRγ heterodimer shares notable functional similarities with the Nurr1-RXRα complex, it also exhibits unique features, notably, both dimer dissociation and classical agonist-driven activities. This work advances our understanding of the nuanced behaviors of nuclear receptor heterodimers, which have important implications for health and disease.

      Strengths:

      (1) Builds on previous work by providing a comprehensive analysis that examines whether Nur77-RXRγ heterodimer formation parallels that of the Nurr1-RXRα complex.

      (2) Systematic evaluation of a library of RXR ligands provides a broad survey of functional outputs.

      (3) Careful reanalysis of previous work sheds new light on how NR4A heterodimers function.

      We thank the reviewer for recognizing our work as foundational. In the nuclear receptor field, current understanding of ligand-regulated nuclear receptor activity is based largely on ligand-dependent coregulator recruitment preferences; for example, agonists enhance coactivator recruitment to activate transcription. Building on our recent study of Nurr1-RXRα, the present work suggests that activation of the evolutionarily related NR4A-RXR heterodimer Nur77-RXRγ by RXR ligands is also consistent with a non-classical activation mechanism involving heterodimer dissociation.

      Weaknesses:

      (1) Some conclusions appear overstated or are not well substantiated by the work presented. It's unclear how the data support a non-classical mode of agonism, for example, based on the data shown.

      We thank the reviewer for this important point. We did not intend to claim that Nur77-RXRγ activation is explained exclusively by a non-classical mode of agonism. Rather, our interpretation was that the data are consistent with two possible, non-mutually exclusive mechanisms: (1) a classical pharmacological mechanism involving ligand-dependent coregulator recruitment; and (2) a non-classical mechanism involving ligand-binding domain (LBD) heterodimer dissociation, as we previously described for Nurr1-RXRα. This differs from our prior eLife study of Nurr1-RXRα, in which the data supported the LBD heterodimer dissociation model but not the classical pharmacological model.

      In our revised manuscript, we clarify two points that are important for interpreting the Nur77-RXRγ data. First, several experimental limitations of the Nur77-RXRγ studies reduced the extent to which the mechanism could be resolved as rigorously as in our earlier Nurr1-RXRα study. Second, and more importantly, the currently available ligand set lacks Nur77-RXRγ-selective agonists. This limits our ability to determine whether LBD heterodimer dissociation is the sole or principal mechanism of activation, or instead one of several contributing mechanisms.

      Taken together, these results support LBD heterodimer dissociation as a plausible and experimentally observable component of Nur77-RXRγ activation and, therefore, as a candidate shared activation mechanism for NR4A-RXR heterodimers. At the same time, because the quantitative evidence is less definitive than in the Nurr1-RXRα system, we agree that conclusions regarding Nur77-RXRγ should be stated more cautiously. This caution is reflected in both the title of our manuscript (“Towards a unified mechanism…”) and the language used throughout the text.

      (2) Some assays have relatively few replicates, with only two in some cases.

      We thank the reviewer for their attention to experimental rigor. For some assays, the findings were reproduced in two independent experiments, which we considered sufficient to confirm the presence and reproducibility of the effects observed in those particular assay formats. In the original manuscript, we used a general statement in the figure legends (“representative of two or more independent experiments”) across all assay data. In the revised manuscript, we now specify the number of independent experimental replicates for each assay in the corresponding figure legends to improve transparency.

      Reviewer #2 (Public review):

      Summary:

      This study explores the mechanisms by which binding of the nuclear receptor RXRg regulates its heterodimeric partner Nur77. Previously, this group made the interesting discovery that ligand-dependent activation of RXRg bound to a related partner, Nurr1, does not occur through a classical pharmacological mechanism but through agonist-dependent dissociation of the complex through disruption of their ligand binding domain (LBD) interactions. Here, they revisit this paradigm with Nur77. In contrast to Nurr1, the authors do not have the reagents to clearly support a role for LBD dissociation. Following the model of partial ligand-dependent dissociation of the LBD heterodimer, the experimental data (NMR, ITC, SEC) are interesting and quite complex.

      Strengths:

      The authors do a rigorous job of describing the data and providing possible interpretations and caveats. Revisiting the analysis of Nurr1, they identify the crucial role that selective Nurr1-RXRg agonists played in supporting the LBD dissociation model; without analogous compounds for the Nur77-RXRg complex, it is difficult to invoke this mechanism. Interestingly, treatment with the Nurr1-RXRg selective agonist HX600 suggests it can induce some LBD dissociation. Therefore, there may be some similarities between the regulation of Nurr1 and Nur77 by RXRg.

      We thank the reviewer for this thoughtful and balanced summary of our work. We appreciate the reviewer’s recognition of both our prior findings in the Nurr1-RXRα system and the interesting, but more complex, experimental behavior observed here for Nur77-RXRγ. We agree that the absence of Nur77-RXRγ-selective agonists currently limits how definitively the contribution of LBD dissociation can be resolved, and we have revised the manuscript to make this point more explicit and to further temper our conclusions accordingly.

      Weaknesses:

      Despite evidence supporting a partial role for RXRg LBD dissociation as a mechanism to activate Nur77, other data demonstrate that a fundamentally different regulatory mechanism likely exists in the Nur77-RXRg complex that involves the RXRg disordered NTD. The decision to describe further study of this as outside the scope of this work is unfortunate, as it closed off an avenue that could have provided fruitful data informing the apparently distinct regulatory mechanisms of the Nur77-RXRg complex. Given the uncertainty in the importance of the partial roles of the pharmacological mechanism, LBD dissociation, and the RXRg NTD, this study may have limited impact on the field.

      We thank the reviewer for this thoughtful point. We agree that the RXRγ NTD likely contributes to regulation of Nur77-RXRγ transcription, and that our truncation data suggest that regions outside the LBD can influence transcriptional output. At present, however, the effect of RXRγ NTD truncation is not sufficiently mechanistically resolved to distinguish among several plausible explanations.

      For example, the RXRγ NTD has been implicated in phase separation and biomolecular condensate formation in cells (PubMed ID 40392852, 40420113, 33971237, 31881311), and perturbing these properties (via RXRγ NTD truncation) could indirectly affect Nur77-RXRγ transcriptional activity. In addition, NTDs of nuclear receptors can participate in coactivator or corepressor interactions (PubMed ID 24284822), raising the possibility that removal of the RXRγ NTD alters transcription by changing recruitment of regulatory factors rather than by directly informing the LBD-centered mechanism examined here. We will clarify in the revised manuscript that these possibilities remain unresolved and represent important directions for future study.

      We also agree that defining how multiple RXRγ domains contribute to Nur77-RXRγ regulation would be valuable for the field. However, the focus of the present study is narrower: to test whether, as in our previous eLife study of Nurr1-RXRα, RXR ligands can influence heterodimer function through effects on LBD-LBD interactions. Because the available data do not yet allow a mechanistic dissection of the RXRγ NTD contribution, we believe that a definitive analysis of this question would require a separate set of experiments beyond the scope of the present work. We have revised the manuscript to better acknowledge this limitation and to frame the conclusions accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, this is a compelling body of work. Additional summary statements and clearer transitions would be helpful throughout.

      Here are some points that should be addressed or at least discussed by the authors:

      (1) It is unclear in the luciferase assays whether the truncated proteins are functional or not. Were there Western blots or other assays run to confirm protein concentrations?

      We thank the reviewer for this point. We did not perform Western blotting or other assays to confirm equivalent expression levels of the truncated RXRγ constructs, and we agree that this is a limitation of the luciferase assay data. As a result, the transcriptional effects observed with the truncation constructs should be interpreted cautiously.

      With that said, the increased transcriptional activity observed upon deletion of the RXRγ NTD/AF-1 region suggests that this region may exert a repressive effect on Nur77-RXRγ transcription. This effect could reflect multiple, non-mutually exclusive mechanisms, including altered phase separation or condensate-related properties of RXRγ, or altered recruitment of transcriptional coregulators through the NTD. Because our truncation strategy does not distinguish among these possibilities, we do not believe these data allow a definitive mechanistic interpretation of the NTD contribution.

      We have revised the manuscript to clarify this limitation. We also note that the primary focus of the present study is the role of ligands in modulating Nur77-RXRγ function through LBD-mediated interactions, in direct comparison with our previous Nurr1-RXRα study. A more complete mechanistic dissection of how RXRγ domain architecture influences Nur77-RXRγ transcription will require future work.

      (2) Why does the Nur77 construct lacking the NTD show increased luciferase activity?

      Please see our response above to Reviewer 2’s Public Review, which also addresses this point.

      (3) A case is made for the Nur77 LBD driving the activity, but it also could be inferred that the DBD is driving based on the data shown in Figure 1.

      We thank the reviewer for this point. We agree that the Nur77 DBD is required for binding to NBRE response elements, and we did not intend to suggest otherwise. The experimental approach in Figure 1 was not designed to dissect the relative contributions of Nur77 domains, since Nur77 was tested only in its full-length form. Instead, the purpose of this experiment was to examine how truncation of RXRγ domains affects Nur77-RXRγ transcriptional activity, in direct comparison with our prior eLife study of Nurr1-RXRα, where RXRα domain truncations helped define the importance of RXR-LBD-mediated regulation. We will revise the text to clarify that Figure 1 does not distinguish whether Nur77 DBD-dependent DNA binding is necessary, but instead addresses whether the pattern of RXRγ domain dependence is consistent with an LBD-centered mechanism of ligand-regulated heterodimer function.

      (4) It is stated that the HX600 coactivator recruitment requires further study. Why wasn't it studied here?

      We thank the reviewer for this point. The primary focus of this study was to determine how RXR ligands influence Nur77-RXRγ heterodimer activity, particularly in relation to ligand-dependent effects on heterodimer function. A more detailed analysis of HX600-dependent coactivator recruitment would require a broader mechanistic investigation of RXRα and RXRγ homodimer pharmacology and RXR-specific coregulator interactions, which extends beyond the central scope of the present manuscript. We agree that this is an important question and view it as a valuable direction for future work.

      (5) Figure 3B, the shifts in monomer populations, error bars aren't shown, the biggest shift is from 0.2 to 0.6, is that statistically meaningful?

      We thank the reviewer for this point. The reviewer is correct that error bars were not shown for Figure 3B. These NMR measurements were performed once (n=1), and therefore the shifts in monomer populations shown in Figure 3B cannot be assessed statistically. Because these studies required substantial NMR instrument time and isotopically labeled protein at high concentration, we were not able to perform experimental replicates for this dataset. We have revised the figure legend to explicitly state that these data were collected from a single experiment and have tempered the corresponding language in the manuscript accordingly.

      (6) Some ligands are shown in the figures but don't appear to be discussed in the text (at least that I can find), such as SR11237.

      We thank the reviewer for pointing this out. We used a panel of 14 commercially available RXR ligands with different pharmacological properties to probe Nur77-RXRγ function, as in our previous Nurr1-RXRα study. In the text, we emphasized ligands that were most informative for the mechanistic conclusions, rather than discussing every compound individually. SR11237, for example, behaved similarly to the broader group of RXR agonists and was therefore shown as part of the full ligand panel but not specifically highlighted in the text. We will clarify this in the revised manuscript.

      (7) There is a sentence in the discussion that says "these observations implicate that although RXRg LBD provides the protein-protein interaction interface to bind Nur77...." the authors did not show enough data to support this claim. It should be bolstered.

      We thank the reviewer for this point. We agree that this statement was stronger than was warranted by the data presented. Our intent was not to claim that the present study definitively establishes the RXRγ LBD as the sole or fully defined protein-protein interaction interface for Nur77 binding. Rather, based on the domain truncation data together with our prior Nurr1-RXRα study, we intended this statement as a working interpretation consistent with an LBD-centered mechanism. In our revised manuscript, we have softened this language to avoid overstating the conclusion and clarified that the current data support, but do not definitively prove, a role for the RXRγ LBD in mediating functionally relevant interaction with Nur77.

      Reviewer #2 (Recommendations for the authors):

      Even though this study is not able to make definitive claims about the mechanism(s) of activation of Nur77 in the Nur77-RXRg complex, the work presented here is rigorous and solidly interpreted. Identifying differences between Nurr1 and Nur77 regulation is important, and the work here shows that selective agonists are essential for supporting the non-canonical mechanism they identified before. Although they address potential implications of NTD regulation in the discussion, it feels like a lot of insight into Nur77 regulation is being missed. However, it is clear that addressing this experimentally would require substantially more work. I don't have any specific recommendations. Given current limitations on funding, I think it's fine to focus on the work completed with the acceptance that it likely limits the impact of the work on the field.

      We thank the reviewer for this thoughtful and balanced assessment of our work. The goal of this manuscript was to test whether the LBD heterodimer dissociation mechanism that we previously reported for Nurr1-RXRα may represent a conserved feature of NR4A-RXR heterodimers by extending these studies to Nur77-RXRγ. We agree that understanding the role of the RXRγ NTD in Nur77-RXRγ regulation is important and potentially highly informative. At the same time, resolving that question experimentally would require a distinct and more extensive set of studies beyond the scope of the present work. We have therefore chosen to focus this manuscript on the completed LBD-centered studies, while acknowledging that this narrower scope may limit the broader impact of the work.

      Minor points:

      (1) Without page and line numbers, it is not easy to point out specific text. On the bottom of page 6 of the document, there are two references to Figure 3a, and the arrows that help illustrate RXRg LBD-dependent CSPs; the second figure callout should describe the blue arrow, I believe.

      Thank you, we made this change.

      (2) Bottom of page 8, "...revealed two compounds [that] standout..."

      Thank you, we made this change.

    1. eLife Assessment

      This study investigates trial-by-trial intra- and inter-cortical interactions in the visual cortex of the mouse and the monkey. The authors find that activity in one layer (in mice) or one area (in monkeys) can partially predict neural activity in another layer or area on the single-trial level in different experimental contexts. This valuable finding expands previously known contributions of stimulus-independent downstream activity to neural responses in the visual cortex by demonstrating how these change under varying visual stimuli as well as in the absence of visual stimulation. While the methodology is solid, the juxtaposition of mouse and monkey data from different modalities and at different scales limits the interpretability of the observations and forces superficial comparisons. A more in-depth focus on either data set in isolation may reveal a more nuanced understanding of cortical interactions than trying to draw parallels between very different datasets.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors evaluated inter-areal interactions in different types of neuronal recordings, timescales, and species. The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 3 monkeys. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).
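
      The variance-explained measure summarised above can be illustrated with a minimal sketch. It is not the authors' pipeline: the ridge penalty, the fold count, the synthetic data, and the function name cross_validated_ev are all assumptions chosen for illustration. It fits a linear decoder from one population to another on training trials and reports held-out explained variance per target neuron, with a trial-shuffled control.

```python
import numpy as np

def cross_validated_ev(source, target, alpha=1.0, n_folds=5):
    """Held-out explained variance per target neuron for a ridge decoder.

    source: (n_trials, n_source_neurons); target: (n_trials, n_target_neurons).
    alpha and n_folds are illustrative defaults, not values from the study.
    """
    n_trials = source.shape[0]
    folds = np.array_split(np.arange(n_trials), n_folds)
    pred = np.zeros(target.shape, dtype=float)
    for test_idx in folds:
        train = np.ones(n_trials, dtype=bool)
        train[test_idx] = False
        # Centre with training-set means only, to avoid leaking test data.
        x_mean = source[train].mean(axis=0)
        y_mean = target[train].mean(axis=0)
        x = source[train] - x_mean
        y = target[train] - y_mean
        # Closed-form ridge solution: W = (X^T X + alpha I)^-1 X^T Y
        w = np.linalg.solve(x.T @ x + alpha * np.eye(x.shape[1]), x.T @ y)
        pred[test_idx] = (source[test_idx] - x_mean) @ w + y_mean
    resid = ((target - pred) ** 2).sum(axis=0)
    total = ((target - target.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - resid / total

# Synthetic example: two populations driven by a shared latent signal.
rng = np.random.default_rng(0)
shared = rng.normal(size=(400, 3))
area_a = shared @ rng.normal(size=(3, 20)) + 0.5 * rng.normal(size=(400, 20))
area_b = shared @ rng.normal(size=(3, 10)) + 0.5 * rng.normal(size=(400, 10))

ev_real = cross_validated_ev(area_a, area_b)
# Control: permuting the trial order of the source breaks trial-by-trial coupling.
ev_shuffled = cross_validated_ev(rng.permutation(area_a), area_b)
```

      In this toy setting, the gap between ev_real and ev_shuffled reflects variability shared across trials; comparing such numbers across recording modalities runs into the caveats raised under Weaknesses below.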

      Strengths:

      (1) use of existing data

      (2) addresses an interesting question

      Weaknesses:

      The data and analysis results are presented in a way that invites direct comparison between mouse L4<->L2/3 variance explained numbers, and monkey V1<->V4 variance explained numbers. This comparison is highly problematic and can't be taken at face value as the authors themselves clearly acknowledge in the Discussion and reply to the reviews. The datasets simply differ in too many aspects. If the goal of the authors is not to compare, then the analyses should be presented separately, allowing for a more detailed analysis of each (also see below).

      Understanding which patterns in the data are robust and which are idiosyncratic to individual animals/recordings is complicated by the fact that some figures appear to show a single mouse and some show averages over all four mice, with no indication of whether the results are consistent across mice. For the monkey results, all figures in the main text appear to show only a single monkey, with the results for the other two monkeys in the SI. Again, it is not clearly presented and discussed which aspects of the results are robust, and which differ between monkeys.

      Furthermore, there are literally dozens of statistical comparisons between various conditions and metrics in the main figures without them being sufficiently organized around robust new insights, that will likely replicate, and that can inform our understanding of the underlying processes, or constrain computational models.

    3. Reviewer #2 (Public review):

      Summary:

      In this work the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.

      To test this, the authors analyzed previously collected and publicly available datasets as well as data they recorded themselves. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of these data is that they include simultaneous recordings of hundreds of neurons across cortical layers or areas and under different conditions of sensory stimulation and behavioral state. However, calcium dynamics (which have lower temporal resolution and miss some non-linear dynamics in cortical activity) and multi-unit envelope activity and LFPs (which reflect fluctuations in population activity rather than the variance in individual unit spike trains) underestimate the variability of individual neurons, which may vary widely in their participation in shared sources of variance.

      From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison and with differences in the quality of recordings between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli are not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent from the sensory signals being studied.

      Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities between species and scales (within vs. across cortical areas) limit the interpretability of the inter-species comparisons, and while this is not the stated goal of the authors, the juxtaposition of these two datasets invites comparison.

    4. Reviewer #3 (Public review):

      Neural activity in visual cortex has primarily been studied in terms of responses to external visual stimuli. While the variability of neural inputs to a visual area is also known to influence visual responses, the contribution of this stimulus-independent component to overall visual responses has not been well characterized.

      In this study, the authors analyze datasets from both mice (a previous V1 Ca++ imaging study) and monkeys (data from a previous study and new large-scale electrophysiological recordings from V1-V4). Using regression models, they examine the predictability of neural activity between Layer 4 and Layer 2/3 in mice and between V1 and V4 in monkeys. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of stimulus independent downstream activity on neural responses. These findings can inform future modeling work of neural responses in visual cortex to account for such non-visual influences.

      The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. While many of the predictability patterns are largely in line with expectations (e.g., downstream layers/areas predicting upstream activity), it is valuable to have these relationships quantified as the authors have done here. Predictability also depended on stimulus type, but these dependencies were not consistent across animals, making it difficult to draw general conclusions. Finally, they show robust predictions even during spontaneous activity, which are only partially accounted for by available behavioral metrics. Together, these analyses provide a valuable quantification of stimulus-independent components of visual cortical activity and their potential role in shaping sensory responses.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We truly appreciate all the effort that the reviewer put into reading and understanding our work. With a total of 37 excellent questions, this is one of the most thorough reviews that we have received in a long time.

      R1.0: Summary:

      In this study, the authors propose a "unifying method to evaluate inter-areal interactions in different types of neuronal recordings, timescales, and species". The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice over 7 days, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 1 monkey over 5 days of recording. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).

      Strengths:

      (1) Use of existing data.

      (2) Addresses an interesting question.

      R1.1: Unfortunately, the method falls short of the state of the art: both generalized linear models (GLMs), which have been used in similar contexts for at least 20 years (see the many papers, both theoretical and applied to neural population data, by e.g. Simoncelli, Paninski, Pillow, Schwartz, and many colleagues dating back to 2004), and the extension of Granger causality to point processes (e.g. Kim et al. PLoS CB 2011). Both approaches are substantially superior to what is proposed in the manuscript, since they enforce non-negativity for spike rates (the importance of which can be seen in Figure 2AB), and do not require unnecessary coarse-graining of the data by binning spikes (the 200 ms time bins are very long compared to the time scale on which communication between closely connected neuronal populations within an area, or between related areas, takes place).

      First, a few points of clarification.

      (i) We worked with two-photon calcium imaging data (mice), and with the envelope of multi-unit activity (monkeys). While both of these types of signals are strongly correlated with spikes, neither of them can be truly considered to be a point process.

      (ii) The reviewer points to Figure 2AB. The signals that we worked with can be negative. The black traces are the actual signals and show clear negative bouts, especially noticeable in the middle panel in Figure 2B. Of course, this does not mean that there are negative spike rates. This has to do with the way the data are normalized and not with the specific prediction method. However, the reviewer is correct in stating that the method that we used could also yield negative values even for non-negative spike rates.

      (iii) We did not bin the macaque data into 200-ms time bins, but rather 25-ms time bins (line 548, Figure 1B legend). Additionally, we have now performed additional analyses with different window sizes, showing that the conclusions still hold (see Supplemental Figure 4 and lines 139-143).

      To further address the reviewer’s question, we implemented a Poisson GLM enforcing non-negativity on macaque MUAe data (without spontaneous activity subtraction, ensuring strictly positive values; lines 135-139, Supplemental Figure 1M). The model did not improve predictions over ridge regression, confirming our methodological choice. This method is not directly applicable to mouse calcium data, since the activity after baseline subtraction can be negative.
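
      The ridge-regression prediction and explained-variance (EV) measure discussed in this exchange can be illustrated with a minimal sketch on synthetic data. The function names, the penalty `alpha`, the train/test split, and the simulated populations are assumptions made for illustration only; this is not the authors' pipeline, and it does not include the Poisson GLM comparison.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights from predictors X (samples x n_pred)
    to targets Y (samples x n_target), with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def explained_variance(Y_true, Y_pred):
    """Per-target EV = 1 - Var(residual) / Var(target)."""
    resid = Y_true - Y_pred
    return 1.0 - resid.var(axis=0) / Y_true.var(axis=0)

# Hypothetical data: two populations that share three latent signals.
rng = np.random.default_rng(0)
n_samples, n_pred, n_target = 400, 30, 5
latent = rng.normal(size=(n_samples, 3))
X = latent @ rng.normal(size=(3, n_pred)) + 0.5 * rng.normal(size=(n_samples, n_pred))
Y = latent @ rng.normal(size=(3, n_target)) + 0.5 * rng.normal(size=(n_samples, n_target))

# Fit on the first 300 samples, evaluate EV on the held-out 100.
train, test = slice(0, 300), slice(300, 400)
W, b = ridge_fit(X[train], Y[train], alpha=1.0)
ev = explained_variance(Y[test], X[test] @ W + b)
```

      Because ridge predictions are unconstrained linear combinations, they can take negative values even when the target signal is non-negative, which is the point conceded in (ii) above.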

      We did not use Granger or any other causality methods. The question of causality is certainly important, and there are multiple methods developed to assess causality in neural signals. We do not make any claims about causality in our study. A rigorous evaluation of causality is an interesting line of research for future work.

      R1.2: In terms of analysis results, the work in the manuscript presents some expected and some less expected results. However, because the monkey data are based on only one monkey (misleadingly, the manuscript consistently uses the plural ‘monkeys’), none of the results specific to that monkey, nor the comparison of that one monkey to mice, are supported by robust data.

      We have now added data from 2 additional monkeys, including:

      (i) A second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights off condition (lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplemental figures 1-6, 8, 11, 12, and 13; Table 2).

      (ii) We collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Ponce lab (lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplemental figures 1-2, 4, 6, 9, 11, and 12; Table 2). The new data include responses to the same checkerboard and gray screen images as the original dataset, along with responses during lights-off conditions.

      R1.3: One of the main results for mice (bimodality of explained variance values, mentioned in the abstract) does not appear to be quantified or supported by a statistical test.

      We have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test, applied to neurons with EV>0.4. The test confirmed significant bimodality in two of the three mice (MP031 and MP032: p<0.001; MP033: p=0.687). These results are now included in the Results section (lines 307-311) and shown in Supplemental Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings), the same test yielded non-significant results (e.g., p=0.994), confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.4: Moreover, the two data sets differ in too many aspects to allow for any conclusions about whether the comparisons reflect differences in species (mouse vs. monkey), anatomy (L2/3-L4 vs. V1-V4), or recording technique (calcium imaging vs. extracellular spiking).

      We also agree with this comment. Our goal is not to provide any direct quantitative comparison between the two species. We emphasize (lines 494-497) that the experiments in the two species differ along multiple dimensions, including: (i) differences in recording modalities (calcium vs. electrophysiology), (ii) associated differences in temporal resolution, neuronal types, and SNR, (iii) cortical targets (layers vs. areas), (iv) sample size, (v) stimuli, and (vi) task conditions. In the revised manuscript, we also emphasized that the aim of this work is to investigate inter-areal interactions within each species rather than to draw quantitative comparisons between species (lines 497-499).

      Reviewer #1 (Recommendations for the authors):

      R1.5 In the analysis of directionality, you stated that subsampling was done randomly. Presumably, there could be multiple subsamples that fulfill the control of split-trial r. Are you only showing results from one subsample or multiple subsamples?

      We show the median from 10 subsample permutations. This is now clarified in line 621.

      R1.6 About the measurement 1-vs-rest r2. Understanding the definition is important for interpreting the results, but the definition was not clearly written. In lines 195-196, could you be more clear about whether the correlation is between the predicted neuron and other neurons in the predicted population or between the predicted neuron and the mean activity of the predictor population? Also, in line 212, why do you call this self-consistency? Isn't this a correlation between a neuron and the others?

      The 1-vs-rest r<sup>2</sup> value, or self-consistency, is the correlation calculated for each neuron i and does not involve other neurons. Let $r_i^{(t)}$ indicate the response of neuron i during trial t (t = 1, ..., T, where T is the total number of trials). For a given trial t, we compute the average activity of the neuron excluding this trial:

      $$\bar{r}_i^{(\mathrm{rest},\,t)} = \frac{1}{T-1} \sum_{t' \neq t} r_i^{(t')}$$

      Throughout, the superscript (rest) means “all repetitions excluding repeat t”. The one-vs-rest correlation for the held-out repetition t is:

      $$\rho_i^{(t)} = \mathrm{corr}\left(r_i^{(t)},\ \bar{r}_i^{(\mathrm{rest},\,t)}\right)$$

      We then average these correlations across all held-out repetitions:

      $$\rho_i = \frac{1}{T} \sum_{t=1}^{T} \rho_i^{(t)}$$

      We now clarify this in the text (lines 304-306 and lines 642-647).
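
      The one-vs-rest procedure described in this response can be sketched as follows. This is an illustrative reading, assuming that each trial's response is a vector over stimuli and that the correlation is the Pearson correlation; the function name and the synthetic neurons are hypothetical.

```python
import numpy as np

def one_vs_rest_correlation(responses):
    """responses: array of shape (T trials, S stimuli) for a single neuron.
    For each held-out trial, correlate it with the mean of the remaining
    T - 1 trials, then average the T correlations."""
    T = responses.shape[0]
    total = responses.sum(axis=0)
    corrs = []
    for t in range(T):
        rest_mean = (total - responses[t]) / (T - 1)  # mean excluding trial t
        corrs.append(np.corrcoef(responses[t], rest_mean)[0, 1])
    return float(np.mean(corrs))

# Hypothetical neurons: one with stable tuning, one with pure noise.
rng = np.random.default_rng(1)
tuning = rng.normal(size=50)
reliable = tuning + 0.2 * rng.normal(size=(10, 50))
unreliable = rng.normal(size=(10, 50))

r_reliable = one_vs_rest_correlation(reliable)
r_unreliable = one_vs_rest_correlation(unreliable)
```

      Only the neuron's own repeats enter the computation, which is why the measure is called self-consistency rather than a correlation with other neurons.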

      R1.7 In Figure 6 G and I. The "all" condition contains more neurons than either of the other two. In this case, is this comparison fair or meaningful?

      The reviewer is also correct here. The comparisons between the <10% and >80% groups contain the same number of predictor neurons, and those are fair comparisons. The “all” condition contains more predictor neurons, and, therefore, those comparisons are not fair. We clarified this point in lines 360-364.

      We included the “all” condition here because we think that it is an instructive sanity check in terms of reporting how EV changes with more neurons, and also in terms of understanding why the EV values in the other two conditions are lower. Expanding on this point with a little bit of philosophy, ultimately, when considering a neuron in area B (e.g., V4) and the contributions from neurons in another area A (e.g., V1), one would like to have access to all the inputs (e.g., all the neurons in V1 that are monosynaptically connected to the target neuron in area V4). We do not have access to this type of information, and we do not make any claims about monosynaptic connectivity, let alone exhaustive sampling of inputs to a given neuron. The “all” condition merely provides a quantitative illustration of the fact that EV increases with the number of predictor neurons. This observation may be considered to be somewhat trivial, but it should be pointed out that the conclusion relies on the input neurons sharing information with the target neurons (e.g., perhaps one may not be able to predict V4 activity very well from the responses of millions of neurons in the cerebellum).

      R1.8 I believe the results section can be improved by adding some interpretation after each finding.

      We thank the reviewer for the suggestion. We generally like to separate results from interpretation. However, to honor the suggestion, we added brief interpretations throughout the results section (lines 142-143, 171-173, 272-273, 279-281, 331-333, and 361-364) and expanded on the interpretations in the Discussion section.

      R1.9 Line 52 - 74: It would be better to be more specific about what kind of neuronal interactions, e.g., noise correlation, synchrony, etc.

      We added a clarification on the types of interactions we study in lines 68-73.

      R1.10 Line 81. Something seems to be missing after "5500". 5500 trials? Neurons?

      We thank the reviewer for pointing this out. The number refers to neurons (fixed in line 87).

      R1.11 Line 94. The readers would appreciate more explanation of the method.

      We have expanded on the explanation, as suggested (lines 106-107).

      R1.12 Line 104. The fraction of visually responsive neurons seems to be small. Is this typically for mouse V1? Would this fraction be higher if you also used the peak, as you did for macaque data in your SNR calculation (line 412)? And what is this number for the recorded L4?

      The reviewer correctly points out the small number of visually responsive neurons.

      We note that we now refer to the subset of neurons used for prediction analyses as visually reliable (VR) neurons (lines 115-116, 125-126, 178-179, 183-184, 211-212, 214-216, 217-226, 283-286), defined conservatively as neurons with SNR > 2 computed from the mean across all stimuli (not the peak to any one stimulus) and split-half reliability >0.8 (Methods, lines 569–590). This choice emphasizes neurons that are consistently informative over the full stimulus set.
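
      As a rough sketch of the visually reliable (VR) criterion stated above (SNR > 2 and split-half reliability > 0.8), one could compute split-half reliability and apply both thresholds as below. The SNR values are taken as given because the SNR formula is not restated in this response; the function names and example numbers are hypothetical.

```python
import numpy as np

def split_half_reliability(responses, rng):
    """responses: (T trials, S stimuli) for one neuron. Correlate the mean
    responses of two random, non-overlapping halves of the trials."""
    T = responses.shape[0]
    order = rng.permutation(T)
    half_a = responses[order[: T // 2]].mean(axis=0)
    half_b = responses[order[T // 2:]].mean(axis=0)
    return float(np.corrcoef(half_a, half_b)[0, 1])

def visually_reliable_mask(snr, reliability, snr_min=2.0, rel_min=0.8):
    """True where a neuron passes both thresholds."""
    return (np.asarray(snr) > snr_min) & (np.asarray(reliability) > rel_min)

# Hypothetical neuron with stable tuning across 10 repeats of 50 stimuli.
rng = np.random.default_rng(5)
tuning = rng.normal(size=50)
responses = tuning + 0.2 * rng.normal(size=(10, 50))
rel = split_half_reliability(responses, rng)

# Four hypothetical neurons; only the first and last pass both criteria.
mask = visually_reliable_mask([3.1, 2.5, 1.2, 4.0], [0.9, 0.5, 0.95, 0.85])
```

      Requiring both thresholds at once is what makes the criterion conservative: a neuron with high SNR but unstable tuning, or stable tuning but weak drive, is excluded.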

      Regarding the question of how typical the number of responsive neurons in mice is, the fraction of “responsive” neurons in mouse V1 varies widely depending on the definition and stimulus set, but the fractions are substantially lower than those reported in monkeys (with different methods). For those of us more used to the macaque neurophysiology literature, this has been one of the biggest surprises coming from work in rodents. Many studies report a sizable group of non-responsive neurons in mouse V1 (e.g., as little as 37% of V1 neurons being responsive in at least 25% of the trials according to de Vries et al., Nat Neur, 2020). Our fraction of visually responsive neurons is small because it couples a conservative SNR metric with a high trial-reliability threshold.

      As the reviewer notes, a peak-based metric based on any stimulus would be a less conservative criterion that would increase the fraction of neurons labeled responsive.

      R1.13 Line 113. Why not also give an exact percentage number?

      We have given the exact percentage number (lines 125-126).

      R1.14 Line 128. Is this just because L2/3 has more neurons? If so, then isn't this trivial?

      Our intention was to illustrate the best prediction performance we could get in either direction, which means including all L2/3 neurons. We have reworded our text to clarify (lines 149-151).

      R1.15 Line 134. Isn't this expected? Since V1 have more units than V4?

      The reviewer is correct. As discussed in R1.7 in mice, we sought to report the best prediction performances in either direction. We have edited our text for clarity (lines 149-151).

      R1.16 Line 165-168. What's the logical connection between these two sentences? If the former is true, we should expect to see differences. Also, why the same population? Shouldn't you include non-visual neurons?

      The two sentences in question are: “The difference in predictability in the absence of a stimulus could in principle change according to the directionality in inter-laminar interactions.” and, “There was no statistically significant difference in the EV fraction between laminar directions (L4→L2/3 vs. L2/3→L4) using the same control population as in Figure 3B (Figure 5A-C and Figure Supplement 2H).”. The key point here was to control for similar reliability values in order to make fair comparisons. We have added an additional comparison between directionalities focusing on nonvisual neurons (SNR<2 & r<0.8), and have also found no statistically significant difference between direction of predictability (Supplemental Figure 3A, right, lines 221-224).

      R1.17 Table 2. The information of which session corresponds to which experiment can be put in the table, which would be easier to read.

      We have added which sessions correspond to which experiments in Table 2.

      R1.18 Figure 1, Captions for panel c and d. I don't see any colored arrows in the figure.

      We removed the color descriptions (Figure 1C-D).

      R1.19 Figures 3, 4, and others. The annotations of "n.s." are very hard to see.

      We changed the color so that it is easier to see now (Figures 3, 4, 6, and Supplementary Figures 1-4, 6, and 8-10).

      R1.20 Figure 5, panel A. The legend is too small.

      We increased the legend size (Figure 5A).

      R1.21 Figure S5, panel D. Why are some of the data points connected?

      The paired connections are illustrated specifically in the highly predictable neurons to highlight the two separate distributions of neurons. One group, the highly predictable and highly reliable group, maintains its inter-laminar predictability after projecting out the “non-visual” activity (lines 327-330), whereas the highly predictable yet unreliable group shows a sharp decrease in inter-areal predictability, which corroborates the idea of non-visual components influencing neurons in mouse V1, as shown by Stringer et al. 2019b and consistent with our results.

      R1.22 l.91 "Ope" -> open?

      We fixed the typo (line 100).

      R1.23 Fig. 3C+D: Why is only one session used for this?

      One session was used to illustrate the distribution of split-half reliability values per area. Figure 3D contains information about all 5 stimulus sessions (see legend to Figure 3D).

      R1.24 "Even without controlling for the number of predictors or their respective split-half correlation values (627-688 sites in V1, 86-115 sites in V4), we found better predictability in the V1 to V4 direction than the reverse ( 𝑝 < 0.001, Figure Supplement 2I)." -> What does "even" mean here? Isn't this simply the null result if there is no true difference and the real reason the authors controlled for size?

      The reviewer’s understanding is correct. We have edited our text for clarity (lines 157-160).

      R1.25 "We could predict V1 and V4 activity across all stimulus types ( 𝑝 < 0.001, paired permutation test of prediction vs. shuffled frames prediction)." -> better than chance? For all neurons on average? What does this mean? Isn't it trivial and 100% expected that neural activity in the visual cortex is above chance related to the visual input?

      We stated that sites in V1 and V4 could predict each other across all stimulus types before describing the differences between them. We agree that this observation is to be expected and indicated so now in the text (lines 185-186).

      R1.26 "The predictability was the highest in both directions for neuronal activity in response to a full field checkerboard images (Figure 4D). In the V1 → V4 direction, the EV fraction was higher when predicting a slow-moving small thin bar compared to a fast-moving large thick bar (Figure 4D, left), whereas the opposite was true for the V4 → V1 direction (Figure 4D, right)." -> What does this mean? Is this expected or not? Under what theories of cortical processing?

      The differences between EV prediction directions (V1→V4: slow thin bars > fast thick bars; V4→V1: fast thick bars > slow thin bars) could be because V4 responses are more reliable for the slow thin bars whereas V1 responses are more reliable for the fast thick bars (Supplemental Figure 5H–I). To account for this possibility, we controlled for differences in target-related properties by regressing out covariates like SNR, split-half correlation, and variance. In monkey L, after regressing out reliability/drive within each direction using these covariates, the V4→V1 difference between slow thin bars and fast thick bars was not significant and the difference in the V1→V4 direction was reduced (Supplemental Figure 5K, lines 198-203). This suggests that the asymmetry primarily reflects stimulus‑dependent reliability of the target population rather than a strong directional selectivity.

      To the best of our knowledge, there are no clear predictions that match these observations from existing theories of visual cortical processing, especially given the paucity of computational models that include stimulus velocity when describing the responses in area V4. There has been extensive work on theories of surround suppression, but it seems unlikely that the thick bars would elicit surround suppression given the size of the V4 receptive fields. Many current computational models that aim to fit the responses of neurons in the visual cortex use neural networks that take an image as visual input and yield activations. Most of these models do not incorporate stimulus movement, and even those that do incorporate stimulus dynamics, only indirectly map onto interlaminar stimulus transformations or even between-area stimulus transformations. We hope that the results in this manuscript will help inspire and constrain better models of visual cortical processing.

      R1.27 Shouldn't all the predictability analysis be done conditioned on the stimulus in order to tell us more than the trivial "both V1 and V3, or L2/3 and L4, are driven by visual inputs"? (The spontaneous activity analyses are essentially that, for a small subset of the stimuli.)

      The key goal of this study is to quantify inter-areal interactions both under visual input and without visual input. This type of analysis is important because inter-areal interactions may depend both on visual inputs but also on neuronal inputs that are not triggered by visual signals. For example, extensive work in mice has now shown that neuronal responses in V1 depend on an animal’s running speed, independently of any visual input. Even within the visual input conditions, we present analyses where we shuffle trial order (e.g., Figure 7, Supplementary Figure 11) to estimate the contribution of trial-by-trial variations that are independent of visual inputs and other analyses where we project out non-visual activity (e.g., Supplementary Figure 7).

      R1.28 "In visually responsive neurons, there was a significant reduction in EV during gray screen compared to visual stimulus presentation" -> perfectly expected. But the report-worthy result here is how much is left, not whether EV is decreased!

      We have changed the wording on the results to highlight the sustained predictability (lines 211-212). It is important to note that, although the reduction in EV during gray screen may be expected, this observation does not hold for all neurons. In fact, there are some neurons for which the EV during visual presentation is comparable to that during gray screen (Figure 5B,C,E: neurons that lie on the diagonal line).

      R1.29 "Similar to the conclusions drawn from the mouse data, the predictability of neuronal activity was higher in response to stimulus presentation than to gray screen presentations" -> Really? Conditioned on stimulus, or explainable by the well-known fact that both V1 and V4 are visually driven?

      As discussed in R1.28, in mice, there are many neurons where the EV during gray screen is comparable to that during stimulus presentation. In monkeys, most sites were visually driven. As the reviewer points out, we expected that EV during stimulus presentation would be higher than during gray screen; this observation is a reasonable sanity check. The difference between unshuffled trials and shuffled trials (Figure 7, Supplementary Figure 11) provides an estimate of the interactions that are not purely explained by visual inputs alone in monkeys.

      R1.30 "Unlike the mouse, macaque correlation of visual predictability between stimulus presentation and spontaneous activity was high across all types of spontaneous conditions" -> Why? Is this simply explainable by a lower mean response in the spontaneous condition in the mouse? Are these mouse and monkey experiments truly comparable? Isn't it surprising that spontaneous activity in the monkey visual cortex compared to evoked activity is higher than in the mouse?

      With respect to the question of whether spontaneous activity (or stimulus-evoked activity) in monkeys is higher than in the mouse, it is difficult to make these comparisons. We emphasize in the text the multiple differences between the experiments in both species. Our goal is not to perform any quantitative comparison across species (see R1.4). We changed the wording to remove any inference of comparison between species (lines 248-250).

      R1.31 Occasionally imprecise presentation. Ex "To further examine the non-stimulus driven component, we reasoned that if the shared information between areas were strictly driven by the visual stimulus, then using the activity of a stimulus presentation repeat to one specific image could be used to predict the responses to any other stimulus repeat of the same image. On the other hand, if the shared activity does not have any stimulus-response information, then the prediction model would not work when considering responses across repeated presentations of identical stimuli in different trials. To test these two opposing ideas, we compared the inter-areal prediction EV fractions using unshuffled versus shuffled trials." -> Sets up two extreme strawmen (100% driven by stimulus vs 0% driven by stimulus). What does "model would not work" mean? EV=0? Hypotheses not ideas.

      Our intent was to set up two extreme hypotheses, not to claim that neurons must fall exclusively into one or the other. The two extremes help better interpret the results.

      The reviewer indicates that these are straw-man hypotheses. This may well be the case. But note the responses to R1.12, R1.27, R1.28, and R1.29. The reviewer seems to assume that all or most neurons in the visual cortex should be mostly or exclusively driven by visual stimuli.

      We also replaced “ideas” with “hypotheses”, as suggested. We have expanded the discussion of these points in the manuscript (lines 480-493). Many neurons occupy intermediate positions between these two extreme hypotheses. We clarified that “model would not work” refers to prediction accuracy approaching chance (EV ≈ 0).
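
      The logic of the unshuffled versus shuffled comparison can be illustrated with a small simulation. The sketch below assumes a toy model in which both populations receive a stimulus drive (identical on every repeat of an image) and a shared trial-by-trial fluctuation; all names, sizes, and the ridge penalty are hypothetical and do not reproduce the authors' analysis.

```python
import numpy as np

def holdout_ev(X, y, alpha=1.0, train_frac=0.75):
    """Ridge fit on the first part of the data, EV on the held-out rest."""
    n_train = int(train_frac * X.shape[0])
    Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]
    mu_x, mu_y = Xtr.mean(axis=0), ytr.mean()
    Xc = Xtr - mu_x
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (ytr - mu_y))
    pred = (Xte - mu_x) @ w + mu_y
    return float(1.0 - np.var(yte - pred) / np.var(yte))

rng = np.random.default_rng(2)
n_stim, n_rep, n_pred = 40, 20, 25
stim_drive = rng.normal(size=(n_stim, 1, 1))        # same on every repeat
shared_fluct = rng.normal(size=(n_stim, n_rep, 1))  # varies trial by trial
a, b = rng.normal(size=n_pred), rng.normal(size=n_pred)
X = stim_drive * a + shared_fluct * b + 0.5 * rng.normal(size=(n_stim, n_rep, n_pred))
y = stim_drive[..., 0] + shared_fluct[..., 0] + 0.5 * rng.normal(size=(n_stim, n_rep))

# Shuffle repeat order within each stimulus for the predictor population only,
# so each target trial is paired with a different repeat of the same image.
X_shuf = np.stack([X[s, rng.permutation(n_rep)] for s in range(n_stim)])

order = rng.permutation(n_stim * n_rep)  # common random order before splitting
y_flat = y.reshape(-1)[order]
ev_unshuffled = holdout_ev(X.reshape(-1, n_pred)[order], y_flat)
ev_shuffled = holdout_ev(X_shuf.reshape(-1, n_pred)[order], y_flat)
```

      In this toy model the shuffled EV reflects only the stimulus-locked component, so the gap between `ev_unshuffled` and `ev_shuffled` estimates the shared, non-stimulus contribution. Intermediate cases fall between the two extreme hypotheses.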

      R1.32 "In both species and in both directions, inter-areal prediction EV fraction persisted (𝑝 < 0.001," Doesn't persist mean EV is unchanged? But the test is EV>0 or not in both cases.

      We meant that EV values remained significantly above chance, not that they were unchanged. The statistical test was indeed whether EV > 0 as the reviewer indicated. We have revised the text accordingly (lines 375-380).

      R1.33 "In mice, neurons showed a bimodal distribution in terms of their response predictability in shuffled and unshuffled trials" -> I don't see any bimodality in the figure, nor is there a statistical test provided for bimodality.

      In Figure 7C, one group of neurons lies essentially along the horizontal axis, whereas the other group is dispersed closer to the diagonal line. Specifically, the neurons that lie on the horizontal axis are also the ones whose responses are best predicted during gray screen activity. We have changed the text to clarify this point (lines 380-382).

      R1.34 "In the macaque V4 → V1 direction, there was a large proportion of neurons with peak EV when considering 25 ms to 50 ms offsets in the positive direction (i.e., V4 after V1, Figure 7I, right)." -> So what does this mean? Is this compatible with anything we know? This is the anti-causal direction so some kind of explanation would be warranted.

      In the V4→V1 panel, a positive offset means we use V4 at t+Δt to predict V1 at t (and conversely in the V1→V4 panel). Therefore, the fact that the peak EV occurs at +10–20 ms indicates that V1 leads V4 by ~10–20 ms: in other words, V1’s earlier response best predicts V4’s slightly later response. This observation is not anti-causal, but rather it is consistent with the canonical largely feed-forward V1→V4 latency (e.g., Schmolesky et al., 1998 among many others). We clarified this in text (lines 400-404).
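
      The sign convention for temporal offsets can be checked with a toy example. The sketch below assumes two single-channel signals in arbitrary time bins, with the "V4" signal constructed as a delayed, noisy copy of the "V1" signal, and it uses the squared correlation as the EV of a one-predictor linear fit; the names and the lag of two bins are hypothetical.

```python
import numpy as np

def lagged_ev(predictor, target, offset):
    """Squared correlation between predictor[t + offset] and target[t]
    (the in-sample EV of a one-predictor linear fit)."""
    n = len(target)
    if offset >= 0:
        p, q = predictor[offset:], target[: n - offset]
    else:
        p, q = predictor[: n + offset], target[-offset:]
    return float(np.corrcoef(p, q)[0, 1] ** 2)

rng = np.random.default_rng(3)
n, true_lag = 5000, 2
v1 = rng.normal(size=n)
# v4[t] = v1[t - true_lag] + noise, so V1 leads V4 by true_lag bins.
# np.roll wraps around at the edges, which is negligible for this n.
v4 = np.roll(v1, true_lag) + 0.5 * rng.normal(size=n)

# Predict V1 at t from V4 at t + offset, scanning offsets in bins.
offsets = list(range(-4, 5))
ev_by_offset = {k: lagged_ev(v4, v1, k) for k in offsets}
best_offset = max(ev_by_offset, key=ev_by_offset.get)
```

      Under this convention the peak falls at a positive offset whenever the target (here V1) leads the predictor (here V4), which matches the feed-forward reading given above.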

      R1.35 L. 307: "In monkeys," plural!?

      While this was not correct in the original version, we have now added data from two more monkeys.

      R1.36 L. 313: "we observed an approximately bimodal distribution of neuronal responses, with a large subset of neurons that do not show reliable responses to visual stimuli both in L4 and L2/3" -> where?

      The bimodal distribution can be appreciated in Figure 6B (1-vs-rest r2, third panel, note neurons along the y-axis, see also R1.33) and Supplementary Figure 7B (lines 307-312). Additionally, as stated in R1.3, we have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test (lines 310-313); see also Supplementary Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings) the same test yielded non-significant results, confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.37 Random subsampling to control for population size done with how many subsamples? How are they combined? Variability across subsamples interpreted how?

      We performed 10 permutations and used the median distributions across permutations (line 621).
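
      A minimal sketch of size-matched random subsampling with a median over 10 draws is given below. It controls only for the number of predictors, not for split-half reliability, and the data, function names, and ridge penalty are assumptions for illustration rather than the authors' code.

```python
import numpy as np

def holdout_ev(X, y, alpha=1.0, n_train=300):
    """Ridge fit on the first n_train samples, EV on the remainder."""
    Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]
    mu_x, mu_y = Xtr.mean(axis=0), ytr.mean()
    Xc = Xtr - mu_x
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (ytr - mu_y))
    pred = (Xte - mu_x) @ w + mu_y
    return float(1.0 - np.var(yte - pred) / np.var(yte))

def median_subsampled_ev(X, y, n_keep, n_subsamples=10, seed=0):
    """Draw n_subsamples random subsets of n_keep predictors (without
    replacement within a subset) and return the median EV and all EVs."""
    rng = np.random.default_rng(seed)
    evs = []
    for _ in range(n_subsamples):
        cols = rng.choice(X.shape[1], size=n_keep, replace=False)
        evs.append(holdout_ev(X[:, cols], y))
    return float(np.median(evs)), evs

# Hypothetical larger predictor population (100 sites) subsampled to 20.
rng = np.random.default_rng(4)
latent = rng.normal(size=(400, 3))
X = latent @ rng.normal(size=(3, 100)) + 0.5 * rng.normal(size=(400, 100))
y = latent @ np.array([1.0, -1.0, 0.5]) + 0.5 * rng.normal(size=400)

median_ev, evs = median_subsampled_ev(X, y, n_keep=20)
```

      The spread of `evs` across draws gives a direct view of the variability across subsamples that the reviewer asks about.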

      Reviewer #2 (Public Review):

      R2.0: “Summary:

      In this work, the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have a significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.”

      R2.1: To test this, the authors analyzed previously collected and publicly available datasets. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of this data is that it includes simultaneous recordings of hundreds of neurons across cortical layers or areas. However, the weaknesses of calcium dynamics (which has lower temporal resolution and misses some non-linear dynamics in cortical activity) and multi-unit envelope activity (which reflects fluctuations in population activity rather than the variance in individual unit spike trains), underestimate the variability of individual neurons. The authors deploy a regression model that is appropriate for addressing their hypothesis, and their analytic approach appears rigorous and well-controlled.

      We agree with these points, and we discuss these specific limitations in capturing the variability of individual neurons in the Discussion section (lines 500-504). We have now also added analyses based on local field potentials (LFP). LFPs do not directly reflect the activity of individual neurons either.

      R2.2: From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison with some minor differences between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli is not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in the cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. However, the source of these predictive signals and their impact on function is only explored in a limited fashion, largely due to limitations in the datasets. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent of the sensory signals being studied.

      We agree that these datasets do not lend themselves well to directly separating and quantifying all the different sources of the predictive signals. We expand on this point in the Discussion section (lines 509-511).

      R2.3: The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons.

      We also agree with this comment. We emphasize that our goal is not to attempt a direct quantitative comparison across species (lines 497-499).

      R2.4: Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well-designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons. The mechanistic contributions of known sources or correlates of shared variability (eye movements, pupil fluctuations, locomotion, whisking behaviors) were not considered, and these could drive or reflect much of the observed predictability and explain differences between spontaneous and visual activity predictions.

      We have expanded on the Discussion section to explicitly state the points raised by the reviewer (lines 494-509).

      In mice, we have now also analyzed a separate dataset in which behavioral measurements were available, including running speed and facial motion (FaceMap SVDs). We used these to build behavioral-only and combined models to predict neural activity. We found that behavioral variables explained a modest but consistent portion of the variance across both spontaneous and stimulus conditions (Supplementary Figure 10A,C, lines 268-273).

      For the macaque data, we analyzed pupil size as the only available behavioral measure in the macaque dataset. We focused specifically on the “resting state, eyes open” condition, where both neural activity and pupil measurements were available. Using ridge regression, we assessed the extent to which pupil size predicted neural activity in V1 and V4. Pupil size alone explained only a small fraction of the variance (Supplementary Figure 10E, lines 274-276).
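
      A single-predictor ridge fit of this kind can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the names `ridge_1d` and `explained_variance`, the penalty `alpha`, and the toy values are hypothetical, and the explained variance is computed on the fitted data for brevity, whereas held-out data would normally be used.

```python
import statistics

def ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge fit of y on one predictor (intercept unpenalized)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / (sxx + alpha)          # alpha shrinks the slope toward zero
    return slope, my - slope * mx

def explained_variance(y, y_hat):
    """1 - residual sum of squares / total sum of squares."""
    my = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Toy values (assumption): pupil size and the activity of one recording site.
pupil = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [0.9, 1.1, 1.0, 1.3, 1.2, 1.4]

slope, intercept = ridge_1d(pupil, activity, alpha=1.0)
ols_slope, _ = ridge_1d(pupil, activity, alpha=0.0)
ev = explained_variance(activity, [intercept + slope * p for p in pupil])
```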

      R2.5: Previous work has explored correlations in activity between areas on various timescales, but this work only considered a narrow scope of timescales.

      Without going into specifics about the numbers, it is hard to fully address this question. As the reviewer noted in R2.1, the mouse data analyzed here do not lend themselves to evaluating predictability on scales of tens of milliseconds. In the macaque data, we have now conducted additional analyses where we binned the activity across a range of bin sizes (10 ms to 200 ms). The new analyses are shown in Supplementary Figure 4, and described in lines 140-143, 160-163.

      R2.6: The observation that there is some degree of predictability is not surprising, and it is unclear whether changes in observed predictability with analysis conditions are informative of a particular mechanism or just due to differences in the variance of activity under those conditions. Some of these issues could be addressed with further analysis, but some may be due to limitations in the experimental scope of the datasets and would require new experiments to resolve.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7–13).

      Second, we note that our mouse preprocessing standardized responses by spontaneous mean and SD per neuron, controlling baseline scale across conditions (lines 535-538). Because of this standardization, spontaneous traces have unit scale (mean = 0, SD = 1).
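
      The standardization step can be sketched as below. The function name `standardize_by_spontaneous` and the toy traces are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, since the response does not specify which was used.

```python
import statistics

def standardize_by_spontaneous(trace, spontaneous):
    """Z-score `trace` using the mean and SD of the same neuron's spontaneous trace."""
    mu = statistics.fmean(spontaneous)
    sd = statistics.pstdev(spontaneous)   # assumption: population SD
    if sd == 0:
        raise ValueError("spontaneous trace has zero variance")
    return [(x - mu) / sd for x in trace]

# Toy traces (assumption) for one neuron.
spont = [1.0, 2.0, 3.0, 4.0, 5.0]
evoked = [3.0, 6.0, 9.0]

z_spont = standardize_by_spontaneous(spont, spont)    # mean 0, SD 1 by construction
z_evoked = standardize_by_spontaneous(evoked, spont)  # expressed in spontaneous SD units
```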

      To test whether differences in variance underlie our findings, we calculated the variance for both species. For mice, we computed variance across repeats (visual) and across timepoints (lines 286-291). For the macaque moving-bar sessions, we computed variance across the concatenated held-out samples pooling timepoints, repeats, and bar identities (lines 291-292).

      The V4 population showed a higher overall variance distribution compared to the V1 population (Supplementary Figure 2I-J), and L2/3 variance was also overall higher than L4 (Supplementary Figure 2D-E). We also see a modest monotonic relationship between EV fraction and this variance (mouse visual: Spearman ρ = 0.43–0.52, p < 0.001; macaque stimulus responses: ρ = 0.50–0.56, p < 0.001; macaque gray-screen responses: ρ = 0.38, p < 0.001, Figure 6A,D), indicating variance contributes to (but is not the primary driver of) EV prediction fraction. We then adjusted for variance by fitting, within each stimulus condition, a linear regression of EV on variance (excluding shuffled-control rows) and conducted all comparisons on the resulting residual EV values, thereby isolating effects not attributable to variance (see Supplementary Figure 3E-G, lines 165-171).
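
      The variance adjustment can be illustrated with a minimal sketch that regresses EV on variance by ordinary least squares and keeps the residuals. The function name `residualize` and the toy values are hypothetical. In the analysis described above, the fit would be repeated within each stimulus condition.

```python
import statistics

def residualize(y, x):
    """Residuals of y after an ordinary least-squares fit y ~ intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Toy values (assumption): per-neuron response variance and EV fraction.
variance = [0.5, 1.0, 1.5, 2.0, 2.5]
ev = [0.10, 0.22, 0.28, 0.41, 0.49]

residual_ev = residualize(ev, variance)
```

      By construction, the residuals are uncorrelated with variance, so any remaining differences between groups cannot be explained by a linear dependence of EV on variance.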

      Reviewer #2 (Recommendations for the authors):

      R2.7 Overall I found this manuscript to be very clearly written and the results compelling, although I found myself wanting a little more. I believe these datasets also include information about eye movements, pupil diameter, and maybe locomotion and whisking in the rodent work. I think it could be informative to ask the degree to which the predictability, particularly during the spontaneous activity, is attributable to these other known sources of variance in trial-by-trial measures. My concern is that during visual stimulation, the space of cortical responses is limited to a very narrow scope (observing a visual stimulus during fixation) whereas spontaneous activity includes a broader range of possibilities (different states of arousal, eye movement).

      We analyzed the role of behavioral variables that could explain neural activity in mouse V1 (including variables suggested by the reviewer: running speed and FaceMap SVDs). The authors of the open dataset warned against using pupil size because the measurements were not accurate in the dark. In terms of the contribution to the predictability of mouse V1 activity, these behavioral variables showed a weak yet significant contribution (Supplementary Figure 10A,C, lines 260-270).

      R2.8 By controlling for eye movements or pupil diameter during spontaneous measurements, would you improve your measure of predictability?

      When predicting neural activity in the lights-off, eyes-open condition, combining neural data from the predictor population with pupil size did not result in a statistically significant increase in EV fraction when predicting the target population (Supplementary Figure 10E, lines 276-278).

      R2.9 Also, there is work that shows feed-forward correlations between V1 and higher visual areas are observed in higher frequency activity, whereas feedback is associated with lower frequency activity. If you compared your predictability measure over bandpasses with different timescales, would you find the direction of V1-V4 interactions changes consistent with this previous work?

      To address this question, we extended our analyses to the local field potential signals (LFPs) in monkeys, using band-limited LFP power (2–12, 12–30, 30–45, 55–95 Hz). We reran the lag sweep analyses (10-ms steps; 200-ms windows slid every 10 ms) in both directions. The gamma band showed a feed-forward signature in the early evoked period: the V1→V4 predictability peaked at negative offsets (∼10–30 ms; V1 leads), and the V4→V1 predictability peaked at positive offsets, consistent with previous findings. The results for the low-frequency and beta bands are also presented in the text (Supplementary Figure 13, lines 412-423).
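
      The windowing used in a sweep of this kind can be sketched as follows: 200-ms windows advanced in 10-ms steps. The function name `sliding_windows` and the 0–400 ms analysis range are hypothetical; only the window width and step come from the response.

```python
def sliding_windows(t_start_ms, t_end_ms, width_ms=200, step_ms=10):
    """Return (start, end) pairs for windows lying fully inside [t_start_ms, t_end_ms]."""
    windows = []
    start = t_start_ms
    while start + width_ms <= t_end_ms:
        windows.append((start, start + width_ms))
        start += step_ms
    return windows

# Hypothetical analysis range of 0-400 ms after stimulus onset.
windows = sliding_windows(0, 400)
```

      Each window would then be paired with a temporally shifted window in the other area, and predictability would be computed as a function of the shift.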

      Reviewer #3 (Public review):

      R3.0: Neural activity in the visual cortex has primarily been studied in terms of responses to external visual stimuli. While the noisiness of inputs to a visual area is known to also influence visual responses, the contribution of this noisy component to overall visual responses has not been well characterized.

      In this study, the authors reanalyze two previously published datasets - a Ca++ imaging study from mouse V1 and a large-scale electrophysiological study from monkey V1-V4. Using regression models, they examine how neural activity in one layer (in mice) or one cortical area (in monkeys) predicts activity in another layer or area. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of non-stimulus-related downstream activity on neural responses. These findings can inform future modeling work of neural responses in the visual cortex to account for such non-visual influences.

      R3.1: "A major weakness of the study is that the analysis includes data from only a single monkey. This makes it hard to interpret the data as the results could be due to experimental conditions specific to this monkey, such as the relative placement of electrode arrays in V1 and V4."

      We have now added the second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights-off condition. In addition, we collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Carlos Ponce lab (monkey A: see lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplementary Figures 1-6, 8, 11, 12, and 13; monkey D: see lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplementary Figures 1-2, 4, 6, 9, 11, and 12). The conclusions for the new monkeys are qualitatively similar to the ones reported previously. The main quantitative differences are due to the very large difference in the number of predictor sites (Table 2, lines 127-134).

      R3.2: The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. However, the comparison of stimulus types (Figure 4) raises a potential concern. It is not clear if the differences reported reflect an actual change in predictive influence across the two conditions or if they stem from fundamental differences in the responses of the predictor population, which could in turn affect the ability to measure predictive relationships. The authors do control for some potential confounds such as the number of neurons and self-consistency of the predictor population. However, the predictability seems to closely track the responsiveness of neurons to a particular stimulus. For instance, in the monkey data, the V1 neuronal population will likely be more responsive to checkerboards than to single bars. Moreover, neurons that don't have the bars in their RFs may remain largely silent. Could the difference in predictability be just due to this? Controlling for overall neuronal responsiveness across the two conditions would make this comparison more interpretable.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7-13).

      In Figure 4, differences in target-population responsiveness could influence predictability across stimulus types, as the reviewer points out. We therefore controlled for this by modeling EV as a function of the following neuron properties: split-half r, SNR, one-vs-rest r^2, and response variance. Regression was performed within each direction, and the residuals were then used for inference. When comparing residuals, the predictability of checkerboard responses remained statistically higher than the predictability of the responses to moving bars (p < 0.001, permutation test, Supplementary Figure 5K, lines 196-203), suggesting that the differences in predictability cannot be exclusively attributed to differences in the target population neuronal properties.
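
      A permutation test on residuals can be sketched as below. The response does not specify the test statistic or the permutation scheme, so the two-sample difference in means, the number of permutations, the function name `permutation_test`, and the toy residuals are all assumptions made for illustration.

```python
import random
import statistics

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Toy residual EV values (assumption) for the two stimulus types.
checker_resid = [0.12, 0.08, 0.15, 0.11, 0.09, 0.14, 0.10, 0.13]
bars_resid = [-0.10, -0.12, -0.08, -0.15, -0.09, -0.11, -0.13, -0.14]

observed, p_value = permutation_test(checker_resid, bars_resid)
```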

    1. eLife Assessment

      This valuable work identifies a subpopulation of neurons in the larval zebrafish pallium that responds differentially to varying threat levels, potentially mediating the categorization of negative valence. The evidence supporting these claims is solid; however, the study would be strengthened by more sophisticated analyses of functional imaging results, behavioral confirmation of stimulus valence, and further evidence linking the functionally distinct clusters to their molecular identity. This work will be of interest to systems neuroscientists investigating the circuit-level encoding of emotion and defensive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a map of neurons responding to aversive stimuli in zebrafish and suggests that the regions containing these neurons are homologous to mammalian brain areas involved in aversive processing. Specifically, this study found that neurons in a part of the pallium, the homolog of the amygdala, responded vigorously to strongly noxious and fully looming stimuli, but not to the milder cues. In contrast, neurons in another part of the pallium responded to all of these stimuli. The findings provide valuable insights into the neural mechanisms underlying negative-valence computation in zebrafish.

      Strengths:

      This study performed whole-brain functional imaging using two-photon light-sheet microscopy and identified the activity of individual neurons in awake zebrafish. This technique is highly valuable and will be broadly applicable to future studies aimed at elucidating the neural mechanisms underlying zebrafish behavior at single-neuron resolution.

      Weaknesses:

      Although this study reports neuronal responses to aversive stimuli, it did not directly assess how aversive these stimuli were for zebrafish. In general, studies of this kind quantify the aversiveness of test stimuli by measuring behavioral indices such as avoidance or escape responses. The present study states that "neurons responded vigorously to strongly noxious and fully looming stimuli, but not to milder cues." However, the authors did not provide behavioral evidence demonstrating that the stimuli were indeed aversive or that the so-called milder cues were perceived as less aversive by the animals. Without a behavioral measure of aversiveness, it is difficult to determine whether the reported neural responses reflect negative-valence processing, rather than general sensory salience or stimulus intensity.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to map neurons encoding negative valence at the whole-brain scale in larval zebrafish. Using two-photon light-sheet imaging combined with various aversive stimuli, they visualize and quantify stimulus-evoked neural responses, identify the anatomical locations of responsive neurons, and explore the possibility of genetically accessing Rl neurons that respond preferentially to strongly noxious stimuli.

      Strengths:

      The major strength of this study lies in its use of two-photon light-sheet imaging, which provides a system-level characterization of neuronal response to aversive stimuli. The authors systematically compare multiple classes of aversive stimuli (heat, electric shock, looming, etc.), showing that strongly threatening stimuli converge on a compact neuronal population in the Rl, supporting the robustness of the finding. Finally, the identification of Tiam2a expression in these neurons provides a potential genetic handle for future functional studies.

      Weaknesses:

      The main weakness of the study is the lack of causal evidence supporting the functional role of the identified neurons. Without optogenetic, chemogenetic, or ablation experiments, it is difficult to determine whether these neurons are required for or sufficient to encode negative valence. In addition, the study does not include positive-valence or neutral stimulus controls, making it difficult to distinguish whether the observed neural responses reflect valence per se or more general downstream responses such as motor output. Finally, the lack of behavioral readouts limits the ability to directly link the identified neural populations to defensive behaviors.

    4. Reviewer #3 (Public review):

      Overview and Strengths:

      Accurate evaluation of threat levels allows animals to determine whether to escape. The precise mechanism underlying threat evaluation remains unclear. Smith et al. identified a cluster of neurons in the zebrafish rostrolateral dorsal pallium (Rl) that respond differentially to varying levels of negative-valence stimuli.

      This work leverages the small size and optical transparency of the larval zebrafish, using two-photon selective plane illumination microscopy to assay the response of pallial neurons to various negative-valence stimuli. Interestingly, unlike the ventromedial pallium and habenula, which responded to all stimuli tested, neurons in the Rl were activated by a selection of stimuli representing relatively higher levels of threats. By leveraging a zebrafish brain atlas, the authors identified a transgenic line labeling a tiam2a+ cluster of neurons that appears to be the activated population in the Rl. Together, these results demonstrate a subpopulation of pallial neurons that likely categorizes the strength of negative valence in larval zebrafish.

      The primary conclusions of this work are well supported by the data. The identification of a neuronal cluster that may underlie the categorization of threat-associated sensory stimuli is significant. Furthermore, this study generates a high-quality functional imaging dataset using cutting-edge microscopy, setting the foundation for understanding the neuronal encoding of emotions in zebrafish.

      Results from this work set the stage to answer further exciting questions: How do tiam2a+ Rl neurons modulate the activity of the hindbrain escape circuit? What is the functional role of the Rl population inhibited by threat stimuli? Computationally, how does Rl integrate sensory signals and classify threat levels? How does the activity of Rl change in the context of habituation and conditioning? Future work may use more nuanced stimuli and combine new genetic tools, behavioral recording, and circuit-level analysis to systematically reveal how emotions modulate defensive behaviors.

      Weaknesses:

      The impact of this work could be further enhanced by incorporating more sophisticated data analysis and by more clearly anchoring the findings within the known framework of zebrafish defensive behavior.

      (1) The authors performed statistical analyses across six ROIs per experiment in Figures 1E/J, 3E/J, and 6B/D/F. This increases the probability of Type I errors. Applying multiple comparison corrections would mitigate this concern. Given that most stimuli (except for the "IR heating") are non-directional, the authors may consider first testing for the response symmetry following each stimulus and then combining ROIs from the two hemispheres to calculate a single averaged measurement per region per fish for comparisons of regional dF/F.

      (2) I found the topographical mapping of activated and inhibited ROIs very informative. There appear to be two subpopulations of Rl: a posterior-medial population often activated by negative valence stimuli, and an anterior-lateral population that is frequently inhibited. I wonder whether it is possible to decode the valence or category of a stimulus based on the topography and response profiles of these neurons. Such results would provide additional evidence for the Rl's role in threat evaluation.

      (3) Findings in this paper, especially differential responses of the Rl to full and partial looming, deserve an expanded discussion. The authors should better anchor these findings to established literature to emphasize their significance in the Discussion. For example, how might this potential categorization mechanism contribute to, or differ from, the mechanisms underlying habituation (Fotowat & Engert, 2023, eLife); what are the possible connections between the pallium and the hindbrain escape circuits that could relay these Rl signals (Kunst et al., 2019, Curr Biol)?

      (4) The authors make conservative claims associating the tiam2a+ cluster with Rl neurons activated by noxious stimuli, and their data support this conclusion. However, this link could be further strengthened by testing whether the tiam2a+ cluster shows differential responses to full vs partial looming. This could be achieved by performing pERK staining following the stimulus paradigm. While future tools may allow for direct functional imaging of this population, I believe such experiments are beyond the scope of this paper.

      (5) Figure 1E/J, Figure 3E/J: Please clarify whether the dashed red vertical lines indicate the onset or the offset of the stimuli. Additionally, different time windows were used for AUC calculations across these experiments; the authors should provide a rationale for these varying windows in the Results or Methods.
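
      To illustrate the multiple-comparison correction suggested in point (1), the following is a minimal sketch of the Holm step-down procedure applied to six per-ROI p-values. The choice of Holm over other corrections, the function name `holm_adjust`, and the p-values themselves are assumptions made for illustration only.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Hypothetical uncorrected p-values for six ROIs in one experiment.
roi_p = [0.004, 0.030, 0.020, 0.500, 0.011, 0.049]
roi_p_adj = holm_adjust(roi_p)
significant = [p <= 0.05 for p in roi_p_adj]
```

      In this toy example, five of the six ROIs pass an uncorrected 0.05 threshold, but only the first remains significant after correction.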

    1. eLife Assessment

      This important study uses an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to show that attentional orienting modulates conflict processing at both the semantic and response levels. The evidence is compelling, supporting the integration-segregation theory of exogenous attention in inhibition of return while also deepening our understanding of how attentional orienting shapes downstream cognitive processing. The work will therefore be of broad interest to researchers in attention and cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding sheds new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Comments on revisions:

      I appreciate the authors' thorough and thoughtful revisions, which have successfully addressed all of my prior concerns.

    3. Reviewer #2 (Public review):

      This study provides neuroimaging evidence supporting the integration-segregation theory of inhibition of return (IOR), a widely studied attentional phenomenon. It also explores the neural interactions between IOR and cognitive conflict, demonstrating that conflict processing is potentially modulated by attentional orienting.

      The integration-segregation theory was investigated using a sophisticated, well-executed experimental task that accounted for cognitive conflict processing, which is phenomenologically related to IOR but is non-spatial. The behavioral and neuroimaging data were carefully analyzed.

      The authors have thoughtfully addressed all my previous concerns. By demonstrating how attentional orienting can modulate neural processing of cognitive conflict, this study helps to advance a more unified and mechanistic understanding of the cognitive and neural processes that govern our visual perception and response selection.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention, a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic-algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is rigorous, with appropriate preprocessing, modeling, and converging analyses across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (e.g., FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (e.g., PHG, STG). Importantly, the findings provide much-needed neural support for the integration-segregation framework and clarify how IOR modulates conflict processing.

      Revisions and Evaluation:

      The authors have responded thoroughly and convincingly to the concerns raised in the previous round of review. In particular, issues related to the interpretation of dACC activity, the functional characterization of PHG and STG, and reporting clarity have been carefully addressed. The manuscript has been improved in terms of transparency, consistency of reporting, and overall readability.

      As a result, I no longer see any major weaknesses. The study is now clearly presented, methodologically sound, and theoretically informative. It makes a valuable contribution to the literature on attention and cognitive control.

      Comments on revisions:

      I appreciate the authors' efforts in addressing the previous comments. They have responded thoroughly to the concerns raised in the prior round of review. The work is well executed and makes a meaningful contribution to the field.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study provides the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention underlying inhibition of return, using an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to demonstrate that attentional orienting modulates semantic- and response-level conflict processing. Although the empirical evidence is compelling, clearer justification of the experimental logic, more cautious framing of behavioral and regional interpretations, and greater transparency in reporting and presentation are needed to strengthen the conclusions. The work will be of broad interest to researchers investigating visual attention, perception, cognitive control, and conflict processing.

      We appreciate the positive reception to our manuscript. In the revised manuscript, we have further clarified the logic underlying the task design, adopted a more cautious tone in interpreting the behavioral and neuroimaging results, and enhanced the transparency of reporting and presentation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding sheds new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Weaknesses:

      While this study addresses an important theoretical question and presents compelling neuroimaging findings, a few additional details would help improve clarity and interpretation. Specifically, more information could be provided regarding the experimental conditions (SI and RI), the justification for the criteria used for excluding behavioral trials, and how the null condition was incorporated into the analyses. In addition, given the non-significant interaction effect in the behavioral results, the claim that the behavioral data "clearly isolated" distinct semantic and response conflict effects should be phrased more cautiously.

      We thank the reviewer for these helpful comments. In the revised manuscript, we have provided additional clarification regarding the SI and RI conditions (page 29), expanded the justification for the behavioral trial exclusion criteria (page 32), and clarified how the null condition was modeled and incorporated into the analyses (page 29). In addition, we have revised the description of the behavioral results to adopt more cautious wording, particularly given the absence of a significant interaction effect. For detailed responses to these specific points, please refer to the "Recommendations for the Authors" section below.

      Reviewer #2 (Public review):

      Summary:

      This study provides evidence for the integration-segregation theory of an attentional effect, widely cited as inhibition of return (IOR), from a neuroimaging perspective, and explores neural interactions between IOR and cognitive conflict, showing that conflict processing is potentially modulated by attentional orienting.

      Strengths:

      The integration-segregation theory was examined in a sophisticated experimental task that also accounted for cognitive conflict processing, which is phenomenologically related to IOR but "non-spatial" by nature. This study was carefully designed and executed. The behavioral and neuroimaging data were carefully analyzed and largely well presented.

      Weaknesses:

      The rationale for the experimental design was not clearly explained in the manuscript; more specifically, why the current ER-fMRI study would disentangle integration and segregation processes was not explained. The introduction of "cognitive conflict" into the present study was not well reasoned for a non-expert reader to follow.

      We thank the reviewer for raising these important points. In the revised manuscript, we have further clarified the rationale of the experimental design and the motivation for introducing cognitive conflict.

      First, we clarified that previous neuroimaging studies relied primarily on SOA-based contrasts, which capture the temporal dynamics of attentional orienting but do not directly distinguish the functional processes of integration and segregation. We therefore established the direct comparison between cued and uncued targets in the long SOA as the critical test required by the theory, as these conditions are hypothesized to engage integration and segregation processes, respectively (pages 6-7, “The Challenge of Neural Verification”). Crucially, to successfully implement this comparison, we highlighted the specific methodological advantage of our study: the use of a Genetic Algorithm (GA) to optimize the stimulus sequence. We explained how this design maximizes statistical power specifically for contrast detection (i.e., cued vs. uncued) while maintaining high estimation efficiency, thereby directly overcoming the power constraints that had likely obscured these subtle neural signatures in prior ER-fMRI work (pages 7-8).

      Second, we clarified that the manipulation of cognitive conflict was introduced with the additional aim of examining IOR expression mechanisms, specifically investigating how spatial attention modulates ongoing cognitive processing after target onset, rather than the generation of IOR itself. We have now provided a clearer rationale for embedding a modified Stroop task within the cue-target paradigm, and explained how this design allows us to dissociate semantic and response conflicts while avoiding methodological confounds present in previous studies (page 8).

      The presentation of the results can be further improved, especially the neuroimaging results. For instance, Figure 4 is challenging to interpret. If "deactivation" (or a reduction in activation) is regarded as a neural signature of IOR, this should be clearly stated in the manuscript.

      We thank the reviewer for pointing out the interpretational challenges in Figure 4. To address this, we have revised Figure 4 and provided a clearer and more precise interpretation of these interaction effects in the manuscript.

      First, we have added explicit panel titles to Figure 4 (page 17). Panel A is now clearly labeled as the “Effect of IOR on Semantic Conflict”, while Panel B is labeled as the “Effect of IOR on Response Conflict”. We hope this visual labeling helps readers clearly identify the IOR modulation effects specific to each conflict type.

      Second, we have revised the figure caption to explicitly define the interaction contrasts used to quantify these modulations, providing specific formulas (e.g., [Uncued-RI – Uncued-SI] > [Cued-RI – Cued-SI] for response conflict) to ensure transparency.

      Finally, regarding the reviewer’s comment on “deactivation”, we realized that our original figure terminology (e.g., “IOR effect under...”) might have caused confusion by mixing the interaction effect with the IOR effect itself. We have clarified that Figure 4 specifically illustrates the “Effect of IOR on the Semantic Conflict and the Response Conflict” (i.e., interaction effect between IOR and cognitive conflict). To interpret this interaction, we further examined the simple effects of conflict under each cueing condition. Specifically, we analyzed the neural signatures of semantic conflict (SI minus NE) and response conflict (RI minus SI) separately for the cued and uncued targets. Importantly, regarding the nature of the IOR effect itself (as displayed in Figure 3, page 14), it is not simply a uniform deactivation. Instead, by directly comparing the cued and uncued conditions for the neutral words, we observed neural changes in two directions: some specific regions exhibited an increased activation (Cued > Uncued), while others showed a reduced activation (Uncued > Cued). These differential patterns involved distinct brain networks and corresponded to the distinct integration and segregation mechanisms, respectively, rather than a global loss of activation (pages 20-21).
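
      As an illustration only (not the authors' analysis code), the interaction contrast quoted above can be written as a weight vector over the six Cue Validity × Congruency conditions. The condition ordering, the function names, and the semantic-conflict weights (built by analogy from the "SI minus NE" definition, with the same direction as the response-conflict formula) are assumptions made for this sketch.

```python
# Illustrative sketch; condition order and helper names are assumptions.
CONDITIONS = ["Cued-NE", "Cued-SI", "Cued-RI", "Uncued-NE", "Uncued-SI", "Uncued-RI"]


def interaction_weights(high, low):
    """Weights for [Uncued-high - Uncued-low] - [Cued-high - Cued-low]."""
    w = dict.fromkeys(CONDITIONS, 0)
    w[f"Uncued-{high}"] += 1
    w[f"Uncued-{low}"] -= 1
    w[f"Cued-{high}"] -= 1
    w[f"Cued-{low}"] += 1
    return [w[c] for c in CONDITIONS]


def contrast_value(weights, betas):
    """Weighted sum of per-condition estimates (betas in CONDITIONS order)."""
    return sum(wi * b for wi, b in zip(weights, betas))


# Response conflict is RI minus SI (formula given in the response text).
response_conflict = interaction_weights("RI", "SI")
# Semantic conflict is SI minus NE; the contrast direction here is assumed.
semantic_conflict = interaction_weights("SI", "NE")
```

      A positive contrast value under these weights means the conflict effect is larger for uncued than for cued targets; the weights sum to zero, so a uniform shift across all conditions does not contribute.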

      Reviewer #3 (Public review):

      Summary:

      This study aims to provide the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention - a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is sound, with rigorous preprocessing, appropriate modeling, and analyses that converge across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (PHG, STG). The findings advance the field by supplying much-needed neural evidence for the integration-segregation framework and by clarifying how IOR modulates conflict processing.

      Weaknesses:

      Some interpretive aspects would benefit from clarification, particularly regarding the dual roles ascribed to dACC activation and the circumstances under which PHG and STG are treated as a single versus separate functional clusters. Reporting conventions are occasionally inconsistent (e.g., statistical formatting, abbreviation definitions), which may hinder readability. More detailed reporting of sample characteristics, exclusion criteria, and data-quality metrics, especially regarding the global-variance threshold, would improve transparency and reproducibility. Finally, some limitations of the study, including potential constraints on generalization, are not explicitly acknowledged and should be articulated to provide a more balanced interpretation.

      We thank the reviewer for the positive and constructive assessment of our study. In response to the concerns raised, we have carefully revised the manuscript and addressed all points in detail below. In brief, we have clarified key interpretation issues in the Discussion section, including the complementary roles of dACC activation and the distinction between statistical clustering and functional interpretation of PHG and STG activations (pages 20-21). We have also improved transparency and reporting throughout the manuscript by providing more detailed sample characteristics, clarifying exclusion criteria and global variance computation, adding illustrative supplementary figures, and standardizing statistical reporting and abbreviations (pages 28, 33). Finally, we have added a concise paragraph on limitations of the study to provide a more balanced interpretation of the findings (pages 26-27). Detailed, point-by-point responses to all specific comments are provided below (see the “Recommendations for the authors” Section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) The figure caption contains an unclear sentence (lines 195-196): "The target was a 450-ms colored Chinese character presented 600 ms after the fixation cue onset at the two target locations with equal probabilities." This description is ambiguous and should be revised for clarity.

      Thanks for pointing this out. In the revised manuscript, we have rephrased the figure caption to improve clarity as follows (pages 9-10):

      “Each trial started with a 150-ms non-informative cue presented at one of the two peripheral boxes. After a 150-ms interstimulus interval (ISI), a 150-ms fixation cue was presented at the central fixation box. Following a further 450-ms ISI, the target, a colored Chinese character, appeared at one of the two target locations with equal probabilities and remained on the screen for 450 ms. The trial ended with a variable intertrial interval (ITI) of 850, 1050, 1250, or 1450 ms (with equal probabilities).”
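
      The timing in the revised caption can be checked with a short calculation. The sketch below is illustrative: the durations are taken from the caption, while the event labels and the helper name are assumptions.

```python
# Durations (ms) come from the revised caption; event labels are illustrative.
EVENTS = [
    ("peripheral cue", 150),
    ("ISI 1", 150),
    ("fixation cue", 150),
    ("ISI 2", 450),
    ("target", 450),
]


def onsets(events):
    """Return the onset time (ms from trial start) of each event."""
    t, out = 0, {}
    for name, duration in events:
        out[name] = t
        t += duration
    return out


timeline = onsets(EVENTS)
# Interval from peripheral cue onset to target onset.
cue_target_soa = timeline["target"] - timeline["peripheral cue"]
# Interval from fixation cue onset to target onset.
fixation_to_target = timeline["target"] - timeline["fixation cue"]
```

      Under these durations the target appears 900 ms after the onset of the peripheral cue and 600 ms after the onset of the fixation cue, which matches the "600 ms after the fixation cue onset" wording of the original caption.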

      (2) Please provide a more detailed and clearer description of the SI and RI experimental conditions in the Methods section.

      Thanks for this helpful suggestion. We have revised the Methods section to provide a more detailed description of the SI and RI conditions. Specifically, we have further described the stimulus-response mapping and clarified how the SI and RI conditions are defined based on whether the ink color and the character meaning fell into the same or different response categories under this mapping. In addition, we have added a clarification in the Methods section to make it clearer that the SI trials involved semantic conflict without response conflict, whereas the RI trials involved both semantic and response conflicts (page 29).

      (3) As the data were collected across two research centers, please clarify the number of participants enrolled at each site.

      Thanks for this suggestion. We have now explicitly stated in the Apparatus and Data Acquisition section that 16 participants were enrolled at each site. The revised text reads (page 31):

      “The imaging data were acquired at two research sites following comparable protocols, with equal numbers of participants scanned at each site (n = 16 per site).”

      (4) In the behavioral data analysis, please provide the rationale or justification for the criteria used to exclude trials.

      Thanks for this comment. In the revised manuscript (page 32), we have clarified that reaction times (RTs) shorter than 150 ms were excluded as anticipatory responses, and RTs longer than 1,300 ms were excluded to limit the influence of unusually slow responses. These exclusion criteria are commonly adopted in RT research and were applied consistently across all conditions (Ratcliff, 1993; Whelan, 2008).
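
      A minimal sketch of these exclusion criteria follows. It is not the authors' script: the function names and the trial representation are hypothetical, and counting an incorrect trial as an error before checking its RT is an assumption.

```python
# Thresholds come from the response text; everything else is illustrative.
RT_MIN_MS = 150   # responses faster than this are treated as anticipatory
RT_MAX_MS = 1300  # responses slower than this are treated as unusually slow


def keep_trial(rt_ms, correct):
    """Return True if a trial survives the behavioral exclusion criteria."""
    return correct and RT_MIN_MS <= rt_ms <= RT_MAX_MS


def exclusion_summary(trials):
    """trials: list of (rt_ms, correct) tuples. Returns proportions by source."""
    n = len(trials)
    n_error = sum(1 for _, ok in trials if not ok)
    # Assumption: RT outliers are counted among correct trials only.
    n_outlier = sum(
        1 for rt, ok in trials if ok and not (RT_MIN_MS <= rt <= RT_MAX_MS)
    )
    return {
        "error": n_error / n,
        "rt_outlier": n_outlier / n,
        "kept": (n - n_error - n_outlier) / n,
    }
```

      Reporting the two proportions separately, as in `exclusion_summary`, keeps the contribution of response errors distinct from that of RT outliers.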

      (5) Given that the behavioral interaction effect was not statistically significant, the conclusion on lines 236-237, "These data clearly isolated the two distinct conflict effects in the Stroop effect, namely the semantic conflict (SI-NE difference) and the response conflict (RI-SI difference)" appears overstated and should be softened accordingly.

      We thank the reviewer for this important comment. We have clarified that our original statement was intended to highlight the successful isolation of conflict types based on the significant main effects of congruency (validating the task design), rather than implying a significant interaction effect. However, we agree that the original phrasing appeared unclear in this context. We have therefore revised the sentence to adopt a more cautious tone in the revised manuscript (page 12):

      “These data demonstrated typical Stroop interference effects (Veen & Carter, 2005) in both the semantic (SI-NE difference) and response conflicts (RI-SI difference).”

      (6) The statement on lines 281-282, "Although the IOR effect showed no effect on either the semantic conflict difference (SI-NE) or the response conflict difference (RI-SI) in the behavioral performance" lacks supporting statistical evidence. Please report the relevant test statistics.

      We appreciate the reviewer’s careful reading and note that the relevant statistical evidence was missing from the original manuscript. This has now been added in the revised version. Specifically, we examined the interactions between cue validity and semantic conflict (SI vs. NE) as well as between cue validity and response conflict (RI vs. SI). Neither interaction was significant (see revised Results for full statistics on page 12), supporting our original statement that cue validity did not modulate either conflict component in behavioral performance.

      (7) The manuscript mentions that a null condition (with no Chinese character presented) was included to increase statistical power for detecting differences across conditions. However, it is unclear how this null condition was actually used in the data analyses. Please clarify the role of the null condition in both the behavioral and neuroimaging analyses.

      Thanks for this comment. We regret that this was not sufficiently clear in the original manuscript. The null condition was included for neuroimaging purposes and was not used in the behavioral analyses, as no response was required in these trials. In the fMRI analyses, null trials served as the implicit baseline and were not modeled as regressors of interest. Task-related activities for all experimental conditions were therefore estimated relative to this null baseline, facilitating estimations of task-related responses in randomized event-related designs (Burock et al., 1998; Friston et al., 1999; Liu, 2004). We have clarified this point in the revised manuscript (page 29).

      References

      Burock, M. A., Buckner, R. L., Woldorff, M. G., Rosen, B. R., & Dale, A. M. (1998). Randomized event-related experimental designs allow for extremely rapid presentation rates using functional MRI. NeuroReport, 9(16), 3735-3739. https://doi.org/10.1097/00001756-199811160-00030

      Friston, K. J., Zarahn, E., Josephs, O., Henson, R. N. A., & Dale, A. M. (1999). Stochastic designs in event-related fMRI. NeuroImage, 10(5), 607-619. https://doi.org/10.1006/nimg.1999.0498

      Liu, T. T. (2004). Efficiency, power, and entropy in event-related fMRI with multiple trial types: Part II: design of experiments. NeuroImage, 21(1), 401-413. https://doi.org/10.1016/j.neuroimage.2003.09.031

      Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114(3), 510-532. https://doi.org/10.1037/0033-2909.114.3.510

      Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3), 475-482. https://doi.org/10.1007/BF03395630

      Reviewer #2 (Recommendations for the authors):

      (1) The paper is a bit too lengthy, with a lot of information that is hard for non-experts to grasp.

      We thank the reviewer for this comment. We realized that the Introduction was the most challenging section for general readers. In the revision, we refined the text in the Introduction for a better structure and more reader-friendly wording to improve readability. In addition, following the reviewer’s suggestion (Recommendation 4 below), we have added short subsection titles to the Introduction, Results, and Discussion sections to better organize the content and highlight the main ideas. We hope these revisions make the manuscript more accessible and easier for a broader audience to follow.

      (2) Please double-check the stats, as some of the results presented in the main text do not align well with the figures. Take Figure 2 as an example.

      We appreciate the reviewer’s concern and have double-checked all statistics. All the results are consistent between the figures and the main text. Taking Figure 2 as an example (page 12), the perceived discrepancy was probably caused by the fact that the descriptive values reported in the main text are marginal means for the main effects (i.e., the overall average of one factor, collapsed over the other factor), whereas Figure 2 shows the mean for each Congruency × Cue Validity condition (i.e., the individual cell means).
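
      The distinction between marginal means and cell means can be illustrated with a small example. The numbers below are made up for illustration and are not values from the study; the sketch also assumes a balanced design, so that a marginal mean equals the unweighted average of the relevant cell means.

```python
from statistics import mean

# Made-up cell means (ms) for illustration only; not data from the study.
cell_means = {
    ("NE", "cued"): 600.0, ("NE", "uncued"): 580.0,
    ("SI", "cued"): 640.0, ("SI", "uncued"): 620.0,
    ("RI", "cued"): 680.0, ("RI", "uncued"): 660.0,
}


def marginal_mean(factor_index, level):
    """Average the cell means over the other factor (balanced design assumed).

    factor_index 0 selects Congruency, 1 selects Cue Validity.
    """
    return mean(v for key, v in cell_means.items() if key[factor_index] == level)


si_marginal = marginal_mean(0, "SI")       # collapsed over cue validity
cued_marginal = marginal_mean(1, "cued")   # collapsed over congruency
```

      In this toy example the SI marginal mean (630 ms) matches neither SI cell (640 and 620 ms), which is the kind of apparent mismatch between text and figure described above.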

      (3) The reasoning that the neuroimaging findings support the dissociation between integration and segregation needs to be improved.

      We thank the reviewer for this important comment. In the revised Discussion (pages 19-21), we have strengthened the reasoning linking our neuroimaging findings to the dissociation between the integration and segregation processes. Specifically, we make it clear how the distinct activation patterns observed for the cued and uncued targets map onto the different functional demands proposed by the integration-segregation theory. The cued targets were theorized to recruit the frontoparietal attentional control networks, consistent with the re-engagement of an existing object file (integration). On the other hand, the uncued targets should engage the medial temporal and temporal association regions responsible for novelty detection and episodic encoding, consistent with the creation of a new object file (segregation). We hope the reviewer finds that the revision offers a clearer explanation of how the observed neural patterns are consistent with a dissociation between the integration and segregation processes.

      (4) Please use short section titles to organize the introduction, results, and discussion sections. For instance, the discussion section is a long chunk of text (almost 9 pages) and is pretty dense, making it hard to quickly grasp the ideas the authors want to convey.

      Thanks for this helpful suggestion. Following the reviewer’s recommendation, we have now added short subsection titles to the Introduction and Discussion sections to improve structure and readability. For the Results section, we have maintained and further refined the existing subheadings to ensure consistent organization.

      Reviewer #3 (Recommendations for the authors):

      I found this manuscript to be a timely and substantive contribution to the study of attention and cognitive neuroscience. To my knowledge, it provides the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention, a framework that has been influential in behavioral work for more than two decades but has lacked clear neural support. The study is conceptually well motivated, methodologically solid, and generally clearly reported. The findings differentiate neural substrates associated with integration and segregation processes and further show how inhibition of return (IOR) interacts with semantic and response conflicts at the neural level.

      The manuscript is well organized, the writing is mostly clear, and the progression from theory to hypotheses and methods is easy to follow. The combination of IOR with a modified Stroop paradigm is a clever choice that extends the theoretical scope of exogenous attention research. The use of an optimized event-related fMRI design based on a genetic algorithm is also a strength and reflects careful attention to design efficiency.

      The main results are internally consistent and theoretically meaningful. Integration-related activity in the fronto-parietal attention network (including FEF, IPS, TPJ, and dACC) and segregation-related activity in medial temporal areas (PHG and STG) fit well with the proposed framework, and the pattern of activations is coherent across analyses.

      Overall, I think this is a carefully executed study that offers much-needed neural evidence bearing on the integration-segregation theory of exogenous attention. I would recommend the following revisions.

      Suggestions:

      (1) In the Discussion (pp. ~17-18), dACC activation is described both in terms of general cognitive control demands and as reflecting a possible inhibitory bias toward the cued direction. It would help the reader if you could briefly indicate whether you see these as complementary (e.g., dual roles within the same region) or as more competing interpretations.

      We thank the reviewer for this helpful comment. We have clarified in the revised manuscript that the dACC’s roles in meeting general cognitive control demands and in biasing against the cued direction are complementary rather than competing interpretations. Specifically, we described how the dACC is involved in both the cognitive control required for target integration and the inhibitory bias toward the cued location, thereby highlighting its dual roles within the same region. The revised section reads as follows (page 20):

      “Furthermore, the observed increase in the left dACC activity under the cued relative to the uncued condition likely reflected the engagement of cognitive control mechanisms (Botvinick et al., 2004; Chung et al., 2024; Mayer et al., 2012; Veen & Carter, 2005), particularly in resolving the conflict between the task-driven requirement of target integration and the reduced accessibility of the cue-initiated representation. In this context, the heightened activation of dACC may also reflect its role in fulfilling the inhibitory bias toward the cued location (Mayer et al., 2004) and discouraging inefficient integration attempts at a location marked as less relevant.”

      (2) In the Discussion, you could consider adding a short paragraph explicitly acknowledging a few limitations and how they might constrain generalization of the findings. A concise reflection of this kind would give a more balanced picture without undermining the main conclusions.

      We appreciate this helpful suggestion. In the revised manuscript, we have added a concise paragraph explicitly addressing a key limitation of the present study (pages 26-27). Specifically, we acknowledge that the absence of behavioral interactions alongside clear neural effects requires cautious interpretation. We discussed how this dissociation may reflect differences in measurement sensitivity between behavioral and neural indices, consistent with prior findings (Chen et al., 2006; Wilkinson & Halligan, 2004). We also note that the use of a GA-optimized sequence, while improving statistical efficiency, may have introduced unintended regularities in event order that could influence behavioral strategies.

      (3) Since the dataset is hosted on GitHub, adding a short note in the Data Availability section about whether the repository will also include analysis scripts or future replication data would further enhance transparency and long-term usefulness.

      Thanks for this helpful suggestion. We have revised the Data Availability section (page 35) to clarify that the GitHub repository contains the processed data used in the final analyses. Analysis scripts and additional materials for replication are available from the authors upon reasonable request.

      (4) In the Results section, the formatting of statistics is not fully consistent. For example, some reports use spaces around symbols (e.g., "η<sup>2</sup> = 0.301") whereas others do not (e.g., "p< .001"). It would be good to standardize this (e.g., "p < .001", "η<sup>2</sup> = .30") across the manuscript.

      Done as suggested.

      (5) A few abbreviations appear before they are defined; for instance, SPC (superior parietal cortex) shows up in the Results (response conflict section) before the full name is given. Ensuring that each abbreviation is defined at first mention would help readers who may be less familiar with all of the regional acronyms.

      Thanks for this comment. We have conducted a thorough check of the manuscript and ensured that all abbreviations are defined upon their first occurrence.

      (6) The text sometimes refers to "PHG/STG" as a combined cluster, while at other points, PHG and STG are described separately. It would be useful to clarify under what circumstances they are treated as a single functional cluster versus distinct regions of interest, and to keep the nomenclature as consistent as possible between the main text and the tables.

      Thanks for raising this point. In the revised manuscript, we have clarified this issue by distinguishing between statistical clustering and functional interpretation. In the whole brain analysis, activations in the left hemisphere formed a single continuous cluster spanning the PHG and STG; therefore, this cluster is labeled as “PHG/STG” in Table 1. We have explicitly noted the continuous nature of this cluster in the Results section (page 15) to ensure clarity:

      “Notably, in the left hemisphere, these activations formed a continuous cluster spanning both regions (labeled as PHG/STG in Table 1).”

      (7) It would be helpful to provide a bit more detail about the sample characteristics (e.g., age range, handedness, and inclusion/exclusion criteria) and to state explicitly how many participants, if any, were excluded from the analyses and for what reasons. This would help readers better evaluate data quality and generalizability.

      Thanks for this helpful suggestion. We have revised the Participants section (page 28) to provide the full details regarding our sample:

      “32 healthy participants with normal or corrected-to-normal vision and normal color vision were recruited. All participants were right-handed and reported no history of neurological or psychiatric disorders. Data from three participants were excluded due to excessive head movements and high global variances (see fMRI Data Analysis), leaving 29 participants for analysis (18 female, 11 male; aged 18-30 years, M = 22.69, SD = 2.58).”

      Furthermore, we have provided a clearer description of the exclusion criteria in the Data Analysis section (pages 33-34) as follows:

      “Runs with motions exceeding one voxel length in any direction were excluded (resulting in the exclusion of two runs) … Runs with global variance equal to or over 0.1% were excluded, resulting in the exclusion of eight runs (see Supplementary Information for details). Ultimately, three participants were excluded because neither run met the quality criteria. All remaining participants retained both runs, except for three individuals who each contributed only one valid run.”

      (8) Given that participants were excluded based on global variance exceeding 0.1%, it would be very informative to include, in the Supplementary Materials, an illustrative figure showing the signal time series (or global signal variance over time) for excluded participants.

      We appreciate this valuable suggestion. In the revised Supplementary Materials, we have included a new figure (Figure S2) that plots the global signal time series for the excluded runs to illustrate the signal patterns that led to their exclusion based on global variance.

      (9) Relatedly, it may help to more explicitly describe how global variance was computed (e.g., over which time window, after which preprocessing steps, and whether it was calculated on whole-brain signal or within specific masks). A concise clarification would make the exclusion criterion easier to interpret.

      Thanks for this helpful suggestion. We have now clarified in the manuscript how global variance was computed (page 33) and have also provided a more detailed description of the computation procedure in the Supplementary Materials (page 4). Specifically, after the standard preprocessing (slice timing correction, 3D motion correction, spatial smoothing, linear trend removal, and high-pass temporal filtering), the global signal was computed for each run as the mean signal across voxels with intensity values greater than 100 in each volume. Global variance was then quantified as the temporal variance of this run-wise global-signal time course across all volumes, providing a quality-control index of signal stability.
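
      The computation described above can be sketched as follows. This is an illustration rather than the authors' pipeline: volumes are represented as flat lists of voxel intensities, the function names are hypothetical, and the use of population variance is an assumption. The scaling needed to compare the result against the 0.1% criterion is not specified in this response, so the sketch stops at the raw variance.

```python
from statistics import mean, pvariance

# Intensity floor from the response text: only voxels above 100 are averaged.
INTENSITY_FLOOR = 100


def global_signal(volume):
    """Mean intensity over voxels brighter than INTENSITY_FLOOR in one volume."""
    bright = [v for v in volume if v > INTENSITY_FLOOR]
    return mean(bright)


def global_variance(run):
    """Temporal variance of the global-signal time course across a run.

    run: list of volumes, each a flat list of preprocessed voxel intensities.
    Population variance is an assumption; the text does not say which
    estimator was used.
    """
    return pvariance([global_signal(volume) for volume in run])
```

      For example, a toy run of three volumes with global signals of 300, 310, and 290 has a global variance of about 66.7 in raw intensity units.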

      (10) Rather than only reporting a single overall exclusion rate (e.g., 5.52% of total trials), it would be informative to break this down by source, reporting separately the proportion of trials excluded as RT outliers and the proportion excluded due to response errors. This would further improve transparency regarding the behavioral preprocessing pipeline.

      Thanks for this helpful suggestion. We have now broken down the overall exclusion rate by source in the revised manuscript. Specifically, we reported that 4.29% of trials were excluded due to incorrect responses, and 1.24% of trials were excluded as RT outliers (page 32).

      References

      Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: an update. Trends in Cognitive Sciences, 8(12), 539-546. https://doi.org/10.1016/j.tics.2004.10.003

      Chen, Q., Wei, P., & Zhou, X. (2006). Distinct neural correlates for resolving Stroop conflict at inhibited and noninhibited locations in inhibition of return. Journal of Cognitive Neuroscience, 18(11), 1937-1946. https://doi.org/10.1162/jocn.2006.18.11.1937

      Chung, R. S., Cavaleri, J., Sundaram, S., Gilbert, Z. D., Del Campo-Vera, R. M., Leonor, A., Tang, A. M., Chen, K.-H., Sebastian, R., Shao, A., Kammen, A., Tabarsi, E., Gogia, A. S., Mason, X., Heck, C., Liu, C. Y., Kellis, S. S., & Lee, B. (2024). Understanding the human conflict processing network: A review of the literature on direct neural recordings during performance of a modified stroop task. Neuroscience Research, 206, 1-19. https://doi.org/10.1016/j.neures.2024.03.006

      Mayer, A. R., Seidenberg, M., Dorflinger, J. M., & Rao, S. M. (2004). An event-related fMRI study of exogenous orienting: supporting evidence for the cortical basis of inhibition of return? Journal of Cognitive Neuroscience, 16(7), 1262-1271. https://doi.org/10.1162/0898929041920531

      Mayer, A. R., Teshiba, T. M., Franco, A. R., Ling, J., Shane, M. S., Stephen, J. M., & Jung, R. E. (2012). Modeling conflict and error in the medial frontal cortex. Human Brain Mapping, 33(12), 2843-2855. https://doi.org/10.1002/hbm.21405

      Veen, V. V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497-504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Wilkinson, D., & Halligan, P. (2004). The relevance of behavioural measures for functional imaging studies of cognition. Nature Reviews Neuroscience, 5(1), 67-73. https://doi.org/10.1038/nrn1302

    1. eLife Assessment

      This important study offers insights into the anatomical and physiological features of cold-selective lamina I spinal projection neurons. The evidence supporting the authors' claims is convincing, although including a larger sample size and more quantification would have strengthened the study, and the claims of monosynaptic connectivity would benefit from further experimental evidence. The work will interest those in the field of somatosensory biology, especially researchers studying spinal cord dorsal horn circuits and projection neuron cell types.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers.]

      Summary:

      Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.

      Strengths:

      (1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.

      (2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.

      (3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.

      Weaknesses:

      (1) The sample size for the ex vivo electrophysiology conducted on the calb1+ lamina I projection neurons (Figure 5) is limited to a total of six recorded neurons. Given the difficulty and complexity of the preparation, this is understandable. Notably, since approximately 87% of lamina I projection neurons heavily innervated by Trpm8+ terminals are calb1+, these six recordings of such neurons in Figure 4E could also be calb1+.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium binding protein calbindin 1.

      In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.

      Strengths:

      The use of a relatively new (seldom used previously) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds specificity to the characterization. The transgenic line appears to capture well the subpopulation of Trpm8-expressing neurons.

      In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.

      The writing is clear and the presentation of findings follows a logical flow.

      Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.

      Weaknesses:

      In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors acknowledge that, technically, this is a very difficult preparation with very low yield as far as obtaining successful recordings. Moreover, the tissue needs to be maintained at room temperature which is obviously not ideal when characterizing cold thermoreceptors due to the unavoidable effects of low temperature on cold-activated receptors.

    4. Reviewer #3 (Public review):

      Summary:

      Razlan and colleagues provide a detailed anatomical characterization of lamina I projection neurons in the mouse spinal cord that are densely innervated by primary afferents activated by cooling of the skin. The authors validate a Trpm8-Flp mouse line, show synaptic contacts between Trpm8⁺ boutons and projection neurons at the ultrastructural level, and demonstrate at the physiological level that these neurons specifically respond to cooling stimuli. Next, by taking advantage of previous transcriptomic analysis of ALS neurons, the authors identify calbindin as a marker for cold-activated lamina I projection neurons and map their ascending projections to the rostral lateral parabrachial area, caudal periaqueductal gray, and ventral posterolateral thalamus, well-known thermosensory and thermoregulatory centers. Altogether, these findings provide strong anatomical and functional evidence for a direct line of transmission from Trpm8⁺ sensory afferents through Calb1⁺ lamina I neurons to key supraspinal centers controlling perception of cold and thermoregulatory responses.

      Strengths:

      The combination of mouse genetics, electron microscopy, ex-vivo physiology, optogenetics and viral tracing provides convincing evidence for a direct cold pathway. The work validates the Trpm8-Flp line by extensive anatomical and molecular characterization. Integration with previous transcriptomic and anatomical data neatly links the cold-selective lamina I neurons to a molecularly defined cluster of ALS neurons, strengthening the bridge between molecular identity, anatomy, and physiological function.

      Weaknesses:

      The main limitation remains the relatively small number of neurons that could be recorded electrophysiologically. While understandable given the complexity of the preparation, this necessarily limits generalization.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public reviews:

      Reviewer #1 (Public review):

      The sample size for the ex vivo electrophysiology conducted on the calb1+ lamina I projection neurons (Figure 5) is limited to a total of six recorded neurons. Given the difficulty and complexity of the preparation, this is understandable. Notably, since approximately 87% of lamina I projection neurons heavily innervated by Trpm8+ terminals are calb1+, these six recordings of such neurons in Figure 4E could also be calb1+.

      As noted in our initial resubmission, we fully accept that the sample size is limited. We have already toned down statements related to this, to say that our findings “strongly suggest” that the cells with dense Trpm8 input are cold-selective (both in the Abstract and Results).

      Reviewer #2 (Public review):

      In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors acknowledge that, technically, this is a very difficult preparation with very low yield as far as obtaining successful recordings. Moreover, the tissue needs to be maintained at room temperature which is obviously not ideal when characterizing cold thermoreceptors due to the unavoidable effects of low temperature on cold-activated receptors.

      Please see our response to Reviewer #1 (Public review).

      Reviewer #3 (Public review):

      The main limitation remains the relatively small number of neurons that could be recorded electrophysiologically. While understandable given the complexity of the preparation, this necessarily limits generalization.

      Again, please see our response to Reviewer #1 (Public review).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Line 609. The authors used the Trpm8Flp;RCE:FRT;Ai9 mice in some electrophysiological experiments. What is the function of the Ai9 allele (a Cre-dependent reporter) in this cross? Should there not be a Cre line as well?

      One of the mice used for electrophysiological experiments was Trpm8Flp;RCE:FRT;Ai9, and this animal received an injection of AAV encoding Cre into the caudal ventrolateral medulla, resulting in tdTomato expression in spinal projection neurons. This part of the Methods was inadvertently omitted from the resubmitted version (see next point). This has been corrected, and in addition, this information is shown in the cartoon in Fig 4A and is explained in the figure legend.

      (2) Line 860. Phrase is incomplete

      We apologise for this – 3 lines from the original version had been deleted inadvertently. This has now been corrected.

      (3) Line 103 "These results are therefore consistent with the transcriptomic findings described above (36,37)."

      I would revise the references used to support this claim. Reference 37 is a transcriptomic atlas of the brain. I could not find TRPM8 expression data in DRG in this reference.

      Figure S4 of reference 37 deals with the mouse peripheral nervous system and describes Trpm8 classes of primary afferent. More detail on these cells (including expression of VGLUT3, Tac1, Calca and Trpv1) can be found in the associated website: mousebrain.org/adolescent/genesearch.html. We have therefore left this reference as it is.

      (4) Line 242. "neurons with dense Trpm8 input had significantly lower sEPSC frequencies compared to those that lacked dense Trpm8 input".

      This is an interesting paradox because cold thermoreceptors (i.e. the presumed direct monosynaptic input to these projection neurons) are known to be spontaneously active at physiological skin temperatures. This is well characterized in trigeminal corneal endings (DOI: 10.1038/nm.2264). In fact, the decrease in this spontaneous activity can be used by mice to faithfully detect warm stimuli (DOI: 10.1016/j.neuron.2020.02.035). This reviewer would like to remark that this low spontaneous frequency may be due to the non-physiological temperature of this preparation, leading to partial adaptation/desensitization of the afferents. Perhaps it also influences the amplitude (e.g. release probability) of EPSPs (I do not expect you to do anything about my remark).

      These are interesting points, but we do not feel that we can add anything here.

      (5) Figure 3A. It would be useful to include orientation references (dorso-ventral, mediolateral) in the images. Same comment applies to Figure 5C.

      Since these are horizontal sections, the axes are medio-lateral and rostro-caudal. Corresponding orientation markers have been added to both figures.

      (6) Figure 3F. If I understood correctly, the light pulse used for optogenetic activation is delivered directly through the objective used for recording the cell. Thus, the distance between pre and postsynaptic neuron should be minimal. That being the case, I do not understand how a monosynaptic input can have a delay of 5 or 7 ms. Am I missing something?

      The relatively long duration of latency is likely to reflect a slow rise time of depolarisation in the Trpm8 terminals, so that although channels will open very rapidly, there is a delay until the boutons reach action potential threshold. Hachisuka et al (2016) recorded from NtsCre;Ai32 mice (i.e. coding for channelrhodopsin) and found typical latencies of >5 ms (Fig 5E in that paper). We believe that this delay is exacerbated by the low levels of expression of ChR2 that we were able to achieve with the neonatal i.p. injection approach. We have provided a brief explanation for this, and cited the reference in the Results section (lines 197-198).

      (7) Figures 4E/H. To be meaningful, the pie charts should include the n (total number of neurons). See, for example figure 5J.

      Numbers have been added to the pie charts.

    1. eLife Assessment

      This manuscript presents a valuable analysis of how locomotion modulates the activity of different subtypes of cortical neurons in the mouse primary visual cortex, showing that locomotion more strongly increases responses in sensitizing than in depressing excitatory cells. This data is then used to constrain a model of the responses. While the data are very interesting, the analyses remain incomplete, in particular due to concerns surrounding the modelling.

    2. Reviewer #1 (Public review):

      In this manuscript, Hinojosa and colleagues analysed the changes in V1 visual responses induced by locomotion in head-fixed mice using two-photon calcium imaging. The authors observe that locomotion strongly increases the visual responses of V1 excitatory neurons that exhibit sensitizing responses to visual stimuli. Also, there is an increased response in VIP interneurons, and to a lesser extent, PV interneurons and SST interneurons (non-significant). The authors used a model fitted with data presented in the manuscript, as well as previous knowledge on cortical connectivity among different neuron types. The model suggests that the major component of the increased responses during locomotion is an increase in excitatory drive from external inputs (feedforward, feedback and modulatory), most importantly onto VIP interneurons and excitatory neurons. However, the excitatory drive of local excitatory neurons onto other surrounding excitatory and inhibitory cells is reduced.

      The manuscript is well presented and represents a valuable analysis of how locomotion modulates the activity of different subtypes of cortical neurons. However, major issues should be addressed to strengthen the results.

      Major issues:

      (1) Speed and mismatch between locomotion and visual stimulation.

      The authors do not clearly describe the definition of locomotion versus the resting state. The speed should, by itself, have an impact on neuronal responses, especially at the onset of locomotion. Several published studies show that the mismatch between a visual stimulus and the speed of the animal induces specific responses in V1, both in excitatory and subtypes of inhibitory neurons. The authors should address these points upfront in the manuscript, since it is likely a major variable explaining their results.

      (2) Use of deconvolution with MLSpike.

      Some results (Figure 2) exclusively depend on the deconvolution of calcium signals into spikes (since the initial peak is not seen in calcium transients). The authors should validate this result either with electrophysiological recordings or with the use of another deconvolution method (e.g. CASCADE), emphasising the limitations of this approach and the limitations of the time resolution of calcium imaging.

      (3) The manuscript is centred around a specific increase in visual responses in sensitizing neurons during locomotion, both in the fraction of responsive neurons and response magnitudes.

      It is hard to tell whether this difference is due to a greater scaling effect of locomotion, a difference in responses during the resting state, or both. The manuscript should further explore and discuss the differences in responses between sensitizing and depressing neurons, both during the resting state and locomotion. Adding metrics and direct comparisons of the magnitudes of fast responses, slow responses, and time integrals between sensitizing and depressing neurons in resting and locomotion states would help to clarify this. Same for fractions of responsive neurons of each type in each condition. E.g., the slow phase is harder to judge from the plots, but the DeltaF/F integral shown in Figure 1G seems to suggest the difference in response magnitude between sensitizing and depressing neurons is largest in locomotion state, rather than resting state. How do these integrals look for inferred firing rates shown in Figure 2?

      (4) There is something counterintuitive about how the changes in inhibition onto sensitizing and depressing neurons during locomotion explain the reported activity changes.

      Sensitizers receive reduced SST input and increased PV input during locomotion. If SSTs depress and PVs sensitize (and this is the main reason why sensitizers, which receive dominant input from SSTs sensitize, and vice-versa), how is it possible that this switch does not alter the sensitizing or depressing nature of these neurons' responses in locomotion? Are these changes insufficient to flip the dominant SST-PV drive? Figure 6D-E seems to show there is a flip, at least for sensitizers. How do authors explain this? Do authors think this is related to the narrowing of the adaptive index distribution shown in Figure 1C?

      (5) Presentation of the experimental data and the model.

      The manuscript introduces the results of interneuron recordings during the description of the model. Similarly, the results of optogenetic manipulations are presented inside the model's description. It would be clearer to present all experimental data first and introduce the model later, fitting it to all experimental evidence previously presented.

    3. Reviewer #2 (Public review):

      This is an interesting paper with important results. The authors, working in V1, have previously, in a 2022 paper, defined sensitizing and depressing excitatory (E) cells as those whose response increases or decreases, respectively, across the 10 seconds of showing a drifting grating stimulus. They showed that sensitizing E cells are dominantly inhibited by SST inhibitory cells, which are dominantly depressing, and that depressing E cells are dominantly inhibited by PV inhibitory cells, which are very largely sensitizing. It's been well established that locomotion greatly increases E-cell firing rates in V1 compared to rest, but much remains to be worked out as to the mechanism. Here, they find that locomotion increases the responses of the sensitizing E cells much more than depressing cells. They develop a model of changes in synaptic weights between rest and locomotion to account for the changes. One reason that sensitizers are increased more by locomotion than depressors is that PV cells, which more strongly inhibit depressors, have increased firing for locomotion, whereas SST cells, which more strongly inhibit sensitizers, don't change their firing rates with locomotion. However, in the model, a complex array of postulated changes in connection strengths is also involved.

      I have, though, a number of concerns: with the model, with the lack of proper discussion of connection to some previous works, and with an overall unclear and confusing presentation and certain controls that should be done.

      In the model, they postulate that synapses within the 6-cell-type network - sensitizing, intermediate, and depressing E cells, and PV, SST, and VIP I cells - and from three sources of external input to each of the six types all change between rest and locomotion (except that connections between the E cells don't depend on their types). There are a lot of degrees of freedom, and this makes interpretation of the results difficult. I would have liked to have seen more efforts to constrain the degrees of freedom. For example, there seems to be very little difference between the three E cell types in any of the three types of external input received. Why not constrain them all to get the same external input and see if it significantly affects model fit? Or what if synapses from the three types of external input are left unchanged, and only change their strengths between rest and locomotion? How well could this do? During optimization, why not constrain the changes between rest and locomotion, for example, by putting an L1 penalty on the changes or the relative changes, trying to force them to be sparse, and see whether there are roughly equally good fits? And then, if the main changes are in a small set of synapses, can the authors isolate changes to that small set and do roughly equally well? What about looking at the principal components of the weight changes across models, to isolate patterns of change that are most important?
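      The L1-penalty suggestion above can be illustrated with a toy sketch. It is not the authors' model or fitting code: the function names, the weight vectors, and the penalty strength `lam` are hypothetical. The sketch shows the penalized objective (fit error plus `lam` times the summed absolute rest-to-locomotion weight changes) and the soft-thresholding step, which is the standard proximal operator of an L1 penalty and sets small changes exactly to zero.

```python
def penalized_loss(fit_error, rest_weights, loco_weights, lam):
    """Fit error plus an L1 penalty on the rest-to-locomotion weight changes."""
    l1 = sum(abs(l - r) for r, l in zip(rest_weights, loco_weights))
    return fit_error + lam * l1


def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparsify_changes(rest_weights, loco_weights, lam):
    """Return locomotion weights after soft-thresholding each change."""
    return [r + soft_threshold(l - r, lam)
            for r, l in zip(rest_weights, loco_weights)]


# Hypothetical weights: one unchanged, one small change, one large change.
rest = [1.0, 2.0, 3.0]
loco = [1.0, 2.5, 2.0]
print(penalized_loss(1.0, rest, loco, lam=0.1))
print(sparsify_changes(rest, loco, lam=0.6))
```

      In this toy case the small change is removed entirely while the large one survives in shrunken form, which is the sparsity pattern the reviewer proposes to look for in the fitted weight changes.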

      In terms of comparing to previous works, when optogenetic manipulations of SST and PV are done to test various hypotheses, I would like to see some discussion of what is already known from the authors' 2022 paper and what they are adding or testing that wasn't known or tested from that paper. And Dipoppa et al (2018) also found weight changes to account for the difference between rest and locomotion. They were looking at a fixed point of responses of neurons across retinotopic space to stimuli of various sizes with only one E-cell type, whereas they are accounting for trajectories across time considering 3 E-cell subtypes but without variation in stimuli or retinotopic position of neurons, so the efforts are somewhat different, but still, it would be good to see a bit more discussion of what is in agreement or in contradiction in the conclusions.

      In terms of presentation and controls, I have many concerns, which include:

      (1) The main result is that sensitizers increase their responses with locomotion ~2X (for dF/F) or about 3.5X (for spikes) more than depressors. But there are other differences between sensitizers and depressors, for example sensitizers have smaller initial stimulus responses at rest, and depressors have larger. What if cells were divided into tertiles by initial stimulus response at rest? Would the authors see the same differences in the effects of locomotion? If so, can they establish whether the difference is really attached to the adaptation properties rather than to, for example, the initial responses, for example, by comparing the regression of response increase against AI vs the regression of response increase against initial resting response? And there might be other controls to be done for other features in which sensitizers and depressors differ.

      (2) Lines 103 and following: the authors refer to a "second notable change" which is the narrower distribution of adaptive effects, but I think this is trivial. The adaptive index is AI=(R1-R2)/(R1+R2), where R1 is response 0.5-2.5s after stimulus onset and R2 over 8-10s. But if the change is additive, as suggested by the dF/F figures (and I believe the distributions of AI here are based on dF/F measurements) -- adding the same constant to R1 and R2 will shrink |AI| without changing the sign of AI. So this would seem to just be a signature of a change that is primarily additive rather than multiplicative.

      Also, if the authors do decide that they are going to focus on spikes after showing the raw dF/F, then this analysis should be repeated for spikes.

      (3) Figure 2, F is supposed to be D minus E, but it doesn't look like it. For example, the initial response under locomotion is very similar in sensitizers and depressors, so the initial difference in F should be small, but it's not; and at rest, depressors initially have larger responses than sensitizers, whereas later depressors have smaller responses than sensitizers, yet the difference at rest is positive at all times. Something seems wrong here.
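      The additive-shift argument in point (2) above can be checked numerically. The sketch below uses the AI definition quoted by the reviewer, AI=(R1-R2)/(R1+R2), with hypothetical response values and assuming positive responses and a positive additive constant c. Adding c to both responses leaves the numerator unchanged and enlarges the denominator to R1+R2+2c, so |AI| shrinks while its sign is preserved.

```python
def adaptive_index(r1, r2):
    """AI = (R1 - R2) / (R1 + R2), as defined in the review."""
    return (r1 - r2) / (r1 + r2)


def index_after_additive_shift(r1, r2, c):
    """AI after adding the same constant c to both response windows."""
    return adaptive_index(r1 + c, r2 + c)


# Hypothetical responses: a depressor (R1 > R2) and a sensitizer (R1 < R2).
for r1, r2 in [(3.0, 1.0), (1.0, 3.0)]:
    before = adaptive_index(r1, r2)
    after = index_after_additive_shift(r1, r2, c=1.0)
    print(before, after)
```

      Under these assumptions, a narrower AI distribution during locomotion follows from a purely additive change in response and needs no change in adaptation itself.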

    4. Reviewer #3 (Public review):

      This study aimed to understand the depressing and sensitizing effects of adaptation in mouse visual cortex during different behavioral states: locomotion and stationary. There is an impressive characterisation of the responses in different cortical cell types and with different optogenetic manipulations to the inhibitory populations. These form a very interesting dataset to understand the effects of the state on the circuits and gain insight into the mechanisms. This data is then used to constrain a model of the responses. Unfortunately, the model appears to be too flexible, and it was difficult to interpret the insights gained from the different model fits.

      Strengths:

      The data is impressive. There is a characterisation of responses of PCs and VIP, SST and PV interneurons. Additionally, there is the characterisation of some responses to specific optogenetic manipulations, VIP inactivation, SST or PV activation or inactivation. These data will help develop a good insight into the system. The principle of using the optogenetic manipulations to constrain model parameters is very interesting.

      Weaknesses:

      Many of the analyses have some concerns in the methodology used, which we list in detail below. Further, the model used to gain insight into the mechanism appears overly complicated and seems hard to gain clear insights from.

      Major concerns:

      (1) Key concern is the usage of dF/F signals for all analyses, especially when comparing responses.

      1a) Figure 1G: Comparison of sensitisers and depressors. It is important to consider what the baseline rates are when making these comparisons, especially when comparing the degree of effects between different cell types. For example, if baseline rates for sensitizers were overall higher, it would mean the difference in gain of response would be lower, and could affect the results in the opposite direction of what is claimed. One option to account for this would be to z-score the overall responses, using the same normalization for locomotion and rest. We also suggest plotting differences in sensitisers, intermediates, and depressors as a function of firing rate, matching for firing rate across each PC category and calculating delta AI for each matched firing-rate bin.

      1b) Figure 2A-F: The above is an even more significant issue when it comes to estimating spiking rates. The methods do not state how dF/F is calculated. If dF/F is computed using the pre-stimulus period as the reference, the spike-inference algorithm used might not be appropriate. Using pre-stimulus referencing could result in the estimate going into the wrong range in the calculation of the spike rate.

      1c) In both cases above, it could be a problem if baseline firing rates are different between cell types, or states (locomotion/stationary). The latter is established to have effects on many cell types measured, and so needs to be accounted for very carefully.

      1d) It would be informative to see per-neuron comparison for adaptive indices during rest and locomotion states. This could be visualized using a scatter plot with AI-rest vs. AI-locomotion for Figures 1D- 1F and 2J- 2L.

      1e) Are neurons that are more strongly modulated between locomotion and rest also more likely to experience a shift in AI (i.e. delta AI)? Is there a correlation between the change in firing rate between behavioral states and delta AI (Loco-Rest)? If so, is this present for all neuron subtypes (e.g. VIP, SST, and PV)?

      1f) Optogenetic inhibition of VIP neurons on average abolished the slow depressive effects of adaptation in SST (Figure 3). The strength and prevalence of this effect are unclear. Perhaps one can bootstrap the control and opto AI values and calculate whether AI was significantly reduced following optogenetic inhibition, and if so, on average, how likely was this to occur for the recorded SST neurons? This is important in knowing that the average effects (Figure 3D) aren't driven by a portion of SST neurons, especially as this is later used to confirm the region of parameter space and affects the subsequent results in Figure 4.

      (2) Statistics for the effects. There is a mention of linear mixed models, but no information is given on the actual models being used and tested. This is particularly the case for Figure 1G, where there is a comparison of effect sizes between different populations. What precise significance test is being used? Are the stats on paired cells when considering locomotion and rest?

      (3) Model parameters: It is acknowledged that there is a large range of parameters that can model the responses effectively, up to 11% of initial conditions. At 9000 initial conditions, this is around 1000. The parameter estimates are then considered as the mean of each parameter. This seems like a strange choice for a few different reasons:

      3a) A mean solution might not be one of the solutions. Let's say the parameters range over a large dimensional space. They could occupy non-overlapping / discontinuous subspaces. In that case, the mean parameters do not necessarily fall within the solution subspaces. Therefore, this reduction to means might not be valid.

      3b) Compare distributions rather than means. There are multiple distributions of parameters between conditions. All stats should be on the comparison of distributions rather than just the means.

      (4) Visualizing weight matrices: It is very challenging to interpret the weight matrices. Furthermore, it appears that the stationary and locomotion conditions were fit independently, and given the large parameter spaces, it is even harder to interpret. Can the fitting instead be done by fitting on one state and using those parameters as the initial conditions for the other state? Figure 7 shows an intuitive cartoon, but it is not clear how the matrices in Figures 5 and 6 lead to the summary shown in Figure 7. It is also not clear why the connections between inhibitory neurons are not shown in Figure 7. One option is to perhaps run some kind of dimensionality reduction on the parameter space to better interpret the data. When showing deltaWeights, was the model initialised with 'Rest' weights and allowed to change? It is not obvious what the difference is between 'relative change in connection weights' and 'relative change in synaptic weights'. This needs to be clarified.

      4a) Model parameters were reduced differently for locomotion and rest (Figure 4). We suggest evaluating the results for locomotion and rest using the same chi-square value of 3 for both behavioral states (at least in controls).

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Speed and mismatch between locomotion and visual stimulation.

      The authors do not clearly describe the definition of locomotion versus the resting state. The speed should, by itself, have an impact on neuronal responses, especially at the onset of locomotion. Several published studies show that the mismatch between a visual stimulus and the speed of the animal induces specific responses in V1, both in excitatory and subtypes of inhibitory neurons. The authors should address these points upfront in the manuscript, since it is likely a major variable explaining their results.

      We will clarify in the methods that a trial was considered as locomotion when an animal ran at a minimum of 3 cm/s for at least 80% of the 10 s stimulus presentation, and was considered rest when running under 3 cm/s during the same fraction of time. Trials with abrupt changes from locomotion to rest were rare and excluded following these criteria.
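      The trial criterion described above can be sketched as follows. This is a minimal illustration, assuming running speed is sampled at regular intervals during the 10 s stimulus; the function name `classify_trial` and the "excluded" label for trials meeting neither criterion are hypothetical and not taken from the manuscript.

```python
def classify_trial(speeds_cm_s, threshold=3.0, min_fraction=0.8):
    """Label one trial from running-speed samples taken during the stimulus.

    Returns "locomotion" if speed >= threshold in at least min_fraction of
    the samples, "rest" if speed < threshold in at least min_fraction of the
    samples, and "excluded" otherwise (e.g. a transition within the trial).
    """
    n = len(speeds_cm_s)
    if n == 0:
        raise ValueError("no speed samples")
    n_running = sum(1 for s in speeds_cm_s if s >= threshold)
    n_still = n - n_running
    if n_running / n >= min_fraction:
        return "locomotion"
    if n_still / n >= min_fraction:
        return "rest"
    return "excluded"


# Hypothetical trials with ten speed samples each (cm/s).
print(classify_trial([5.0] * 9 + [0.0]))      # mostly running
print(classify_trial([0.5] * 10))             # still throughout
print(classify_trial([5.0] * 5 + [0.0] * 5))  # mixed, meets neither criterion
```

      Under these assumptions, a trial that switches state halfway through satisfies neither criterion and is dropped, which matches the exclusion of abrupt transitions described in the response.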

      Locomotion speed and visuomotor mismatch can influence neuronal responses in V1, but in the large majority of our trials mice either ran continuously at a stable speed or remained still, i.e. locomotion onsets or offsets did not occur (see Hinojosa et al. 2026 for example running traces). Furthermore, sensitizing and depressing neurons were typically recorded simultaneously within the same field of view, experiencing identical locomotor behaviour. For these reasons, we think it is unlikely that differences in speed or mismatch alone can account for the different increase in amplitude observed between depressors and sensitizers.

      To directly address this point and further explore the role of speed on V1 neurons, we will quantify the relationship between running speed and amplitude increase in both PCs and interneurons, and include these analyses in the revised version of the manuscript.

      (2) Use of deconvolution with MLSpike.

      Some results (Figure 2) depend exclusively on the deconvolution of calcium signals into spikes (since the initial peak is not seen in calcium transients). The authors should validate this result either with electrophysiological recordings or with the use of another deconvolution method (e.g., CASCADE), emphasising the limitations of this approach and the limitations of the time resolution of calcium imaging.

      A similar initial increase in amplitude followed by fast depression has been observed previously with electrophysiological recordings in V1 (Chance et al., 1998; Jin & Glickfeld, 2020; Varela et al., 1997). We will further validate our results using an alternative spike inference method like CASCADE (Rupprecht et al., 2021), as well as expanding on the limitations of our approach.

      (3) The manuscript is centred around a specific increase in visual responses in sensitizing neurons during locomotion, both in the fraction of responsive neurons and response magnitudes.

      It is hard to tell whether this difference is due to a greater scaling effect of locomotion, a difference in responses during the resting state, or both. The manuscript should further explore and discuss the differences in responses between sensitizing and depressing neurons, both during the resting state and locomotion. Adding metrics and direct comparisons of the magnitudes of fast responses, slow responses, and time integrals between sensitizing and depressing neurons in resting and locomotion states would help to clarify this. Same for fractions of responsive neurons of each type in each condition. E.g., the slow phase is harder to judge from the plots, but the DeltaF/F integral shown in Figure 1G seems to suggest the difference in response magnitude between sensitizing and depressing neurons is largest in locomotion state, rather than resting state. How do these integrals look for inferred firing rates shown in Figure 2?

      We will further explore the response dynamics of adaptive types within the locomotion and resting state, highlighting the differences between calcium signals and inferred spikes. We will then include our findings in the new version.

      (4) There is something counterintuitive about how the changes in inhibition onto sensitizing and depressing neurons during locomotion explain the reported activity changes.

      Sensitizers receive reduced SST input and increased PV input during locomotion. If SSTs depress and PVs sensitize (and this is the main reason why sensitizers, which receive dominant input from SSTs sensitize, and vice-versa), how is it possible that this switch does not alter the sensitizing or depressing nature of these neurons' responses in locomotion? Are these changes insufficient to flip the dominant SST-PV drive? Figure 6D-E seems to show there is a flip, at least for sensitizers. How do authors explain this? Do authors think this is related to the narrowing of the adaptive index distribution shown in Figure 1C?

      This result is only counterintuitive if we consider exclusively the internal connections within V1. The PV:SST ratio changes from 0.9 during rest, dominated by SST-induced sensitization, to 1.2, dominated by PV-induced depression. Although adaptation is strongly driven by the opposing inhibition of PV and SST in PCs during locomotion, its origin is more easily explained by an external input (SS) that targets VIPs, PVs and PCs. As a result, when locomotion increases the drive coming from SS input, it injects a source of sensitization that partly balances the change in PV:SST ratio, preventing a switch in the adaptive properties of sensitizers which, although reduced, remain sensitizing. We will include these calculations in the revised version.

      (5) Presentation of the experimental data and the model.

      The manuscript introduces the results of interneuron recordings during the description of the model. Similarly, the results of optogenetic manipulations are presented inside the model's description. It would be clearer to present all experimental data first and introduce the model later, fitting it to all experimental evidence previously presented.

      We understand that a clear separation between experimental and modelling results is often preferred in papers that combine these approaches, but in our case modelling and experimental data are highly interdependent and we believe that an overlapping presentation makes it easier for the reader to appreciate the links. One example is Fig. 2G-L, which shows experimental results validating a key feature of the model: the use of average response dynamics for each interneuron population. Similarly, the results in Fig. 3 validate the use of the VIP response dynamics as the template for the slow modulatory input to layer 2/3. Then the results of optogenetic experiments in Fig. 4 are used to narrow down fits to the model. For these reasons, we have chosen to present experimental results and the model in this more integrated manner.

      Reviewer #2 (Public review):

      In the model, they postulate that synapses within the 6-cell-type network - sensitizing, intermediate, and depressing E cells, and PV, SST, and VIP I cells - and from three sources of external input to each of the six types all change between rest and locomotion (except that connections between the E cells don't depend on their types). There are a lot of degrees of freedom, and this makes interpretation of the results difficult. I would have liked to have seen more efforts to constrain the degrees of freedom. For example, there seems to be very little difference between the three E cell types in any of the three types of external input received. Why not constrain them all to get the same external input and see if it significantly affects model fit? Or what if synapses from the three types of external input are left unchanged, and only change their strengths between rest and locomotion? How well could this do? During optimization, why not constrain the changes between rest and locomotion, for example, by putting an L1 penalty on the changes or the relative changes, trying to force them to be sparse, and see whether there are roughly equally good fits? And then, if the main changes are in a small set of synapses, can the authors isolate changes to that small set and do roughly equally well? What about looking at the principal components of the weight changes across models, to isolate patterns of change that are most important?

      To reduce the number of degrees of freedom and ease interpretation, we did limit the model fitting for adaptive subtypes by fixing the PC-PC weights (𝑤<sub>𝑃𝐶_𝑃𝐶</sub>) and restricting the external input weights (𝑤<sub>𝐹𝐹_𝑃𝐶</sub>, 𝑤<sub>𝑆𝑆_𝑃𝐶</sub>, 𝑤<sub>𝐹𝐵_𝑃𝐶</sub>) to changes of ± 10 %. We will explicitly explain these constraints in the methods and discuss their limitations.

      We thank the reviewer for their suggestions of testing different conditions to find those providing the best fit for sensitizing and depressing PCs. We tried an approach similar to that described by Dipoppa et al. 2018 by using the locomotion weights as initial conditions for the rest traces and introducing penalties at later stages. However, the local optimization algorithms failed to reach distant regions of parameter space containing minimum solutions for the rest condition. We finally opted for repeating the same process of initial condition searching for locomotion and rest, making the L1 penalty approach impracticable in our case. We believe this approach is effective because it has both allowed us to describe circuit changes during internal-state transitions (the present paper) and, more recently, it has made a series of predictions about different learning states that have been confirmed by optogenetic tests (Hinojosa et al., 2026). We will nevertheless explore this and the reviewer's other suggestions to further optimize the fitting in the revised manuscript.

      In terms of comparing to previous works, when optogenetic manipulations of SST and PV are done to test various hypotheses, I would like to see some discussion of what is already known from the authors' 2022 paper and what they are adding or testing that wasn't known or tested from that paper. And Dipoppa et al (2018) also found weight changes to account for the difference between rest and locomotion. They were looking at a fixed point of responses of neurons across retinotopic space to stimuli of various sizes with only one E-cell type, whereas they are accounting for trajectories across time considering 3 E-cell subtypes but without variation in stimuli or retinotopic position of neurons, so the efforts are somewhat different, but still, it would be good to see a bit more discussion of what is in agreement or in contradiction in the conclusions.

      Thanks for this prompt. We will add further discussion of this work in light of the Heintz et al. (2022) and Dipoppa et al. (2018) papers.

      (1) The main result is that sensitizers increase their responses with locomotion ~2X (for dF/F) or about 3.5X (for spikes) more than depressors. But there are other differences between sensitizers and depressors, for example sensitizers have smaller initial stimulus responses at rest, and depressors have larger. What if cells were divided into tertiles by initial stimulus response at rest? Would the authors see the same differences in the effects of locomotion? If so, can they establish whether the difference is really attached to the adaptation properties rather than to, for example, the initial responses, for example, by comparing the regression of response increase against AI vs the regression of response increase against initial resting response? And there might be other controls to be done for other features in which sensitizers and depressors differ.

      We will explore the possibility that initial response influences the increase in amplitude. Preliminary data suggest that initial amplitude is higher in depressors than in sensitizers.

      (2) Lines 103 and following: the authors refer to a "second notable change" which is the narrower distribution of adaptive effects, but I think this is trivial. The adaptive index is AI=(R1-R2)/(R1+R2), where R1 is response 0.5-2.5s after stimulus onset and R2 over 8-10s. But if the change is additive, as suggested by the dF/F figures (and I believe the distributions of AI here are based on dF/F measurements) -- adding the same constant to R1 and R2 will shrink |AI| without changing the sign of AI. So this would seem to just be a signature of a change that is primarily additive rather than multiplicative.

      Also, if the authors do decide that they are going to focus on spikes after showing the raw dF/F, then this analysis should be repeated for spikes.

      We agree with the reviewer and will change the text accordingly to highlight the additive nature of the change in amplitude. We will also show the analysis with spikes (this shows similar results as the calcium data).
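      The reviewer's arithmetic can be checked with a short worked example. Adding a constant c to both windows gives AI' = (R1 - R2)/(R1 + R2 + 2c), so for c > 0 the numerator is unchanged while the denominator grows. The sketch below uses hypothetical response values chosen only for illustration.

```python
def adaptive_index(r1, r2):
    """AI = (R1 - R2) / (R1 + R2): R1 is the early response, R2 the late one."""
    return (r1 - r2) / (r1 + r2)


# Hypothetical responses: one cell whose response decays (R1 > R2)
# and one whose response grows (R1 < R2).
c = 2.0  # constant added to both windows, mimicking a purely additive change
for r1, r2 in [(3.0, 1.0), (1.0, 3.0)]:
    ai_before = adaptive_index(r1, r2)
    ai_after = adaptive_index(r1 + c, r2 + c)
    print(f"AI before = {ai_before:+.2f}, AI after additive shift = {ai_after:+.2f}")
# AI before = +0.50, AI after additive shift = +0.25
# AI before = -0.50, AI after additive shift = -0.25
```

      In both cases |AI| halves while its sign is preserved, which is the narrowing of the AI distribution that a purely additive change would produce.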

      (3) Figure 2, F is supposed to be D minus E, but it doesn't look like it. For example, the initial response under locomotion is very similar in sensitizers and depressors, so the initial difference in F should be small, but it's not; and at rest, depressors initially have larger responses than sensitizers, whereas later depressors have smaller responses than sensitizers, yet the difference at rest is positive at all times. Something seems wrong here.

      We apologize for the confusion this has caused. Figure 2F does not represent the difference between sensitizing and depressing PCs from panels D and E. Instead, it shows the time-varying difference between locomotion and rest states of sensitizers (blue, in figure 2D) and depressors (green, in figure 2E). Thus, panel F shows within-population modulation by behavioural state, rather than differences between sensitizing and depressing neurons. We will amend the figure legend and main text to explain this point and avoid misinterpretation.

      Reviewer #3 (Public review):

      (1) Key concern is the usage of dF/F signals for all analyses, especially when comparing responses.

      (1a) Figure 1G: Comparison of sensitisers and depressors. It is important to consider what the baseline rates are when making these comparisons, especially when comparing the degree of effects between different cell types. For example, if baseline rates for sensitizers were overall higher, it would mean the difference in gain of response would be lower, and could affect the results in the opposite direction of what is claimed. One option to account for this would be to z-score the overall responses, using the same normalization for locomotion and rest. We also suggest plotting differences in sensitisers, intermediates, and depressors as a function of firing rate, matching for firing rate across each PC category and calculating delta AI for each matched firing-rate bin.

      (1b) Figure 2A-F: The above is an even more significant issue when it comes to estimating spiking rates. The methods do not state how dF/F is calculated. If it is calculated using the pre-stimulus period as the reference, the spike-inference algorithms used might not be appropriate. Using pre-stimulus referencing could result in the estimate going into the wrong range in the calculation of the spike rate.

      (1c) In both cases above, it could be a problem if baseline firing rates are different between cell types, or states (locomotion/stationary). The latter is established to have effects on many cell types measured, and so needs to be accounted for very carefully.

      The DF/F0 trace was calculated using the mode of the whole trace as F0. While this approach is less sensitive to biases than subtracting the pre-stimulus, it does not consider noise levels like the z-score suggested by the reviewer. We will, therefore, normalize the calcium traces to z-score to further account for changes in the baseline. Spike inference using MLSpike, however, explicitly models baseline noise and subtracts its effect from that of the spikes calculated from the calcium signal (Deneux et al., 2016). This transformation preserved the difference in amplitude triggered by locomotion between depressing and sensitizing PCs while revealing their similar baseline activity (see Figs. 2D,E and F). These results indicate that the distinct changes in response amplitude between sensitizing and depressing PCs during locomotion are not driven by baseline differences. We will add this explanation to the methods section.
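      The two normalizations contrasted above can be sketched as follows. This is only an illustration: the response does not say how the mode of the trace was estimated, so the histogram-based estimate, the bin count, and the function names here are assumptions.

```python
import statistics


def trace_mode(trace, n_bins=50):
    """Estimate the mode of a fluorescence trace as the centre of the fullest histogram bin."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for f in trace:
        idx = min(int((f - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    best = counts.index(max(counts))
    return lo + (best + 0.5) * width


def delta_f_over_f(trace, f0):
    """(F - F0) / F0 for every sample."""
    return [(f - f0) / f0 for f in trace]


def zscore(trace):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.fmean(trace)
    sd = statistics.pstdev(trace)
    if sd == 0:
        raise ValueError("constant trace cannot be z-scored")
    return [(f - mu) / sd for f in trace]


# Hypothetical trace: mostly baseline (100 a.u.) with a minority of elevated samples.
trace = [100.0] * 80 + [150.0] * 20
f0 = trace_mode(trace)
dff = delta_f_over_f(trace, f0)
z = zscore(trace)
```

      The mode tracks the most frequently visited fluorescence level, so it is less affected by activity than a pre-stimulus mean, whereas the z-score additionally scales each trace by its own variability.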

      We will also plot the changes in activity with locomotion across cell types as a function of firing rate and add these results to the revised manuscript.

      (1d) It would be informative to see per-neuron comparison for adaptive indices during rest and locomotion states. This could be visualized using a scatter plot with AI-rest vs. AI-locomotion for Figures 1D-1F and 2J-2L.

      (1e) Are neurons that are more strongly modulated between locomotion and rest also more likely to experience a shift in AI (i.e. delta AI)? Is there a correlation between the change in firing rate between behavioral states and Delta AI (Loco-Rest)? If so, is this present for all neuron subtypes (e.g. VIP, SST, and PV)?

      Sorting was carried out separately on locomotion and rest data sets to capture the adaptive properties of the network under each condition. When assessing the change in adaptive index in individual cells there was a weak but significant correlation (r = 0.10, p < 0.05), probably due to trial-to-trial stochasticity in the network, which has been shown to be present in V1 (Carandini, 2004; Lee et al., 2010). Although adaptation profiles of individual PCs are not fully conserved across rest and locomotion, the observed overlap exceeds that expected by chance, suggesting that stochastic fluctuations modulate an underlying, stable circuit organization. Despite including the stochastic component of the responses, the conclusions hold: sensitizers undergo a larger gain modulation than depressors. We will include this analysis and the correlation between change in firing rate and Delta AI in the revised version of the paper.

      (1f) Optogenetic inhibition of VIP neurons on average abolished the slow depressive effects of adaptation in SST (Figure 3). The strength and prevalence of this effect are unclear. Perhaps one can bootstrap the control and opto AI values and calculate whether AI was significantly reduced following optogenetic inhibition, and if so, on average, how likely this was to occur for the recorded SST neurons. This is important in knowing that the average effects (Figure 3D) aren't driven by a portion of SST neurons, especially as this is later used to confirm the region of parameter space and affects the subsequent results in Figure 4.

      The strength and prevalence of the effect are reflected in the distribution of AI changes across SST neurons, which is centred at AI = -0.3 ± 0.3, indicating a consistent reduction in AI across the population instead of being driven by a small portion of SST neurons. To further clarify this, we will report the proportion of SST neurons showing a reduction in AI and include statistical analyses on the changes.
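      A bootstrap of the kind the reviewer suggests, together with the proportion of neurons showing a reduction, could look like the sketch below. It is a generic percentile bootstrap on per-neuron AI changes, not the authors' analysis; the function names and the example values are hypothetical.

```python
import random
import statistics


def fraction_reduced(delta_ai):
    """Proportion of neurons whose AI decreased (delta AI < 0)."""
    return sum(1 for d in delta_ai if d < 0) / len(delta_ai)


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n))  # resample neurons with replacement
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-neuron changes in AI (opto minus control).
delta_ai = [-0.6, -0.3, -0.1, -0.4, 0.1, -0.2, -0.5, -0.3]
ci_low, ci_high = bootstrap_mean_ci(delta_ai)
print(f"fraction reduced = {fraction_reduced(delta_ai):.2f}")
print(f"95% CI of mean delta AI = [{ci_low:.2f}, {ci_high:.2f}]")
```

      If the confidence interval excludes zero and most neurons show a negative change, the average effect is unlikely to be carried by a small subset of cells.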

      (2) Statistics for the effects. There is a mention of linear mixed models, but no information is given on the actual models being used and tested. This is particularly relevant for Figure 1G, where effect sizes are compared between different populations. What precise significance test is being used? Are the stats on paired cells when considering locomotion and rest?

      We used linear mixed models to test for statistical significance between different conditions composed of hundreds of cells from several mice, i.e. a nested analysis (cells nested within mice; see Judd et al., 2017). For analyses such as Fig. 1G, we considered locomotion state, adaptive type and their interaction (loco × adap) as fixed effects and mouse number as the random effect. The p-value reported in the legend refers to the interaction between locomotion and adaptive type, i.e. the increase in amplitude during locomotion is significantly different in sensitizers compared to depressors, with p < 0.0001. We will revise the methods section and figure legends to explicitly describe the model and statistical test used.
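      One way to write a model with this structure is given below. It is a sketch that assumes a random intercept per mouse; the response does not state whether random slopes or a cell-level random effect were included, and the symbols are introduced here only for illustration.

```latex
% y_{ij}: response amplitude of observation i in mouse j
% L_{ij}: locomotion state (0 = rest, 1 = locomotion)
% A_{ij}: adaptive type (0 = depressor, 1 = sensitizer)
% u_j:    random intercept for mouse j
\begin{equation*}
  y_{ij} = \beta_0 + \beta_1 L_{ij} + \beta_2 A_{ij}
         + \beta_3 \, (L_{ij} \times A_{ij}) + u_j + \varepsilon_{ij},
  \qquad
  u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{equation*}
```

      In this notation the reported p-value would correspond to the test of the interaction coefficient, i.e. whether the locomotion-induced change in amplitude differs between adaptive types.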

      (3) Model parameters: It is acknowledged that there is a large range of parameters that can model the responses effectively, up to 11% of initial conditions. At 9000 initial conditions, this is around 1000. The parameter estimates are then considered as the mean of each parameter. This seems like a strange choice for a few different reasons:

      (3a) A mean solution might not be one of the solutions. Let's say the parameters range over a large dimensional space. They could occupy non-overlapping / discontinuous subspaces. In that case, the mean parameters do not necessarily fall within the solution subspaces. Therefore, this reduction to means might not be valid.

      (3b) Compare distributions rather than means. There are multiple distributions of parameters between conditions. All stats should be on the comparison of distributions rather than just the means.

      To test for the presence of subsets of solutions grouped around different parameter values, we plotted the distribution of each parameter across all the good solutions found. Most of the weights followed a Gaussian distribution centred around the mean and, most importantly, none of them had two peaks. Furthermore, after computing the mean weight values we ran the model with them, and it rendered a good fit, as shown in the figures. We will include those distributions in the new version and base the overall comparison on these distributions.
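      The concern in (3a), and why checking for unimodality addresses it, can be illustrated with a toy example. The two-parameter "solutions" below are invented for illustration and have no relation to the fitted weights.

```python
import math
import statistics


def mean_solution(solutions):
    """Parameter-wise mean across a set of solutions (each a tuple of parameters)."""
    n_params = len(solutions[0])
    return tuple(statistics.fmean(s[i] for s in solutions) for i in range(n_params))


def distance_to_nearest(point, solutions):
    """Euclidean distance from `point` to the closest solution in the set."""
    return min(math.dist(point, s) for s in solutions)


# Hypothetical solutions forming two separate clusters in parameter space.
two_clusters = [(1.0, 0.0), (1.1, 0.1), (-1.0, 0.0), (-1.1, -0.1)]
# Hypothetical solutions forming a single cluster.
one_cluster = [(1.0, 0.0), (1.1, 0.1), (0.9, -0.1), (1.0, 0.0)]

for name, sols in [("two clusters", two_clusters), ("one cluster", one_cluster)]:
    centre = mean_solution(sols)
    print(name, "-> distance from mean to nearest solution:",
          round(distance_to_nearest(centre, sols), 3))
```

      With two clusters the mean lands between them, far from any actual solution; with a single cluster it sits among the solutions. A unimodal distribution for each parameter is consistent with the second case, although marginal distributions alone cannot fully rule out structure in the joint parameter space, which is why re-running the model at the mean parameters is a useful additional check.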

      (4) Visualizing weight matrices: It is very challenging to interpret the weight matrices. Furthermore, it appears that the stationary and locomotion conditions are fit independently, and given the large parameter spaces, it is even harder to interpret. Can the fitting instead be done by fitting on one state and using those solutions as the initial conditions for the other state? Figure 7 shows an intuitive cartoon, but it is not clear how the matrices in Figures 5 and 6 lead to the summary shown in Figure 7. It is also not clear why the connections between inhibitory neurons are not shown in Figure 7. One option is to perhaps run some kind of dimensionality reduction on the parameter space to better interpret the data. When showing deltaWeights, was the model initialised with 'Rest' weights and allowed to change? It is not obvious what the difference is between 'relative change in connection weights' and 'relative change in synaptic weights'. This needs to be clarified.

      Thanks for raising this concern. We will first try to make the weight matrices easier to interpret.

      Regarding the fitting of rest and locomotion conditions, we fitted the locomotion traces first and used those solutions as initial conditions for the rest traces. However, this rendered no good solutions, as minima in the parameter space were too far from the initial starting points. We opted, therefore, for repeating the same process of initial condition searching for locomotion and rest. This approach is less biased in satisfying our aim of finding solutions that fit the data and can explain their dynamics, which are different for each condition. We believe this approach is effective, as not only has it allowed us to describe circuit changes during internal-state transitions but has also made a series of predictions under different learning states that were confirmed by optogenetic tests (Hinojosa et al., 2026).

      We simplified Fig. 7 for clarity, but we will make it more accurate and explain it in more detail in the legend, including connections between interneurons.

      Interpreting high-dimensional parameter spaces can be challenging. In this study, we focused on low-dimensional summaries of the parameter space (e.g., average connection weights and their distributions across populations), which revealed consistent and interpretable differences between sensitizing and depressing neurons. Importantly, our conclusions do not rely on individual parameter values, but rather on systematic differences across populations that are robust across solutions. Additionally, we ran clustering analysis and found that there is no parameter that can be removed. We focused, therefore, on the larger and more robust differences. We will explore additional dimensionality reduction approaches and include these results if they provide further insight beyond the current analyses.

      Finally, the change in weights was calculated with equation 4, in which the weights from locomotion and rest, obtained through independent fits, were used to calculate the relative change from rest to locomotion. These were either connection weights (equation 2), which consider the strength of the connection between cell j and i, or synaptic weights (equation 3), which express the weight of individual synapses by dividing connection weights by the number of presynaptic cells and the probability of connection. This distinction arises because we used average traces from all the neurons imaged to fit the model, so the number of cells must be taken into account to obtain the strength of individual synapses. We will add this explanation in the results and methods sections.
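      The two quantities can be sketched as follows. The synaptic-weight formula follows the verbal description above; the exact form of equation 4 is not given in the response, so the relative change is assumed here to be the difference normalized by the magnitude of the rest weight. Function names and values are hypothetical.

```python
def synaptic_weight(connection_weight, n_presynaptic, p_connection):
    """Per-synapse weight: connection weight divided by the expected number of inputs."""
    return connection_weight / (n_presynaptic * p_connection)


def relative_change(w_rest, w_loco):
    """Assumed form of the relative change from rest to locomotion."""
    return (w_loco - w_rest) / abs(w_rest)


# Hypothetical population-level connection weights from two independent fits.
w_rest, w_loco = 0.5, 0.75
print("relative change in connection weight:", relative_change(w_rest, w_loco))

# Hypothetical presynaptic population size and connection probability.
print("synaptic weight at rest:", synaptic_weight(w_rest, n_presynaptic=100, p_connection=0.1))
```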

      (4a) Model parameters were reduced differently for locomotion and rest (Figure 4). We suggest evaluating the results for locomotion and rest using the same chi-square value of 3 for both behavioral states (at least in controls).

      Thank you for this prompt; this is an important point that we tried to resolve during our analysis. We used the reduced chi-square to evaluate model fits within the locomotion and rest conditions independently. As defined in equation 12, reduced chi-square is inversely proportional to the standard error of the data, which is higher in the rest dataset. As a consequence, setting the same threshold across conditions would not correspond to an equivalent goodness-of-fit criterion, and would impose a disproportionately strict constraint on the condition with lower variability, where deviations between model and data are more heavily penalized. For this reason, we used condition-specific thresholds to ensure comparable fit quality relative to the noise level in each condition. In addition, to enable direct comparison across conditions independent of their noise levels, we used the RMSE as a complementary metric.
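      The dependence of the reduced chi-square on the noise level can be illustrated with a small example. Equation 12 is not reproduced in the response, so this sketch assumes the standard definition in which squared residuals are divided by the squared standard error and summed over the degrees of freedom; all numbers are invented.

```python
import math


def reduced_chi_square(data, model, sem, n_params):
    """Sum of squared, SEM-normalized residuals divided by the degrees of freedom."""
    dof = len(data) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sem))
    return chi2 / dof


def rmse(data, model):
    """Root-mean-square error, independent of the standard error of the data."""
    return math.sqrt(sum((d - m) ** 2 for d, m in zip(data, model)) / len(data))


# Identical residuals evaluated against two hypothetical noise levels.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
model = [1.1, 1.9, 3.1, 3.9, 5.1]
low_noise = reduced_chi_square(data, model, sem=[0.1] * 5, n_params=1)
high_noise = reduced_chi_square(data, model, sem=[0.2] * 5, n_params=1)
print(f"reduced chi-square, SEM 0.1: {low_noise:.3f}")
print(f"reduced chi-square, SEM 0.2: {high_noise:.3f}")
print(f"RMSE (same in both cases):   {rmse(data, model):.3f}")
```

      Under this definition the same model-data mismatch yields a larger reduced chi-square when the standard error is smaller, so a single threshold would be stricter for the less variable condition, while the RMSE is unaffected.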

      References

      Carandini, M. (2004). Amplification of trial-to-trial response variability by neurons in visual cortex. PLoS Biol, 2(9), E264. https://doi.org/10.1371/journal.pbio.0020264

      Chance, F. S., Nelson, S. B., & Abbott, L. F. (1998). Synaptic Depression and the Temporal Response Characteristics of V1 Cells. The Journal of Neuroscience, 18(12), 4785–4799. https://doi.org/10.1523/JNEUROSCI.18-12-04785.1998

      Deneux, T., Kaszas, A., Szalay, G., Katona, G., Lakner, T., Grinvald, A., Rózsa, B., & Vanzetta, I. (2016). Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo. Nature Communications, 7(1), 12190. https://doi.org/10.1038/ncomms12190

      Dipoppa, M., Ranson, A., Krumin, M., Pachitariu, M., Carandini, M., & Harris, K. D. (2018). Vision and Locomotion Shape the Interactions between Neuron Types in Mouse Visual Cortex. Neuron, 98(3), 602–615.e608. https://doi.org/10.1016/j.neuron.2018.03.037

      Heintz, T. G., Hinojosa, A. J., Dominiak, S. E., & Lagnado, L. (2022). Opposite forms of adaptation in mouse visual cortex are controlled by distinct inhibitory microcircuits. Nature Communications, 13(1), 1031. https://doi.org/10.1038/s41467-022-28635-8

      Hinojosa, A. J., Dominiak, S. E., Kosiachkin, Y., & Lagnado, L. (2026). Distinct Disinhibitory Circuits Link Short-Term Adaptation to Familiarity and Reward Learning in Visual Cortex. bioRxiv, 2026.03.24.713929. https://doi.org/10.64898/2026.03.24.713929

      Jin, M., & Glickfeld, L. L. (2020). Magnitude, time course, and specificity of rapid adaptation across mouse visual areas. J Neurophysiol, 124(1), 245–258. https://doi.org/10.1152/jn.00758.2019

      Judd, C. M., Westfall, J., & Kenny, D. A. (2017). Experiments with More Than One Random Factor: Designs, Analytic Models, and Statistical Power. Annu Rev Psychol, 68, 601–625. https://doi.org/10.1146/annurev-psych-122414-033702

      Lee, J., Kim, H. R., & Lee, C. (2010). Trial-to-trial variability of spike response of V1 and saccadic response time. J Neurophysiol, 104(5), 2556–2572. https://doi.org/10.1152/jn.01040.2009

      Rupprecht, P., Carta, S., Hoffmann, A., Echizen, M., Blot, A., Kwan, A. C., Dan, Y., Hofer, S. B., Kitamura, K., Helmchen, F., & Friedrich, R. W. (2021). A database and deep learning toolbox for noise-optimized, generalized spike inference from calcium imaging. Nat Neurosci, 24(9), 1324–1337. https://doi.org/10.1038/s41593-021-00895-5

      Varela, J. A., Sen, K., Gibson, J., Fost, J., Abbott, L. F., & Nelson, S. B. (1997). A Quantitative Description of Short-Term Plasticity at Excitatory Synapses in Layer 2/3 of Rat Primary Visual Cortex. The Journal of Neuroscience, 17(20), 7926–7940. https://doi.org/10.1523/JNEUROSCI.17-20-07926.1997

    1. eLife Assessment

      This important study demonstrates that ocular organoids can generate both retina and lens through a non-canonical, "inside-out" morphogenetic route. The work is supported by convincing data, with well-designed experiments combining imaging, molecular analysis, and transcriptomics to establish that lens formation in organoids follows conserved molecular programs despite an alternative morphogenesis. These findings expand our understanding of self-organization and developmental plasticity, and will be of broad interest to researchers working on eye development, organoids, and tissue engineering.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture in minimal growth factor-containing media containing HEPES. At day 1, retinal precursor cells expressing Rx3:H2B-GFP appear in the surface region of organoids. At day 1.5, Prox1+ cells appear in the interface area between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Thus, Prox1+ cells cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker LFC starts to be expressed. In addition, foxe3:GFP+ cells incorporate EdU, indicating that foxe3:GFP+ cells have lens epithelial cell characteristics. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface area consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of retinal organoids to form an eye cup-like structure, although this "inside-out" morphogenetic mechanism differs from the "outside-in" cellular mechanism of eye cup formation in vivo. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from that of in vivo morphogenesis.

      In the revised manuscript, the authors have added new data on dissociation and re-aggregation of day one organoids and revealed that the differential adhesive properties of lens and retinal precursor cells enable the formation of a spherical lens in the center of the organoid and the later movement of the lens toward the peripheral region of the organoid for lens evagination. Furthermore, the authors showed that BMP and FGF signaling are required for lens precursor induction and subsequent lens fiber differentiation in the organoid, respectively. In the revised manuscript, they have added new data on the target tissues of the BMP and FGF signaling pathways by showing phosphorylated Smad1/5/8 and phosphorylated ERK1/2, respectively, and revealed that lens precursor cells formed in the center of the day one organoid are the target of BMP signaling, whereas lens fiber cells formed in the center of the day 1.5 to 2 organoid are targeted by FGF signaling. Finally, the authors conducted bulk RNA-seq analysis of 1-4 dpf embryonic eyes and day 1-4 eye organoids and revealed that lens organoids show a similar temporal profile of gene transcription. These data suggest that, although induction and morphogenesis of the lens are differentially regulated between eye organoids and in vivo embryonic eyes, their molecular mechanism seems to be shared.

      Significance:

      Strength: This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, the pigment epithelium and the optic stalk. However, retinal organoids seem to be simple and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the cellular mechanisms of lens induction and morphogenesis differ between retinal organoids and in vivo eyes, although their molecular mechanism is conserved.

      Limitation: In the revised manuscript, the authors clarified most of the obscure points; however, a couple of unclear points remain. First, there is one unknown cell-type population located in the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells in the day 2 organoid. Second, the authors showed that removal of HEPES from the organoid culture media inhibits lens induction and differentiation. However, the role of HEPES in lens induction and differentiation in the organoid remains to be elucidated.

      Advancement: In the revised manuscript, the authors have provided a precise description of the inductive and morphogenetic processes of lens induction and differentiation in retinal organoids, together with supporting molecular evidence, which impacts the research fields of cell biology and regenerative medical science using human organoids.

      Audience: The target audience of the current study is still within the ophthalmology and neuroscience communities, perhaps translational/clinical rather than basic biology. To reach beyond these specific fields, the authors need to formulate a general principle for cell and developmental biology.

    3. Reviewer #2 (Public review):

      Summary:

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process - an "inside-out" mechanism where the lens forms centrally and moves outward, rather than the normal "outside-in" embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications. The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the pattern of expression seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock).

      How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      Significance:

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Comments on revised version:

      The revised manuscript is much improved and addresses all of the points raised by the reviewers.

    4. Reviewer #3 (Public review):

      Major Comments on first version:

      - The manuscript presents a beautiful set of high-quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      - The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing to determine whether progenitors differentiate as placode and neuroepithelium first, to subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      - The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      - The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Significance:

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

      Comments on revised version:

      The authors presented substantial additional experimental evidence that further strengthens their manuscript and addressed with these experiments and their revised results/discussion in the manuscript the comments and suggestions from the reviewers. I think the manuscript has been greatly improved with the additions presented.

    5. Author response:

      The following is the authors’ response to the original reviews

      Thank you very much for the positive and constructive feedback on our manuscript. We have revised the manuscript accordingly and have added a substantial number of additional experiments and have extended the data.

      The reviewers' questions focused mostly on mechanistic insight into organoid formation, touching on the following aspects of lens organoid formation presented in the manuscript:

      - Cellular arrangements/re-arrangements during the process of lens formation including potential contribution of differential adhesion-mediated cell sorting to the cellular arrangement in the organoid and characterization of individual contributions of lens- and retina- committed progenitors to this process.

      - Activity of BMP and FGF signaling pathways during organoid formation, namely identification of the tissue responding to the signaling within forming organoids.

      - Contribution of externally supplemented Matrigel to the differentiation process and cellular arrangements in ocular organoids. 

      To address those points in detail we included additional experiments that are now presented in revised version of the manuscript, namely in revised Figure 2-figure supplement 1 (addressing contribution of Matrigel); new Figure 4-supplement 1/Video S5 (addressing contribution of differential adhesion-mediated cell sorting); revised Figure 4/Video S6/Video S7 (addressing contribution of lens-committed progenitors); revised Figure 6 (addressing BMP and FGF signaling pathway activities).

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture with minimal growth factor-containing media containing HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+ cells appear in the interface area between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Thus, Prox1+ cells cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker, LFC, starts to be expressed. In addition, foxe3:GFP+ cells show EdU incorporation, indicating that foxe3:GFP+ cells have lens epithelial cell character. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface area consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of retinal organoids to form an eye cup-like structure, although this "inside-out" morphogenetic mechanism differs from the in vivo cellular "outside-in" mechanism of eye cup formation. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from that of in vivo morphogenesis.

      Overall, the manuscript is nicely presented. However, some points regarding the underlying mechanism remain obscure. My comments are shown below.

      Major comments

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by an outer cell sheet of retinal precursor cells. I wonder if the formation of this structure may be understood in terms of differential cell adhesive activity or mechanical tension between lens core cells and the retinal cell sheet, as in the previous study by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429 - 436 (2008)). Lens core cells may be integrated inside the retinal cell mass by cell sorting through direct interaction between retinal cells and lens cells, or between lens cells and the culture media. After day 1, the movement of the lens core towards the surface of retinal organoids could also be explained if the adhesive/tensile force states of lens core cells are changed by secretion of extracellular matrix. I wonder if the authors have measured the physical properties, such as adhesive activity and solidity, of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they show the same patterning, with an internal lens core covered by the outer retinal cell sheet?

      The question, whether different adhesive activity is involved in cell sorting and lens formation is indeed very intriguing.

      To address this point, we included additional experiments in the revised manuscript. As proposed by the reviewer, we performed dissociation and re-aggregation experiments of day one organoids at the timepoint, when retinal cell fate is already established and first cells with early lens fate (Foxe3::GFP positive) start appearing (see new Figure 4-figure supplement 1).

      After dissociation we followed Foxe3::GFP cells over time and observed that initially equally dispersed GFP<sup>+</sup> lens-committed cells gradually sort and establish contact with other GFP<sup>+</sup> cells, ultimately resulting in the formation of a central GFP<sup>+</sup> sphere within a retinal neuroepithelium (AcTub<sup>+</sup>) localized on the surface of the organoid (see new Figure 4-figure supplement 1e and new Video S5). These data show that differential adhesive properties of lens/retinal precursor cells can enable the formation of a spherical lens in the center of the organoid. This is now clearly stated in the revised version of the manuscript.

      (2) The optic cup is evaginated from the lateral wall of the neuroepithelium of the diencephalon. In zebrafish, cell movement occurs from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder if the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      Looking into the detail of how optic cup-looking tissue arrangement of ocular organoids is achieved on cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details please see Zilova et al., 2021, eLife). We provide evidence that the formation of cup-looking structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid – retina on the surface and lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the “cup-like” shape is acquired by an extrusion-like process of the lens from the center of the organoid.

      To address the cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4d, new Videos S6, S7 and S8). Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) established in the periphery display repeated short distance movements restricted to the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina. In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery. These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4d, new Videos S6, S7 and S8) in the revised version of the manuscript.

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation in day 1-2, whereas lens formation seems to be intact in the presence of FGF receptor inhibitor in day 0-1. I suggest the authors to examine which tissue is a target of FGF signaling in retinal organoids, using markers such as pea3, which is a downstream target of ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      Assessing the activity of FGF signaling (cross-reference to Reviewer #3) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid is responding to FGF signaling. To do so we analyzed the presence of phosphorylated ERK (pERK1/2) as an FGF signaling target in ocular organoids from day 1 to day 2. At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure 6b). Only half a day later (at day 1.5), a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure 6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the core size of the organoid, the dimensions of which were fully comparable to the control (please see Figure 6d).

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What cell type occupies the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is for sure an interesting question. We are aware of this population of cells. We currently do not have data that clarify the fate of those cells with the required certainty. Rather than speculating, we are currently following up on that question by scRNA sequencing, however we see that beyond the scope of the current manuscript.

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is the depth the thickness of Rx3 expressing cell sheet, which covers the central lens core in the organoids? If so, I wonder if total cell number of Rx3 expressing cell sheet may be different in each seeded-cell number, because thickness is the same across each seeded-cell number, but the surface area size may be different depending on underneath the lens core size. Please clarify this point.

      The referee is right, figure 5e indicates the thickness of the cell sheet expressing Rx3 positioned at the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid as stated in the submitted manuscript. We have taken care to remove ambiguities related to that point in the revised version of the manuscript.

      (6) Noggin application inhibits lens formation at day 0-1. BMP signaling regulates formation of lens placode and olfactory placode at the early stage of development. It is interesting to examine whether Noggin-treated organoid expands olfactory placode area. Please check forebrain territory markers.

      What tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question.

      To address this point, we labeled Noggin treated organoids at day 2 and day 3 with forebrain and olfactory placode markers. We could identify an increase in the domains expressing Lhx2, HuC/D and Otx2 in Noggin-treated organoids, showing a shift of the preferential differentiation of the neurons of anterior forebrain identity (see attached figure for reviewer). However, the available markers Lhx2, HuC/D and Otx2 found in the olfactory placode are in addition also co-expressed in further neuronal cell types of the anterior forebrain. While the speculation is tempting, the shift in expression does not allow to conclusively state the expansion of the olfactory placode.

      Author response image 1.

      Expression of forebrain and olfactory placode markers.

      I have no minor comments

      Referees cross-commenting

      I agree that all reviewers have similar suggestions, which are reasonable and provided the same estimated time for revision.

      Reviewer #1 (Significance):

      Strength:

      This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, the pigment epithelium and the optic stalk. However, retinal organoids seem to be simple and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, under the unconstrained embryo-free environment.

      Limitation:

      The description is adequate, but the analysis is not very profound. It is necessary to apply somewhat more analysis at the molecular and cellular level, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement:

      The current study is descriptive. It needs some conceptual advance that impacts the cell biology field or medical science.

      Audience:

      The target audience of the current study is still within the ophthalmology and neuroscience communities, perhaps translational/clinical rather than basic biology. To reach beyond these specific fields, the authors need to formulate a general principle for cell and developmental biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process - an "inside-out" mechanism where the lens forms centrally and moves outward, rather than the normal "outside-in" embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      Looking into the detail of how the optic cup-like arrangement of ocular organoids is achieved on the cellular level is indeed highly interesting. In the revised manuscript we now provide evidence that the formation of cup-like structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at distinct regions of the organoid – retina on the surface and lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the cup-like shape is acquired by an extrusion process of the lens from the center of the organoid.

      To address cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4e, new Videos S6, S7 and S8).

      Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) display repeated short distance movements within the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina.

      In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery.

      These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4e, new Videos S6, S7 and S8) in the revised version of the manuscript.

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      We agree with the reviewer that this is a highly interesting question, and in the revised manuscript we followed the advice and dedicated a part of the discussion to this topic. We believe that the arrangement is due to the induction of central lens fates by signals emanating from the retinal epithelium, and we discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments addressing the target tissues of FGF and BMP signaling in the organoid have been provided in response to Reviewer #1. Interestingly, interfering with FGF signaling, which is essential for lens fiber cell differentiation, did not affect lens size, arguing against an immediate proliferative effect. Although an analysis of the respective proliferation rates at the surface or in the central region of the organoid might show some differences, we do not have any indication that the proliferation rate itself would be instructive for, or take precedence over, the cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Lens formation is primarily dependent on the acquisition/specification of Foxe3-expressing lens placode progenitors. In the absence of Foxe3-expression, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens is formed in unperturbed conditions (measured by the presence of expression of crystallin proteins). Organoids that do not have a lens, do not contain Foxe3-expressing cells.

      In the absence of a lens, the organoid is composed of retinal neuroepithelium that does not form an optic cup-like shape (for details of such phenotypes please see Zilova et al., 2021, eLife). We took care to state that clearly in the revised manuscript.

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the pattern of expression seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock). How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      We thank the reviewer for pointing this out. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observation. Indeed, Matrigel is not required for the acquisition of lens fate, which can be demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells into a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium indeed negatively affects the cellular organization and the overall lens structure.

      To clarify the contribution of Matrigel to the organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel affects the organization and thickness of the retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that the size of the lens is not affected by the absence of Matrigel (Figure 3-figure supplement 1d-e). Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions, with and without Matrigel supplementation, Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of HEPES in lens formation is indeed very intriguing and currently under investigation. HEPES is mainly used to regulate the pH of the culture media, which on its own might have an impact on multiple cellular processes. It will require a significant time investment to address the potential HEPES-triggered molecular mechanisms affecting lens formation (cross reference with Reviewer #3), which goes beyond the scope of the current manuscript.

      Referees cross-commenting

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments

      Reviewer #2 (Significance):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      The manuscript presents a beautiful set of high quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      We followed the reviewer’s advice and have included a systematic analysis of the contribution of ECM (Matrigel) to the process of lens formation. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observation. Indeed, Matrigel is not required for the acquisition of lens fate, which can be demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells into a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium in turn negatively affects the cellular organization and the overall lens structure.

      To clarify the contribution of Matrigel to organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel affects the organization and thickness of the retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that the size of the lens is not affected by the absence of Matrigel (Figure 3-figure supplement 1d-e).

      Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions (with and without Matrigel supplementation), Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of HEPES in lens formation is indeed intriguing and currently under investigation. HEPES is mainly used to adjust the pH of the culture media, which on its own might have an impact on multiple cellular processes. It will require a significant time investment to address the potential HEPES-triggered molecular mechanisms affecting lens formation (cross reference with Reviewer #3), which clearly goes beyond the scope of the current manuscript.

      The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing one to determine whether progenitors differentiate as placode and neuroepithelium first, to subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      We have taken care to show corresponding stages in embryo and organoid side by side. We provide additional data to highlight the expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens and lens placode) markers in earlier developmental stages. For the presumptive eye field within the region of the anterior neural plate (S16, late gastrula), Rx3 represents one of the earliest markers (see revised Figure 3-figure supplement 1). Even before an apparent lens placode is formed (see revised Figure 3d), Foxe3::GFP expression is detected within the presumptive lens ectoderm, demonstrating that Foxe3 is ideally suited as an early marker for placodal progenitors in medaka. The onset of Rx3- and Foxe3-driven reporters is clearly early enough to support the claim about the separate origin of the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids, now represented in the revised figures.

      The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Assessing the activity of BMP and FGF signaling (cross-reference to Reviewer #1) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid is responding to BMP and FGF signaling. To do so, we analyzed the presence of phosphorylated SMAD1/5/8 (pSMAD1/5/8) and phosphorylated ERK (pERK1/2) as readouts of BMP and FGF signaling, respectively, in ocular organoids from day 1 to day 2. BMP signaling activity was detected in the center of the organoid (the region where lens-committed progenitors are established; Figure 3e) at day 1 (see revised Figure 6a). At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure S6b). Only half a day later, a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain, at day 1.5), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure S6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the core size of the organoid, the dimensions of which were fully comparable to the control (please see Figure 6b).

      Regarding the presence of the corresponding ligands, we can state that they are indeed expressed in the organoids at the matching stages based on RNA-seq and RT-PCR analyses; however, we could not detect a specific localization. This may be due to widespread, ubiquitous expression or may simply reflect technical problems.

      While we can state with confidence that the ligands are present at the relevant time points and trigger the downstream pathways in a localized manner, the question whether the response is due to a localized signal or localized competence remains to be addressed.

      The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Following the shift of the lens in vivo is indeed a very relevant suggestion and we have taken care to address this in the revised manuscript.

      To clarify this process, we included additional experiments and followed the movements of lens cells (see new Figures 4c, 4d and 4e, new Videos S6 and S7). Foxe3::GFP lens progenitor cells were found to actively move over long distances from the center to the organoid periphery. This movement was accompanied by profound cell shape changes of lens progenitor cells, with the active extension of lamellipodia and filopodia, strongly arguing for an active movement of lens cells to the organoid periphery (cross-reference with Reviewer #1 and Reviewer #2).

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

    1. eLife Assessment

      This potentially valuable manuscript focuses on the phosphorylation of residue T495 as a mechanism to inactivate HSP70 and disrupt cell cycle progression in response to DNA damage. The evidence supporting this model is solid, but would be significantly strengthened by additional studies defining the extent of T495 phosphorylation induced by DNA damage, identifying the kinase responsible for phosphorylating T495 of HSP70, and further elucidation of the functional implications of T495 phosphorylation in human cells. This work will be of interest to scientists focused on topics including chaperone biology, proteostasis, cell cycle progression, and DNA damage.

    2. Reviewer #1 (Public review):

      Summary:

      This study identifies a conserved phosphorylation event on Hsp70, at human T495, that is triggered by DNA damage. The authors show that this modification arises in response to MMS and is temporally associated with cell cycle progression through mitosis. Using biochemical analysis, they further argue that the phosphomimetic Hsc70(T495E) adopts an open-like conformation with impaired J protein-stimulated ATP hydrolysis while still retaining client binding. In yeast, both phosphomimetic and phosphonull mutants perturb growth and cell cycle progression, supporting the idea that dynamic regulation of this site helps coordinate DNA damage responses with G1/S control.

      Strengths:

      A major strength of the paper is that it links prior work on Legionella-mediated Hsp70 phosphorylation to a normal cellular DNA damage response. The study is also commendably multi-level, combining mammalian cell biology, in vitro biochemistry, and yeast genetics to support the central model. Together, the authors provide a coherent story that this Hsp70 site has functional importance in checkpoint-like control rather than being a passive phosphosite, adding to our understanding of the chaperone code.

      Minor Weaknesses:

      The authors acknowledge that the direct kinases/phosphatases for this site remain unknown. Some conclusions are therefore still somewhat inferential, especially the model that pHsp70 acts as a reversible molecular brake on S-phase entry. These limitations do not undermine the importance of these exciting findings, but they do leave the paper somewhat short of a fully resolved mechanism.

      Comments on revisions:

      The authors have done a great job in addressing all the previous reviewer concerns. They have provided additional data and refined the text, stating limitations of their proposed model. In doing so, they have produced a much-improved version of the manuscript.

    3. Reviewer #2 (Public review):

      The revised manuscript offers little new information and fails to address the critical weaknesses identified in the original submission.

      While we can agree that phosphorylation of Thr495 would likely affect Hsp70 function, given the known biochemistry of Hsp70s and the authors' previous work on LegK4, the significance of this finding hinges on whether it is a regulated process. If a meaningful fraction of Hsp70 were phosphorylated in a regulated manner triggered by DNA damage or cell cycle progression, it would constitute an important discovery, regardless of its specific impact on fitness in a given context.

      However, beyond highlighting the temporal profile of Hsp70 phosphorylation in MMS-treated cells (Figure 4e), the paper fails to rule out the possibility that this correlation is merely an irrelevant side reaction. This "bystander" phosphorylation could simply be caused by the activation of kinases during the experimental MMS treatment and subsequent washout. The authors' claim that the fraction of phosphorylated Hsp70 increases in a "regulated, cell-cycle dependent manner" does not sufficiently counter the possibility of it being a non-functional side effect.

      This concern could be resolved if the authors had identified the specific kinase, demonstrated its specificity, and manipulated it either genetically or pharmacologically. While I acknowledge this is a "tall order," the lack of such data limits the paper's significance. Furthermore, the current data fail to meet a much lower bar: confirming that a substantial fraction of Hsp70 is actually phosphorylated under the tested conditions. Such a finding would at least suggest the event is capable of impacting the overall Hsp70 pool.

      It is surprising that the authors have not provided a ratiometric assay to settle this, such as an immunoblot of total Hsp70 separated on a Phos-tag or IEF gel. Instead, they rely on indirect evidence and data subject to alternative interpretations. Specifically, they argue that the fitness cost of the Thr495Ala mutation (or the phosphomimetic mutation) is due to the loss of regulatory phosphorylation (or deregulated phosphorylation); however, it is equally plausible that the mutations create Hsp70 hypomorphs whose defects are only exposed under stressful experimental conditions.

    4. Reviewer #3 (Public review):

      In this manuscript Moss et al. demonstrate that Hsp70 phosphorylation at a conserved threonine residue integrates DNA damage responses with cell-cycle control. The authors present unbiased biochemical, cell-based, and yeast genetic analyses showing that phosphorylation of human Hsp70 at T495 (and the analogous Ssa1 T492 in yeast) is triggered by base-excision-repair intermediates and downstream DDR kinase activity, leading to delayed G1/S progression after DNA damage. They used orthogonal approaches such as ATPase assays, phospho-specific detection, kinase-inhibition studies, synchronization experiments, and phenotypic analyses of phosphomutants. They presented robust data which collectively supported the conclusion that dynamic Hsp70 phosphorylation functions as a conserved "molecular brake" to prevent inappropriate S-phase entry under genotoxic stress.

      Comments on revisions:

      The authors have addressed all my questions and concerns.

    5. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their time and consideration of the manuscript. We have added new data to Figure 5 (Figure 5a) to address concerns regarding the conservation of the Hsp70 phosphorylation in yeast. Additionally, we have changed the title of the manuscript to “Hsp70 is phosphorylated in a conserved response to DNA damage and contributes to cell cycle control” to more accurately represent the conclusions we draw.

      Public Reviews:

      Reviewer #1 (Public review):

      The strength of evidence of the mechanistic and "conserved checkpoint" claims that this site is directly activated by DNA damage is inadequate and fundamentally incorrect.

      We respectfully disagree with the reviewer’s characterization of our conclusions. Our data demonstrate that DNA damage induces this phosphorylation in a cell-cycle–dependent manner. We do not claim to have defined the direct kinase or full mechanistic pathway; rather, we establish that site activation is damage-responsive and functionally linked to cell-cycle regulation. Consistent with this, phospho-mutants in yeast exhibit clear cell-cycle defects, supporting a conserved functional role. We address each of the reviewer’s specific concerns below.

      Specific comments:

      (1) Activation of T495:

      The authors' premise for the site being activated by DNA damage is Albuquerque et al., where PTMs on MMS-treated yeast are analyzed. T492 (the yeast equivalent of human T495) is observed as phosphorylated. However, the authors fail to note that there is no untreated sample analysis in this study, and it is likely that T492 phosphorylation is also present in untreated cells. This is also backed up by later evidence from the same lab (Smolka et al.), where they do not identify T492 as being dependent on Mec1/Tel1/Rad53 kinases.

      We agree with this assessment of the Albuquerque study. Accordingly, we used their data to generate the hypothesis that this site is phosphorylated, and we took it upon ourselves to more rigorously demonstrate phosphorylation with appropriate controls. The validated antibody that we had previously generated[1] to track pHsp70 was the enabling technology to directly track this phosphorylation event. We now directly show phosphorylation of this site (Figure 5a, lines 276-284). Of note, as Reviewer 1 suggested, there is a smaller amount of pHsp70 in the untreated cells, which corresponds with findings from Holt et al 2009 [2]. This could reflect a baseline role of Hsp70 phosphorylation for normal growth that is accentuated upon MMS insult.

      (2) The kinase(s) directly responsible for T495 phosphorylation are not identified. Instead, the authors show that knockdown or pharmacological inhibition of DNA-PKcs, ATM, Chk2, and CK1 attenuate pHsp70.

      We agree with reviewer 1 that identifying the direct kinase would be an exciting finding, and we believe our manuscript will provide the foundation for future studies to address these questions. While these findings will be impactful, we do not believe their lack detracts from the observations we have made.

      (3) ATM siRNA knockdown has no effect, while ATM inhibitors do, which the authors acknowledge but do not resolve. This discrepancy raises concerns about off-target drug effects.

      We agree with reviewer 1 that off-target drug effects are always a concern when employing pharmacological inhibitors. To that end, we tested structurally distinct inhibitors of ATM (Figure 3b) to decrease the likelihood of a shared off-target effect. While complementing this with a genetic knockdown would be ideal, the discrepancies between pharmacological and genetic inhibition of ATM have been well reported (lines 214-216).[3,4] Parallel discrepancies in other kinases have been mechanistically explored by other groups.[5] The preponderance of pharmacological evidence in conjunction with RNAi suggests the most likely interpretation of our data is that ATM is involved in signaling upstream of Hsp70 phosphorylation. Thus, our data compel future work to use more sophisticated genetic methods to more specifically determine how ATM connects with pHsc70.

      (4) No in vitro kinase assays, motif analysis, or phosphosite mapping confirming these kinases as direct T495 kinases are presented. Thus, the proposed signaling cascade remains speculative.

      We agree that we should carefully circumscribe our conclusions about the potential signaling cascade. To communicate our conclusions more clearly, we rewrote lines 223-226 to highlight that our findings implicate these kinases in upstream signaling rather than direct phosphorylation of Hsp70.

      (5) Smolka and many other labs characterized DDR sites as SQ/TQ motifs, and T492 doesn't fit that motif.

      We agree, and our response to comment 4 addresses this point. Briefly, we do not claim that Hsp70 is a direct target for DDR. Notably, the SQ/TQ motifs mentioned specifically pertain to ATM and DNA-PK[6], though we would like to note several studies have demonstrated DNA-PK phosphorylation outside of these motifs.[7] Chk2 and CK1 do not prefer SQ/TQ motifs[9]. Additionally, Chk2 is known to phosphorylate non-consensus sequences as well[10].

      (6) No genetic tests in yeast (e.g., BER mutants) are used to connect Ssa1 T492 phosphorylation to BER in that system, despite the strong BER-centric model.

      We agree that it would be interesting to study BER mutants in yeast, and we believe this will be an exciting prospect for future studies to better establish the signaling cascade. We have included a Western blot (Figure 5a) showing that MMS treatment causes increased Hsp70 phosphorylation in yeast. MMS damage is repaired through BER in S. cerevisiae,[11] and the pathway itself is highly conserved.[12] Our experiments demonstrate that the phosphorylation of Hsp70 occurs as a conserved response to alkylation damage, which is the major conclusion of our paper.

      (7) Overexpression of MPG gives only a modest increase in pHsp70, while APE1 overexpression has no effect, and Polβ overexpression does not decrease pHsp70. These mixed results weaken the central claim that Hsp70 phosphorylation is a tuned sensor of BER burden.

      We appreciate this incisive question. Though not immediately intuitive, we do not believe these results are necessarily ‘mixed’. The lack of APE1 over-expression having an effect could be attributed to APE1 activity being necessary for the phosphorylation, but not rate-limiting. Regarding Polβ, it is important to note that not its binding, but rather its dRP lyase activity is rate-limiting in base excision repair.[13] As such, if binding sites are already saturated or near saturated, but the lyase activity remains slow, we may not observe a decrease in BER intermediates. While we do claim that phosphorylation of Hsp70 is triggered by BER intermediates (lines 193-194), we do not claim that pHsp70 is a tuned sensor of BER burden.

      (8) A major concern is that pHsp70 is only convincingly detected after very high, prolonged MMS (10 mM, 5 h) or 0.5 mM arsenite treatments. Other DNA-damaging agents (bleomycin, camptothecin, hydroxyurea) that robustly activate DDR kinases do not induce pHsp70. This suggests to me that the authors are observing a side effect of proteotoxic stress. This is likely (see Paull et al, PMID: 34116476).

      Our data indicate that pHsp70 specifically occurs downstream of base excision repair. Therefore, it is not surprising that drugs that do not activate BER (bleomycin, camptothecin, hydroxyurea) do not elicit the same response. While pHsp70 may arise due to DSBs generated through BER, the fact we do not see phosphorylation after bleomycin treatment could be explained by the cell-cycle dependencies we report (Figure 4e). It is also important to note that MMS-induced pHsp70 occurs primarily in the nucleus, and Western blots of whole cell lysate will contain large amounts of cytosolic Hsp70 that could dilute the signal. Indeed, in our nuclear extraction (Figure 4d), we see faint pHsp70 signal as soon as 1 h after treatment, though it increases in robustness as the time-course progresses. These data are both concordant with a model in which high BER-induced lesion burden in mitosis leads to Hsp70 phosphorylation in late M/G1.

      We would like to add that, in the review article cited by Reviewer 1, the authors specifically cite studies implicating a loss-of-function in DDR pathways leading to increased proteotoxic stress (e.g. ATM deficient cells producing higher levels of aggregated proteins compared to WT). However, we find that inhibition of DDR kinases decreases, rather than increases Hsp70 phosphorylation. We thus believe that DNA damage rather than proteotoxic stress is the parsimonious cause of Hsp70 phosphorylation.

      (9) A recent study in Nature Communications (Omkar et al., 2025) demonstrates rapid phosphorylation of yeast T492 in a pkc1-dependent manner, diminishing the impact of these findings.

      We were excited to see this paper when it was published 3 months after we posted a preprint on bioRxiv, which was released three weeks after our submission to eLife. Rather than diminishing the impact of this paper, we believe that independent lines of evidence from different groups mutually reinforce the impact of the work. We have added a sentence to say that during the review of our work, this group independently observed this phosphorylation event in response to a different stress (lines 421-423). We believe in celebrating the scientific process arriving at consistent results, and the editorial policies of eLife reinforce that philosophy by offering ‘scoop protection.’

      We would also like to highlight several differences between the scope of our papers. The phosphorylation reported by Omkar et al. appears highly constrained to yeast as part of the Cell Wall Integrity pathway, whereas ours occurs as a more highly conserved response. Additionally, our paper provides additional biochemical insight into the consequences of this phosphorylation, which is lacking in Omkar et al. If anything, this paper highlights the important regulatory capacity of this residue on Hsp70, and suggests it may serve multiple functions in the cell.

      (2) Downstream Effects of T492/T495:

      (10) The manuscript's central conceptual advance is that pHsp70 is a cell-cycle-regulated brake on G1/S. Yet in mammalian cells, the authors show only that pHsp70 appears late, after cells have traversed mitosis, and that blocking CDK1 (G2/M) prevents its accumulation.

      We would like to clarify the central contribution of this study. Prior work identified this phosphorylation in yeast, but its existence and conservation in human cells had not been established. A primary advance of our study is demonstrating that this site is phosphorylated in mammalian cells and that its accumulation is cell-cycle regulated — coinciding with late M/G1.

      We further show that phosphorylation depends on cell-cycle progression, as CDK1 inhibition prevents its accumulation. While these data establish regulation, we agree that they do not by themselves define causality in mammalian cells. To address functional consequences, we leveraged the genetic tractability of S. cerevisiae. Phosphomimetic Ssa1 T492E increases the proportion of G1 cells in the absence of MMS and enforces a stronger G1 arrest following MMS treatment. Together, these findings support a conserved, cell-cycle–linked role for this phosphorylation and provide a foundation for future mechanistic work in mammalian systems.

      (11) There is no functional test in human cells: no knockdown/rescue experiments with T495A or T495E, no cell-cycle profiling upon altering Hsp70 phosphorylation state, and no demonstration that pHsp70 actually causes any delay in S-phase entry, rather than simply correlating with late damage responses. The strong conclusion that pT495 "stalls cell cycle progression" (e.g., Figure 6 model) is therefore not supported in the human system.

      We agree that we did not directly test the functional consequences of Hsp70 phosphorylation in human cells. Our intent was not to claim that we have demonstrated causality in the mammalian system, but rather to establish that this conserved phosphorylation exists in human cells and is cell-cycle regulated.

      We instead used S. cerevisiae to interrogate this due to its increased genetic tractability. In this system, phosphomimetic mutation increases the proportion of G1 cells under basal conditions and enhances G1 arrest following MMS treatment, mirroring the damage-associated phenotype observed in human cells. These findings support a conserved functional role for this modification, although we agree that direct mechanistic testing in mammalian cells will be important for future work.

      We intended the cartoon model to be a speculative illustration of what may be occurring, in order to motivate future studies. We now see how this may lead to confusion, so, to improve clarity, we have removed Figure 6 from the manuscript.

      (12) All functional conclusions rely on T492A/E point mutants at the endogenous SSA1 locus, usually in an ssa2Δ background, in a family of highly redundant Hsp70s. Without showing that this site is actually modified during their MMS treatments, the assignment of phenotypes to loss of a physiological phospho-switch is premature. The authors need to repeat their studies in an Ssa1-4 background, as in https://pubmed.ncbi.nlm.nih.gov/32205407/.

      Thank you for this feedback. We have added a Western blot to Figure 5 (Figure 5a) addressing this comment. Briefly, we show that, in yeast, Hsp70 phosphorylation increases upon MMS treatment and is not detectable in the point mutants in the ssa2∆ background. The latter data suggest that Ssa3-4 modification is negligible in our system.

      (13) The authors infer that T495E "locks" Hsc70 in a pseudo-open state based on reduced J-protein-stimulated ATPase activity, unchanged ATP binding, altered trypsin sensitivity, and retained tau binding. However, there is no direct comparison of phosphorylated vs T495E protein (e.g., via in vitro phosphorylation with LegK4 followed by side-by-side biochemical assays, or structural analysis). Thus, it remains unclear to what extent the glutamate substitution mimics a phosphate at this position.

      Previously we did show that phosphorylation impacts the ATPase cycle of Hsp70.[1] In this paper, with the phosphomimetic mutant we see an even greater decrease of activity. This is consistent with incomplete phosphorylation yielded by in vitro phosphorylation with LegK4.[1] Due to this incomplete phosphorylation in vitro, we determined that the phosphomimetic mutant would be more useful for the assays we performed, as they rely on bulk readouts.

      (14) No client release kinetics, co-chaperone binding assays, or in vivo chaperone function tests are provided, yet the discussion builds a detailed model of a "pseudo-open" state that simultaneously resembles ATP-bound conformation and allows persistent substrate engagement.

      We have shown that the conformational cycle of Hsp70 (T495E) is uncoupled from nucleotide state, and that the overall conformation resembles ATP-bound Hsp70. This is consistent with prior studies on AMPylation of the same residue.[14] Additionally, we demonstrate that substrate engagement is similar between WT and T495E. This is consistent with our previously published work showing increased pHsp70 on polysomes,[1] as well as our observations that the phosphomimetic mutant in yeast exerts a phenotype even in the presence of the compensatory isoform SSA2. This dominant-like phenotype is consistent with those seen in mutations locking Hsp70 in a ‘closed’ conformation.[15] We agree that studies examining client release kinetics and co-chaperone binding would be useful for future structural work validating and elaborating on our findings.

      Reviewer #2 (Public review):

      Weaknesses:

      The kinase(s) responsible for the phosphorylation have not been identified (and hence remain inaccessible to experimental, i.e., genetic or pharmacological, manipulation). The mechanistic links to DNA damage repair and the fitness benefits of this proposed adaptation remain obscure. Of greater concern, the data provided in the paper fail to exclude the trivial possibility that the phosphorylation event described (and characterized through biochemical proxies) is biologically neutral, reflecting nothing more than a bystander event in which kinase(s) activated by application of high concentrations of a powerful alkylating agent (MMS) phosphorylate, at meaninglessly low stoichiometry, an abundant protein (Hsp70) on a surface-exposed residue. Failure to exclude this (plausible) scenario is this paper's weakness.

      We agree that we have not directly quantified the absolute stoichiometry of Hsp70 phosphorylation. However, several lines of evidence argue against the interpretation that this represents a biologically neutral, bystander modification.

      First, our pulse-chase experiment (Figure 4e) shows that, after MMS removal, pHsp70 levels increase as cells progress through the cell cycle. Notably, total Hsp70 levels remain constant. This indicates that the fraction of phosphorylated Hsp70 increases in a regulated, cell-cycle dependent manner, rather than through a bystander event during acute stress.

      Second, functional perturbation of the homologous site in yeast produces phenotypic consequences. The phosphomimetic Ssa1(T492E) mutant exhibits reduced growth, increased G1 accumulation, and impaired cell-cycle re-entry following MMS treatment (Figure 5). These phenotypes argue that the modification of this residue is functionally consequential.

      While the upstream kinase remains to be identified, the genetic and cell-cycle phenotypes observed upon site perturbation argue that this phosphorylation is functionally consequential.

      Reviewer #2 (Recommendations for the authors):

      (1) The biochemical characterization of the phosphomimetic mutation (T495E) is thorough, relying on ATPase assays and conformational analysis. Figure 1b demonstrates reduced J-protein-stimulated ATPase activity, and Figure 1d shows an ATP-like proteolysis pattern consistent with an open conformation. As the authors are well aware, Hsp70 chaperones act on their substrates via a dynamic cycle that includes binding, ATP hydrolysis, and conformational shifts. One wonders, therefore, at the relevance of the measurement shown in Figure 1f. While it is highly plausible that the T495E mutation mimics the phosphorylation event (BiP T518E mimics key aspects of AMPylation), the lack of a biochemical characterisation of Hsp70 with pThr495 is an important limitation of this paper. Even if such a preparation cannot be accomplished with the endogenous kinase(s) whose identity remains unknown, a characterisation of LegK4-phosphorylated Hsp70 should suffice.

      We agree with Reviewer 2 that the rationale for Figure 1f does not logically follow from the results of Figures 1b and 1d. Rather, this experiment was motivated by the prior finding that phosphorylation of Hsp70 by L.p. led to increased occupancy on polysomes[1] (lines 137-139). We sought to better understand the discrepancy between this finding and our own by assaying the capacity of the T495E mutant to bind substrate.

      Reviewer 2 raises a valid point in that phosphomimetic proteins do not necessarily behave the same as truly phosphorylated proteins. Previous work from our lab characterized the ATPase activity and in vitro folding capacity of Hsc70 that had been directly phosphorylated by LegK4[1] (lines 114-115). We were motivated to turn to a phosphomimetic mutant as LegK4 only phosphorylates around half of the Hsc70 present in solution[1] (line 116); this mixture of species makes batch analysis difficult. As we had previously published data on in vitro phosphorylated Hsc70, we did not believe it necessary to include it in our subsequent analyses.

      (2) As noted, the kinase(s) that phosphorylate T495 remain to be identified and are inaccessible genetically. The phenotypic consequences of impaired pThr495 are therefore assessed by a T495A mutation. This most certainly eliminates phosphorylation at that site; however, Figure 5C shows quite clearly that the T/A mutation is not neutral. This is expected, given the role of an H-bond network centered upon the homologous residue in the ADP-bound configuration of Hsp70s. Importantly, the biochemical non-neutrality of the T/A mutation also compromises the interpretation of the associated phenotype, as this cannot be attributed solely to a loss of phosphorylation; it may reflect features of the T/A mutation exposed by MMS, but unrelated to the inability of the residue to undergo regulated phosphorylation.

      We appreciate this insightful critique. We agree that the alanine substitution may perturb the local H-bond network, and have added a sentence to our discussion to highlight this caveat (lines 379-381). That being said, our conclusions do not rely solely on the T-to-A mutant. The phenotypes observed in our phosphomimetic mutant overlap with those of the T-to-A mutant (increased sensitivity to MMS; defects in cell cycle re-entry after MMS treatment) (Figure 5). While the alanine mutation may not represent a purely ‘loss-of-phosphorylation’ state, our findings do implicate this residue in cell cycle control after DNA damage.

      (3) It thus remains formally possible that pThr495 arises as an irrelevant side reaction due to activation of a kinase (with other relevant substrates).

      This dismal interpretation of the data would be dispelled somewhat if the stoichiometry of pThr495 were substantial, whereas very low stoichiometry of phosphorylation should leave one wary of the possibility that the surface-exposed Thr495 of ATP-bound Hsc70 is a physiologically irrelevant bystander target of a kinase activated in DNA-damaged cells.

      We have included a Western blot in Figure 5 showing pHsp70 in our yeast samples. Here we can see low abundance of Hsp70 phosphorylation in untreated WT yeast, with a clear increase in MMS treated yeast. Additionally, as mentioned in a previous response, Figure 4e shows the accumulation of pHsp70 in human cells even after MMS removal, indicating it is not simply the byproduct of over-activation of the DNA damage response.

      Unfortunately, the study does not quantify the stoichiometry of Hsp70 phosphorylation; detection relies on phospho-specific immunoblotting, leaving open the question of whether this modification occurs at physiologically significant levels. This worry is compounded by Figure 2a,f that suggests that phosphorylation occurs only under high-dose MMS or arsenite, raising concerns about physiological relevance.

      We agree that we did not quantify absolute phosphorylation stoichiometry. While a precise measurement would be informative, our conclusions are based on regulated dynamics and functional perturbations rather than magnitude alone. Specifically, our pulse-chase (Figure 4e) shows that total Hsp70 levels remain constant while pHsp70 increases in a cell-cycle dependent manner following MMS removal. This indicates a regulated modification rather than a side-effect of kinase over-activation during acute stress. Additionally, perturbation of the homologous site produces cell-cycle phenotypes (Figure 5) in yeast, supporting functional relevance.

      However, as mentioned in our response to Comment 3, our pulse-chase assay in Figure 4e indicates that the stoichiometry of pHsp70 increases after MMS removal in a cell-cycle dependent manner. Furthermore, as discussed in response to Reviewer 1 Comment 8, Figure 4d highlights a technical limitation with regard to detection of pHsp70 by Western blotting. Namely, as pHsp70 accumulates in the nucleus, signal appears to be diluted by unmodified Hsp70 in the cytosol when whole-cell lysate is probed, thereby reducing detection capacity. It is therefore possible that less stringent doses do lead to phosphorylation, but because the experiments were run in asynchronous cells and on whole-cell lysate, we failed to detect it.

      Reviewer #3 (Recommendations for the authors):

      Major Comments:

      (1) Figure 1e - Which antibody was used to probe this blot?

      Thank you for catching this omission. This was stained with Coomassie. We have edited the figure legend to reflect this.

      (2) Figure 1c- Do the authors have the data of the WT and T495E with DJA2?

      The assay was performed with increasing concentrations of DJA2 for both constructs (from 0 µM to 4 µM) (lines 118-119, Figure 1c).

      (3) Figure 2- The labeling of the right side of the immunoblots is missing.

      We apologize for the confusion. The labeling is on the left. The lines on the right are intended to demarcate blots that came from the same membrane (for easier comparison of loading controls).

      (4) Figure 2d- Does MMS treatment lead to a heat shock response?

      We have not directly tested this. However, we do not see the massive upregulation of HSPs that would be expected from a heat shock response.

      (5) Figure 4c and e - Total protein level of some of the phospho-proteins is missing.

      We used housekeeping proteins as loading control. We do not have antibodies for all the non-phospho proteins. For those we have, blots not included in the publication do not show any marked discrepancies between the non-phospho form and the housekeeping proteins.

      (6) Figure S1A- Although the authors suggest that the phosphorylation event is reversible, they have not integrated it into the final model in Figure 6.

      In line 403 we postulate that dephosphorylation may permit client release. In the interest of clarity, we have now removed the model figure.

      (7) Yeast genotype is missing.

      We used W303a yeast (line 612).

      (8) It is unclear which phosphatase inhibitor was used in their assay (Figure S1A).

      We repeated the experiment with both Halt Phosphatase Inhibitor Cocktail (Thermo Scientific 78440) and Roche PhosStop (Roche 04906837001) (lines 524-525).

      (9) Please add this most recent and up-to-date reference (PMID: 40976416) related to your study.

      We have now added that reference.

      (10) Can the authors speculate on whether Hsp70- T495E is expected to primarily reside in the nucleus?

      We have no data to indicate whether or not phosphorylation at T495 or a phosphomimetic mutation at this site would directly affect nuclear import or export. In cells expressing the Legionella kinase LegK4, pHsp70 exists in the cytoplasm,[1] indicating that the phosphorylation in and of itself does not force nuclear localization. We thus imagine that the nuclear localization seen in Figure 4d is more likely due to the location of the kinase than a consequence of the phosphorylation. In an over-expression system or in the case of a genomic mutation, we believe the protein is most likely to exist in both the cytoplasm and the nucleus, though we did not directly test this.

      References

      (1) Moss, S. M. et al. A Legionella pneumophila Kinase Phosphorylates the Hsp70 Chaperone Family to Inhibit Eukaryotic Protein Synthesis. Cell Host Microbe 25, 454-462.e6 (2019).

      (2) Holt, L. J. et al. Global Analysis of Cdk1 Substrate Phosphorylation Sites Provides Insights into Evolution. Science 325, 1682–1686 (2009).

      (3) Choi, S., Gamper, A. M., White, J. S. & Bakkenist, C. J. Inhibition of ATM kinase activity does not phenocopy ATM protein disruption. Cell Cycle 9, 4052–4057 (2010).

      (4) Menolfi, D. & Zha, S. ATM, ATR and DNA-PKcs kinases—the lessons from the mouse models: inhibition ≠ deletion. Cell Biosci. 10, 8 (2020).

      (5) Weiss, W. A., Taylor, S. S. & Shokat, K. M. Recognizing and exploiting differences between RNAi and small-molecule inhibitors. Nat. Chem. Biol. 3, 739–744 (2007).

      (6) Kim, S.-T., Lim, D.-S., Canman, C. E. & Kastan, M. B. Substrate Specificities and Identification of Putative Substrates of ATM Kinase Family Members. J. Biol. Chem. 274, 37538–37543 (1999).

      (7) Jette, N. & Lees-Miller, S. P. The DNA-dependent protein kinase: A multifunctional protein kinase with roles in DNA double strand break repair and mitosis. Prog. Biophys. Mol. Biol. 117, 194–205 (2015).

      (8) O’Neill, T. et al. Determination of Substrate Motifs for Human Chk1 and hCds1/Chk2 by the Oriented Peptide Library Approach. J. Biol. Chem. 277, 16102–16115 (2002).

      (9) Fulcher, L. J. & Sapkota, G. P. Functions and regulation of the serine/threonine protein kinase CK1 family: moving beyond promiscuity. Biochem. J. 477, 4603–4621 (2020).

      (10) Craig, A. et al. Allosteric effects mediate CHK2 phosphorylation of the p53 transactivation domain. EMBO Rep. 4, 787–792 (2003).

      (11) Xiao, W., Chow, B. L. & Rathgeber, L. The repair of DNA methylation damage in Saccharomyces cerevisiae. Curr. Genet. 30, 461–468 (1996).

      (12) Memisoglu, A. & Samson, L. Base excision repair in yeast and mammals. Mutat. Res. Fundam. Mol. Mech. Mutagen. 451, 39–51 (2000).

      (13) Srivastava, D. K. et al. Mammalian Abasic Site Base Excision Repair: Identification of the Reaction Sequence and Rate-determining Steps. J. Biol. Chem. 273, 21203–21209 (1998).

      (14) Preissler, S., Rato, C., Perera, L. A., Saudek, V. & Ron, D. FICD acts bifunctionally to AMPylate and de-AMPylate the endoplasmic reticulum chaperone BiP. Nat. Struct. Mol. Biol. 24, 23–29 (2017).

      (15) Fontaine, S. N. et al. Isoform-selective Genetic Inhibition of Constitutive Cytosolic Hsp70 Activity Promotes Client Tau Degradation Using an Altered Co-chaperone Complement. J. Biol. Chem. 290, 13115–13127 (2015).

    1. eLife Assessment

      Zandvoort and colleagues have used an innovative approach to study respiration-brain coupling in the context of apnoea in human newborns. This fundamental question is supported with convincing data and analyses. Having addressed all the reviewer comments, there was a general consensus that this work will be of great interest, not only to neonatal clinicians and physiologists, but also broadly to anyone interested in brain-body interactions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the extent to which phase-amplitude coupling (PAC) of respiratory and electrophysiological brain activity recordings was related to episodes of life-threatening apnoea in human newborns.

      Strengths:

      I want to commend the authors for acquiring unique and illuminating data; the difficulty in recording and handling these data has to be appreciated. As far as I can tell, Zandvoort and colleagues are the first to provide robust evidence for respiration-brain coupling in newborns. Their creative use of the phase-slope index for peripheral-central interactions is innovative and credible. If proven to be robust, the authors' findings have important implications well beyond the field of brain-body research.

      Comments on revisions:

      I would like to thank the authors for a careful revision and additional clarifications; I have no further questions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' central hypothesis was that the strength of cortico-respiratory coupling in infants is negatively associated with apnoea rate. To prove this, they first investigated the existence of cortico-respiratory coupling in premature and term-born infants, the spatial localisation of the cortical activity and its relationship with the phase of the respiratory cycle, and the directionality of coupling.

      Strengths:

      The researchers used synchronised EEG and impedance pneumography to detect the phase amplitude coupling.

      They have studied a wide range of gestations, from 28 weeks to 42 weeks, including males and females. Their exclusion criteria ensured that healthy babies were studied and potential confounders of impaired respiratory activity were avoided. Their sequential approach in addressing the objectives was appropriate.

      Weaknesses:

      As a neonatal clinician and neuroscientist, I have commented based on my expertise. I have not commented on signal processing.

      There are no major weaknesses to the study. Some minor weaknesses include:

      (1) Data relating to the cortical oscillations and the respiratory phase is given. However, whether this would lead to their hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate is unclear. What preceding data enabled the authors to link the strength of coupling to the rate of apnoea?

      (2) If we did not know of data showing the existence of cortico-respiratory coupling in newborn infants, then should it not be the first research question to examine?

      (3) What are the characteristics of the infants who contributed data to establish the cortico-respiratory coupling (Figures 2 and 3)?

      (4) Although it is the most plausible direction of the relationship, with neural activation driving respiratory muscle contraction, how can the authors prove this with their data? Given that they show coherence between signals, how do we know that the cortical signal precedes the respiratory muscle contraction?

      (5) Apgar score is an ordinal variable. The authors should summarise this as median (range).

      Comments on revisions:

      All the weaknesses are adequately addressed. No more comments.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the extent to which phase-amplitude coupling (PAC) of respiratory and electrophysiological brain activity recordings was related to episodes of life-threatening apnoea in human newborns.

      Strengths:

      I want to commend the authors for acquiring unique and illuminating data; the difficulty in recording and handling these data has to be appreciated. As far as I can tell, Zandvoort and colleagues are the first to provide robust evidence for respiration-brain coupling in newborns. Their creative use of the phase-slope index for peripheral-central interactions is innovative and credible. If proven to be robust, the authors' findings have important implications well beyond the field of brain-body research.

      Weaknesses:

      While the analyses were overall competently conducted and well-justified, I was not entirely convinced by a few methodological choices, specifically i) the computation of PAC surrogates, ii) details of the linear mixed-effects model, and iii) the electrode selection for linking phase-amplitude coupling to apnoea frequency.

      Thank you for your kind comments and helpful review of our paper. We have now clarified computation of PAC surrogates, added further details of the linear-mixed effects models and calculated the link between the strength of the cortico-respiratory coupling (phase-amplitude coupling) and apnoea rate with data acquired at all electrodes. We provide further details for each of these in response to your ‘Recommendations for the authors’.

      Reviewer #2 (Public review):

      Summary:

      The authors' central hypothesis was that the strength of cortico-respiratory coupling in infants is negatively associated with apnoea rate. To prove this, they first investigated the existence of cortico-respiratory coupling in premature and term-born infants, the spatial localisation of the cortical activity and its relationship with the phase of the respiratory cycle, and the directionality of coupling.

      Strengths:

      The researchers used synchronised EEG and impedance pneumography to detect the phase amplitude coupling.

      They have studied a wide range of gestations, from 28 weeks to 42 weeks, including males and females. Their exclusion criteria ensured that healthy babies were studied and potential confounders of impaired respiratory activity were avoided. Their sequential approach in addressing the objectives was appropriate.

      Weaknesses:

      As a neonatal clinician and neuroscientist, I have commented based on my expertise. I have not commented on signal processing.

      I did not identify any major weaknesses in the study. Some minor weaknesses include:

      (1) Data relating to the cortical oscillations and the respiratory phase is given. However, whether this would lead to their hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate is unclear. What preceding data enabled the authors to link the strength of coupling to the rate of apnoea?

      (2) If we did not know of data showing the existence of cortico-respiratory coupling in newborn infants, then should it not be the first research question to examine?

      (3) What are the characteristics of the infants who contributed data to establish the cortico-respiratory coupling (Figures 2 and 3)?

      (4) Although it is the most plausible direction of the relationship, with neural activation driving respiratory muscle contraction, how can the authors prove this with their data? Given that they show coherence between signals, how do we know that the cortical signal precedes the respiratory muscle contraction?

      (5) Apgar score is an ordinal variable. The authors should summarise this as median (range).

      Thank you for your useful comments. We have revised the manuscript to address these comments and improve the clarity.

      (1) We agree that preceding data leading to the hypothesis that the strength of cortico-respiratory coupling is negatively associated with apnoea rate are limited. We have clarified in the introduction that adult studies have previously suggested that cortical motor activity may prevent the hypoventilation and apnoea seen in patient groups. We have also added further clarification to our hypothesis. In the introduction we now state:

      “In adults with congenital central hypoventilation syndrome or obstructive sleep apnoea, a respiratory-linked increase in cortical motor activity suggests that the motor cortex plays an important role in maintaining autonomous respiration, with the authors postulating that cortico-respiratory drive whilst participants are awake may prevent the hypoventilation/apnoea observed in these patients whilst they are asleep.”

      And later:

      “We hypothesised that cortico-respiratory coupling occurs in newborns and that the strength of cortico-respiratory coupling is negatively associated with apnoea rate (in line with the suggestions made from studies of adults with congenital central hypoventilation syndrome[6] and obstructive sleep apnoea[7]).”

      (2) We agree that this was the first research question we examined. We have clarified this in the introduction, now re-writing the hypothesis and aims to state “We hypothesised that cortico-respiratory coupling occurs in newborns and that the strength of cortico-respiratory coupling is negatively associated with apnoea rate (…). To this end, we first examined whether cortico-respiratory coupling exists in both premature and term infants.”

      (3) Figures 2 and 3 used the full dataset. We have clarified this in the Figure captions by stating: “For all panels, data included is from 68 infants (28-42 weeks postmenstrual age [PMA] at time of recording) on 104 recording occasions. See Table 1 for further clinical and demographic characteristics.”

      (4) We used a cross-frequency version of the phase-slope index to quantify the directionality and strength of information flow between cortical and breathing time series (Figure 3C,D). The phase-slope index investigates phase lags and how these change over narrow frequency ranges by examining the slope of the phase spectrum of their complex coherency. This indicates whether one signal leads or trails another signal (and thus indicates directionality). However, we agree (as was also noted by Reviewer 3) that this analysis does not ‘prove’ directionality, as other factors may influence the analysis. We have added the following to the text to address this point:

      “However, caution is needed in the interpretation of these results as signal processing techniques such as the phase-slope index estimate directionality but do not confirm causality. Rather, they show a statistical relationship which can be influenced by a multitude of factors (e.g., signal-to-noise ratio and preprocessing steps). Nevertheless, the results suggest that cortical activity may precede respiration in newborns. Future work is needed to confirm this association by, for example, employing other metrics to estimate directionality, such as the time-lagged cross-correlation and Granger causality and through direct mechanistic studies.”

      (5) We have revised Table 1 so that Apgar scores are provided as median and interquartile range.
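      The phase-slope idea described in response (4) can be illustrated with a minimal sketch. It assumes the standard same-frequency phase-slope index (the imaginary part of the summed product of complex coherency at adjacent frequency bins), not the cross-frequency variant used in the manuscript, and it omits any normalisation or significance testing. The function name, segment length, and frequency band are hypothetical choices made only for illustration.

```python
import numpy as np

def phase_slope_index(x, y, fs, seg_len, band):
    """Unnormalised phase-slope index between x and y (illustrative sketch).

    Under the sign convention used here, a positive value suggests that
    x leads y. This is not the authors' cross-frequency implementation.
    """
    n_seg = len(x) // seg_len
    window = np.hanning(seg_len)
    # Cut both signals into non-overlapping, windowed segments.
    xs = x[: n_seg * seg_len].reshape(n_seg, seg_len) * window
    ys = y[: n_seg * seg_len].reshape(n_seg, seg_len) * window
    X = np.fft.rfft(xs, axis=1)
    Y = np.fft.rfft(ys, axis=1)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    # Segment-averaged cross-spectrum and auto-spectra.
    sxy = np.mean(X * np.conj(Y), axis=0)
    sxx = np.mean(np.abs(X) ** 2, axis=0)
    syy = np.mean(np.abs(Y) ** 2, axis=0)
    coherency = sxy / np.sqrt(sxx * syy)
    # Sum the phase change between adjacent bins inside the band.
    idx = np.where((freqs >= band[0]) & (freqs <= band[1]))[0]
    return float(np.imag(np.sum(np.conj(coherency[idx[:-1]]) * coherency[idx[1:]])))

# Synthetic example: y is a delayed, noisy copy of x.
rng = np.random.default_rng(0)
x = rng.standard_normal(256 * 100)
y = np.roll(x, 3) + 0.1 * rng.standard_normal(x.size)
print(phase_slope_index(x, y, fs=100.0, seg_len=256, band=(1.0, 20.0)))
```

      In this toy case the sign of the index reflects the imposed delay, but, as the response notes, with real recordings the estimate can be influenced by signal-to-noise ratio and preprocessing, so it indicates a statistical lead/lag rather than causality.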

      Reviewer #3 (Public review):

      Summary:

      This is a strong and important report that presents a framework for understanding cortical contributions to neonatal respiration. Overall, the authors successfully achieved their goal of linking cortical activity to respiratory drive. Despite the correlational nature of this study, it is a crucial step in establishing a foundation for future work to elucidate the interaction between cortical activity and breathing.

      Strengths:

      (1) The introduction and use of workflows that establish correlational relationships between breathing and brain activity.

      (2) The execution of these workflows in human neonates.

      Weaknesses:

      Interpretations related to causal inference, confounds of sleep and caffeine, and the spatial interpretation of EEG data need to be addressed to ensure that the data appropriately support the conclusions.

      Thank you for your useful comments. We have now substantially revised the manuscript in relation to causal inference and our interpretations of the data, in particular adding further detail to the discussion with regards to the limitations of our approach and revising wording that has causal implications. We provide more detail in response to your ‘Recommendations for the authors’.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I want to elaborate on the three points of methodological criticism, and my apologies in case I have some misconceptions:

      (1) It seems like the surrogate distribution to determine PAC significance was computed by shuffling EEG segments and recomputing PAC each time. Surrogate computations are a difficult topic when handling signals as regular as respiration time series. However, random shuffling of data segments is almost always an overly liberal approach (except for trial-based data) since it destroys the temporal autocorrelation of the underlying signal. As the resting-state data in the present study were per se continuous (and just segmented for analytical purposes), I am not convinced that random shuffling provides an adequate control. Could the authors either a) provide convincing evidence that the temporal autocorrelation of verum and surrogate time series did not differ from one another, or b) conduct additional control analyses based on an alternative approach, e.g., by constructing surrogate respiration phase vectors and recomputing PAC accordingly? We have had good experiences with the IAAFT approach (outlined in Kluger et al., Nat Comms 2023), but others certainly exist.

      Thank you for this important comment on the construction of surrogates. We agree that it is essential for any surrogate approach that it destroys the cross-signal coupling whilst preserving the signals’ internal structure (e.g., autocorrelation, spectral profile, and non-stationarities) as much as possible. We apologise for not describing this more clearly in the initial manuscript, but we want to clarify that in the surrogate analysis, we did not shuffle time points/segments within the EEG trials themselves. Instead, we permuted the trial order so that respiration trial T1 was paired with an EEG trial other than T1. This leaves the 4-sec segments used in the PAC analysis unaltered. This surrogate technique preserves the important internal properties of each signal (within-trial autocorrelation, auto-spectra and power distribution of the signals) while destroying the cross-signal alignment across trials, and thus the trial-wise phase locking (e.g., coherence) between respiration and EEG. We have clarified this in the manuscript as follows:

      “The surrogate analysis was performed by randomly permuting the trial (4-s segment) order of the EEG amplitude while leaving the respiration trial order unchanged (i.e., respiration segment S1 was paired with an EEG segment Sj, j ≠ 1). Importantly, no temporal samples were shuffled within segments. Thus, the full within-segment temporal structure, including autocorrelation and spectral profile (auto-spectra), was preserved for both signals. This permutation destroys trial-wise cross-signal phase alignment (and therefore coherence) while retaining the intrinsic dynamics of each signal.”
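      The trial-order permutation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the coupling statistic (`mean_vector_length`) is a hypothetical stand-in for their PAC measure, and requiring that no EEG segment keeps its original respiration partner (a derangement) is one reading of "Sj, j ≠ 1". Segments are re-paired as whole rows, so no samples are shuffled within a segment.

```python
import numpy as np

def derange_segments(n_segments, rng):
    """Random segment order in which no segment stays in its own position."""
    if n_segments < 2:
        raise ValueError("need at least two segments to re-pair")
    while True:
        perm = rng.permutation(n_segments)
        if not np.any(perm == np.arange(n_segments)):
            return perm

def mean_vector_length(phase, amplitude):
    """Hypothetical coupling statistic: |mean(A * exp(i * phi))| over all samples."""
    return float(np.abs(np.mean(amplitude * np.exp(1j * phase))))

def surrogate_distribution(resp_phase, eeg_amp, n_surrogates, seed=0):
    """Coupling values after re-pairing respiration and EEG segments.

    resp_phase, eeg_amp: arrays of shape (n_segments, n_samples).
    Only the order of EEG segments changes; each row is left intact.
    """
    rng = np.random.default_rng(seed)
    n_segments = resp_phase.shape[0]
    values = np.empty(n_surrogates)
    for k in range(n_surrogates):
        perm = derange_segments(n_segments, rng)
        values[k] = mean_vector_length(resp_phase, eeg_amp[perm])
    return values

# Synthetic example: 30 four-second segments at 100 Hz, amplitude locked to phase.
rng = np.random.default_rng(1)
t = np.arange(400) / 100.0
offsets = rng.uniform(0.0, 2.0 * np.pi, size=30)
phase = 2.0 * np.pi * 1.0 * t[None, :] + offsets[:, None]
amplitude = 1.0 + 0.8 * np.cos(phase)
observed = mean_vector_length(phase, amplitude)
surrogates = surrogate_distribution(phase, amplitude, n_surrogates=200)
print(observed, np.percentile(surrogates, 95))
```

      Because whole segments are re-paired, each surrogate keeps the within-segment autocorrelation and spectrum of both signals, which is the property the response emphasises; only the cross-signal alignment is broken.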

      (2) The LMEM approach is very sound, but it seems like ID was the only random effect included in the model. Could the authors clarify whether multiple sessions from individual neonates were considered or whether each ID was only represented once? In case of the former, one possibility would be to include 'session' as an additional random effect; otherwise, the group statistic could be biased. Many thanks in advance for providing insight on this.

      Thank you for this important point. Of the 68 infants included in the study, 49 had only a single session. The remaining 19 infants had between 2 and 5 sessions included. Given that most infants had only a single session, it is not possible to identify random effects of session reliably, and so we have not included session as a random factor. Moreover, postmenstrual age [PMA] (which is related to session order within a subject and is likely a more reliable indicator of variance given that sessions were not at fixed intervals) is already included as a factor in the analysis. Indeed, session ID is not a distinct source of clustering and will be indistinguishable from subject and PMA variance.

      In relation to this question, we carefully checked the analysis and realised that we had included infant with a random effect of both slope and intercept. Given that most infants have only one session the random effect of slope cannot be estimated and so we have now removed this from the analysis leading to very minor changes in the results (and no changes in the interpretation). We have clarified in the manuscript that “Infant ID was included as a random effect acting on the intercept.”
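      For readers less familiar with the distinction, the two random-effects structures can be written out as a sketch. The notation is an illustrative assumption rather than the authors' own: $y_{ij}$ denotes the outcome for infant $i$ at session $j$, and $x_{ij}$ a covariate such as PMA on which the random slope would act.

```latex
\begin{align}
\text{random intercept only:}\quad
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0i} + \varepsilon_{ij}, \\
\text{random intercept and slope:}\quad
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0i} + u_{1i}\,x_{ij} + \varepsilon_{ij}, \\
  u_{0i} &\sim \mathcal{N}(0, \sigma_{u_0}^2), \qquad
  u_{1i} \sim \mathcal{N}(0, \sigma_{u_1}^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

      For an infant with a single session there is only one $x_{i1}$, so the contributions of $u_{0i}$ and $u_{1i}\,x_{i1}$ cannot be separated for that infant; with most infants in this position, the slope variance is poorly identified, which is consistent with retaining only the random intercept.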

      (3) It is not entirely clear to me why the authors selected the two electrodes with the strongest overall PAC for the analysis of apnoea frequency. Why not consider all electrodes individually? What is the worry/hypothesis regarding electrodes with low PAC - would one not expect simply to find no relationship with apnoea frequency, and would that information not be instructive? Again, I want to thank the authors in advance for their take on this comment.

      We initially included only the two electrodes with the strongest coupling as we would not expect a relationship with apnoea rate at those electrodes without significant coupling (as you say). For completeness, we have now included the relationships with all electrodes individually in Supplementary Figure S4. As expected, the relationship between apnoea rate and coupling (coherence) was not significant for the electrodes without strong coupling.

      Reviewer #3 (Recommendations for the authors):

      Major Comments:

      (1) Causal Language and Overinterpretation are evident throughout the manuscript. The manuscript repeatedly uses language suggesting causality (e.g., "cortical motor activity reduces apnoea"), despite the correlational nature of the findings.

      It is recommended that the authors reframe their claims in the abstract and discussion to clarify that the observed associations do not establish causal influence. For example, Abstract: "...revealing novel mechanistic insight....". This correlational observation does not reveal a mechanism but rather supports the concept of mechanistic interactions.

      Thank you for this important point. We have now rephrased the manuscript throughout, particularly in the abstract and results/discussion. We have also added the following sentences to the discussion to address the point on causation:

      “Nevertheless, it is important to recognise that a limitation of this analysis is that correlation does not imply causation, and future mechanistic studies are required to determine whether and how cortico-respiratory coupling plays a role in reducing apnoea in infants.”

      And later:

      “The limitations of our study need to be considered, and in particular, directionality of the cortico-respiratory coupling, improved spatial localisation, and a direct mechanistic link between cortico-respiratory coupling and apnoea rate, should be investigated in future work.”

      (2) Potential Confounding by Sleep State and Caffeine. Sleep state is a significant determinant of apnoea occurrence and EEG frequency composition, yet no objective sleep-state classification is incorporated. Similarly, caffeine, administered in ~50% of recordings, is a potent respiratory stimulant. A reanalysis of the data, incorporating sleep proxies (e.g., EEG spectral ratios, delta/theta dominance) and caffeine exposure as covariates or stratification factors in the PAC-apnoea models, should be performed.

      Sleep state: A limitation of our work is that we did not record sleep state and unfortunately, we do not have anyone trained to annotate sleep states from EEG recordings in our research group. We have added the following to the discussion to address this:

      “It is known that most apnoeas in infants occur during active sleep[6][30] and delta- and theta-band frequencies in EEG are strongly related to sleep state[31]. A limitation of our study is that we did not record the sleep state of the infant.”

      Caffeine: We agree that caffeine is a respiratory stimulant and, hence, it is important to consider this effect. Moreover, those infants prescribed caffeine are those who are at greatest risk of apnoea and so it is of interest to determine whether the relationship between PAC and apnoea rate occurs in those infants receiving caffeine treatment. We conducted a stratified analysis to address this point, now providing an additional Supplementary Figure.

      (3) Directionality Inference from Phase-Slope Index. While PSI suggests a lead-lag relationship, it does not confirm causality and may be influenced by signal-to-noise or preprocessing steps. Validate PSI findings using additional metrics (e.g., time-lagged cross-correlation or Granger causality) or, at a minimum, temper interpretations of cortical "driving" respiration.

      We agree that the PSI (and other metrics such as Granger causality) may be influenced by a range of factors. We have therefore changed the wording throughout and also added the following:

      “However, caution is needed in the interpretation of these results as signal processing techniques such as the phase-slope index estimate directionality but do not confirm causality. Rather, they show a statistical relationship which can be influenced by a multitude of factors (e.g., signal-to-noise ratio and preprocessing steps). Nevertheless, the results suggest that cortical activity may precede respiration in newborns. Future work is needed to confirm this association by, for example, employing other metrics to estimate directionality, such as the time-lagged cross-correlation and Granger causality and through direct mechanistic studies.”

      (4) Limited EEG Spatial Resolution. The attribution of CRC to "cortical motor areas" is overstated, given the use of only 8 EEG electrodes, which provides limited spatial coverage. Avoid overly precise interpretations regarding cortical localization unless source localization or higher-density EEG data are available.

      We have added the following to specifically address this limitation.

      “It is important to note that the number of electrodes in our montage is limited (with only 8 recording electrodes), and so source localisation was not possible; higher-density recordings are warranted to confirm whether the motor cortex plays a role.”

      We have also changed the wording in the summary paragraph and abstract to add this limitation and reworded throughout the manuscript to highlight the limitations of our study.

      Minor Comments

      (1) Consider color-coding individual points in Figure 4A by PMA or caffeine status to visually disambiguate potential age-related or pharmacological effects.

      We agree that this provides additional visual information and have colour-coded the points in Supplementary Figure S6 according to caffeine status.

      (2) Clearly define PAC versus CRC. These are used interchangeably. Readers may benefit from a more consistent and precise usage, especially in the abstract.

      Thank you for noticing this. We have revised the terms where necessary throughout, and the abstract and introduction to read:

      “Using simultaneous electroencephalography (EEG) and impedance pneumography we investigated interactions between cortical and respiratory activity (known as cortico-respiratory coupling) using phase-amplitude coupling.”

      “Recently, it was proposed that communication between the cortex and lungs, known as cortico-respiratory coupling, can be identified and quantified through phase-amplitude coupling. This functional coupling arises when the amplitude of cortical activity is modulated by the respiration phase, or vice versa. Phase-amplitude coupling is typically quantified using non-invasive recordings capturing respiratory and neural activity (e.g., magneto- or electroencephalography [EEG]).”

      (3) Clarify the overlap with previously published datasets (line 358). Are any EEG-apnoea associations re-analyses of data published in Zandvoort et al., 2024?

      We have amended this sentence to explain that the previous study did not investigate respiration/apnoea. We now state:

      “Parts of this dataset have been reported earlier in Zandvoort et al. [33] to address a different research question (this study investigated the development of sensory-evoked potentials, which were also recorded in these infants; it did not explore respiration).”

    1. eLife Assessment

      This important study shows how stochastic and deterministic factors are integrated in Dictyostelium discoideum to reliably drive determination of distinct cell types despite exposure to nearly identical environmental conditions. The authors present convincing evidence that gene expression variability contributes to the robustness of cell fate decisions, which reveals an unexpected role of stochasticity during cell differentiation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate how stochastic and deterministic factors are integrated in cell fate decisions, using *Dictyostelium discoideum* as a model system. They show that cells in different cell cycle phases (a deterministic factor) are predisposed to different fates, albeit with deviations, when exposed to the same environmental stimulus. However, gene expression variability due to asynchrony in cell cycle phase across cells in the populations and stochasticity of biochemical processes enhances the robustness of cellular responses to environmental cues that disrupt the cell cycle.

      Using a simple, tractable mathematical model, the authors characterize the response of cell fate decisions as dependent on a combination of deterministic (cell cycle phase) and stochastic factors (variability in gene expression). They then identify Set1 - a key regulator of gene expression variability - and indicate the mechanism of histone methylation, through which it modulates the variability. Finally, they confirm that gene expression variability contributes to the robustness of cells' response (at the population level) by comparing and contrasting the predictions from the mathematical model versus the outcomes in wild type and set1- mutants.

      Strengths:

      The authors are careful in their choice of experiments and in measuring gene expression variability, using methods that account for expected trends with average gene expression. The mathematical model chosen is simple to follow intuitively and yet predictive enough (at a qualitative level) of the effects of stochastic-deterministic combination of factors, and burst size/frequency.

      Weaknesses:

      While the authors show that gene expression variation is a feature of genes associated with fate choice and cell type proportioning, it remains somewhat unclear if this kind of variation, or any amount of it, is always beneficial for robustness or there is some optimum level of it.

    3. Reviewer #2 (Public review):

      Summary:

      A fundamental problem in developmental biology is how a group of apparently identical cells breaks symmetry and differentiates into, for instance, type A and type B cells in the absence of any external influence such as a gradient of something causing cells at the left side of the group to become type A cells. The authors use the model system Dictyostelium to explore the interplay between a known cell-cycle-dependent musical chairs mechanism (cells are at random phases of the cell cycle, and a signal that hits all the cells causes cells that happen to be in one set of cell cycle phases to become the A cells, and cells that happen to be in other phases become the B cells), and stochastic gene expression. They identified genes whose expression is stochastic (unusually high cell-cell variation). Using a very clever and elegant genetic screen, they then show that these genes often are associated with cell fate choice. The authors then show that the stochastic genes have reduced levels of histone (H3K4) Me3 methylation, and that a histone methylase called Set1 is important for this process. They then bring the work together to show that the cell-cycle-dependent mechanism and stochastic gene expression work in combination to generate the observed differentiation of Dictyostelium cells.

      Strengths:

      Combination of theory, clever genetic screens, single-cell RNA-seq, and molecular and cell biology to dive into the fundamental problem of cell fate choice.

      Results support the conclusions.

      Very significant contribution to developmental biology.

      Weaknesses:

      Because the paper is co-written by people doing theoretical work and people doing experimental work, the theory sections will be difficult for an experimentalist and vice versa. It is nevertheless very much worth the effort to read this paper; there is a lot in here. There are no weaknesses of the methods and results.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Joint Public Review:

      Summary:

      The authors investigate how stochastic and deterministic factors are integrated in cell fate decisions, using Dictyostelium discoideum as a model system. They show that cells in different cell cycle phases (a deterministic factor) are predisposed to different fates, albeit with deviations, when exposed to the same environmental stimulus. However, gene expression variability (a stochastic factor) enhances the robustness of cellular responses to environmental cues that disrupt the cell cycle.

      Using a simple, tractable mathematical model, the authors demonstrate that cell fate decisions in D. discoideum depend on a combination of deterministic and stochastic factors, i.e., cell cycle phase and gene expression variability, respectively. They then identify Set1 - a key regulator of gene expression variability - indicate the mechanism through which it modulates this variability, and link it to a phenotype in D. discoideum development. Finally, they confirm that gene expression variability contributes to the robustness of the cell's response to environmental disruptions that interfere with the cell cycle.

      Strengths:

      The authors are careful in the choice of their experiments and in measuring gene expression variability, using methods that account for expected trends with average gene expression.

      Weaknesses:

      However, in terms of mathematical modelling, it would be important to rule out sources of stochasticity (other than gene expression variability), and also to consider cases where stochastic factors are not necessarily completely independent of the deterministic ones.

      We thank you and the reviewers for the insightful comments that have helped clarify the findings presented. We have addressed all comments and feel that the revised manuscript is much improved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Minor typographical mistakes:

      (a) in the title: Linage -> lineage

      Corrected as suggested

      (b) on page 19: use a full stop in "...are biased towards the stalk fate, Use of the cell cycle position..."

      Corrected as suggested

      (c) on page 20: become -> becoming in "...(and end up biased towards become stalk)..."

      Corrected as suggested

      (d) on page 16: "mu = G p k". Perhaps it should be x instead of k?

      Corrected as suggested

      (2) Regarding the abstract:

      (a) This work tries to outline general principles (coordination/integration of deterministic and stochastic factors) in cell fate choice, especially when cells are faced with (near) identical environmental conditions. Perhaps the abstract, especially the first line, could be rephrased to reflect the generality of symmetry breaking and differentiation that is studied in this article/work. e.g., as was done in the first paragraph of the discussion.

      Corrected as suggested

      (b) It might be worthwhile clarifying what "this" is in the sentence "We suggest this represents an adaptive mechanism that increases developmental robustness against perturbations that affect deterministic signals." in the abstract.

      Corrected as suggested

      (3) Regarding the model:

      (a) The model tries to combine the stochastic and deterministic parts to explain the propensity for stalk fates. It is assumed that the cell cycle-associated factors (CCAF) provide the deterministic part while the cell cycle-independent factors (CCIF) provide the stochastic part. The net result is an addition of the two, which is then compared against a threshold to decide the propensity for stalk fates. However, another simple way to introduce stochasticity would be to make the CCAF decay stochastic. Reasons to consider this scenario would be: (i) the decay process (especially in the biological context) is generally stochastic, (ii) it would not be inconsistent with the fact that cell cycle dependent genes are also variable, and (iii) this way of introducing stochasticity would also provide expression level characteristics/plots similar to the ones outlined in Figure 1C, i.e. with a probability distribution of CCAF values for a given amount of time after mitosis. Would there be arguments or experimental evidence to rule this possibility out? For instance, would the results shown in Figure 7 contradict this model?

      We agree that there could be stochasticity in the CCAF decay process. In this scenario, the expected value of CCAF (which would reflect the mean of a noisy distribution) would show a deterministic pattern of decay through time, representing the average value of CCAF across cells that are in the same phase of the cell-cycle. The noisiness around such a pattern of deterministic decay in the mean value of CCAF (i.e., the residual variation) would then represent CCIF since it would be, by definition, cell-cycle independent. Hence, the present model is fully consistent with this possibility since it would still lead to some variation being cell-cycle associated and some variation being cell-cycle independent. Therefore, this scenario could be viewed as a different functional/biological process leading to the same ultimate distribution we model. To clarify this, we have added text justifying the hypothesis that the noisy distribution is due to gene expression differences, rather than decay itself:

      “Protein levels can vary widely between cells because it is regulated at multiple levels, including transcription, translation and stability. The position of the noisiest step in a pathway affects the overall noise dramatically, because each step usually amplifies noise in the previous steps (Alon 2007). Consistent with this idea, theory and single-cell experiments have shown that a major contributor to cell-cell variation is the bursty expression of low-copy mRNAs. We therefore hypothesized that this noisiness across cells arises from stochastic expression of a set of genes contributing to CCIF levels.”
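
      The additive structure discussed in this exchange (a deterministic CCAF term plus a noisy CCIF term, compared against a threshold) can be sketched as follows. Everything specific here is an assumption for illustration only: the exponential decay form, the parameter values, the function names, and the Gaussian CCIF. Which side of the threshold corresponds to which fate is deliberately left unstated.

```python
import math
from statistics import NormalDist


def ccaf(t, initial=1.0, decay_rate=2.0):
    """Hypothetical deterministic CCAF level at time t since mitosis (t in [0, 1])."""
    return initial * math.exp(-decay_rate * t)


def prob_above_threshold(t, threshold=0.5, ccif_mean=0.0, ccif_sd=0.2):
    """P(CCAF(t) + CCIF > threshold) when CCIF is Gaussian noise.

    The sum is Gaussian with mean CCAF(t) + ccif_mean and s.d. ccif_sd,
    so the probability follows directly from the normal CDF.
    """
    total = NormalDist(ccaf(t) + ccif_mean, ccif_sd)
    return 1.0 - total.cdf(threshold)
```

      With a small ccif_sd the probability switches almost like a step as CCAF crosses the threshold; a larger ccif_sd flattens the curve, so cells at the same cell cycle position are spread more evenly across the two outcomes. Noise added to the decay itself would change the interpretation of the Gaussian term but not this calculation, which is the point made in the response.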

      (b) On page 7, the formula for total CCIF variance assumes independence of the genes g_i. Is this a reasonable assumption?

      This concerns the argument that a set of stochastically expressed genes will yield an approximately Gaussian distribution of CCIF. Our results do not depend on the solution for the mean and the variance, only that noisy genes will generally yield such a Gaussian distribution. This is because independence is not strictly required for the central limit theorem to yield a Gaussian distribution. The distribution will still be Gaussian under a broad range of conditions (especially since gene expression is bounded, so there is no chance of the total ending up generating an infinite variance). The primary requirement is that the expression of any given gene is independent from that of most other genes. As a result, most of the variation in expression across genes is independent (even if any given gene is not independent from all other genes).

      The most likely pattern of non-independence will be the case in which gene expression is ‘modular’, where there are co-expressed blocks, meaning that non-independence is limited in scale so that genes within a co-regulated block show correlated expression, but their expression is uncorrelated to genes in other blocks. This pattern is functionally analogous to what is known as m-dependence in sequences of random variables (e.g., time series), where variables close together in sequence are correlated (but otherwise uncorrelated). Derivations of the central limit theorem have shown that the means (and hence the sum) of these sorts of variables still follow an approximately Gaussian distribution over a broad range of scenarios. In the case of non-independent gene expression, this means that we can view the independent random variable as being the expression value of a group of co-expressed genes (instead of individual genes). Hence, the means (or sums) of these values will still conform to the central limit theorem.
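
      The block-wise argument above can be checked with a small simulation. This is a toy sketch, not the authors' model: it assumes perfectly co-expressed blocks (every gene in a block fires together), independent blocks, and arbitrary parameter values, all chosen here purely for illustration.

```python
import random
import statistics


def simulate_ccif(n_blocks=200, genes_per_block=5, p=0.2, n_cells=4000, seed=1):
    """Total number of expressed genes per cell under modular co-expression.

    Hypothetical setup: each block switches on with probability p, and all
    genes in a block switch on together; blocks are mutually independent.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_cells):
        active_blocks = sum(rng.random() < p for _ in range(n_blocks))
        totals.append(active_blocks * genes_per_block)
    return totals


def skewness(xs):
    """Population skewness; close to 0 for a roughly symmetric, Gaussian-like sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


totals = simulate_ccif()
mean_total = statistics.fmean(totals)
var_total = statistics.pvariance(totals)
skew_total = skewness(totals)
```

      Under these assumptions the theoretical mean is n_blocks × genes_per_block × p = 200 and the variance is genes_per_block² × n_blocks × p(1 − p) = 800, compared with 160 if all 1,000 genes were independent. Within-block correlation therefore inflates the variance, but the total remains close to symmetric and bell-shaped because it is still a sum over many independent blocks.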

      This problem is addressed in:

      Diananda, P. H. 1955. The central limit theorem for m-dependent variables. Proc. Cambridge Philos. Soc. 51:92-95

      Hoeffding, W. & H. Robbins. 1948. The central limit theorem for dependent random variables. Duke Math. J. 15:773-780

      Orey, S. 1958. A central limit theorem for m-dependent random variables. Duke Math. J. 25:543-546

      Rosén, B. 1967. On the central limit theorem for sums of dependent random variables. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 7:48-82

      To clarify this, we have added the following text and references:

      Although this derivation implicitly assumes that stochastically expressed genes are independent, this assumption is not strictly required for the distribution of CCIF to be approximately normal. If stochastically expressed genes show clustered co-expression owing to shared regulation, then the sum across these co-expressed blocks is still expected to be approximately normally distributed (as long as there are a reasonably large number of co-expressed clusters) (Diananda 1955; Hoeffding and Robbins 1994; Rosén 1967).

      (4) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells":

      Regarding the statement: "We first determined the coefficient of variation (CV^2) of expression for all genes. As expected, this tends to decrease as average expression level increases (Supplementary Figure 2).":

      It would be good to specify how the "expected variation" was calculated exactly. For instance, it was hard to discern from Supplementary Figure 2 how CV^2 decreasing with average expression levels was used in the calculation of expected variation.

      This is described in the methods on page 38

      “A trend line was fitted to the data using non-linear least squares regression (Scran v1.15.9). Genes were defined as variable (2073 genes) based on a one-sided test assuming a normal distribution around the trend but one where deviation changed depending on the mean expression of a given gene (Scran v1.15.9 - modelGeneCV2) with a FDR of < 0.05.”
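
      As background for why CV^2 is expected to fall with mean expression, the standard count-model relationships are sketched below. These are textbook results for Poisson and negative binomial counts, given here as an aid to the reader; they are not necessarily the functional form of the trend fitted by Scran, and the overdispersion symbol \phi is introduced only for this illustration.

```latex
\mathrm{CV}^2 = \frac{\sigma^2}{\mu^2}, \qquad
\text{Poisson: } \sigma^2 = \mu \;\Rightarrow\; \mathrm{CV}^2 = \frac{1}{\mu}, \qquad
\text{negative binomial: } \sigma^2 = \mu + \phi\mu^2 \;\Rightarrow\; \mathrm{CV}^2 = \frac{1}{\mu} + \phi
```

      In both cases the sampling term 1/\mu shrinks as mean expression rises, so a gene is only informative as "variable" if its CV^2 sits significantly above the fitted trend at its own mean expression level.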

      (5) In section "Stochastically expressed genes are associated with cell fate determination"

      (a) For readers unfamiliar with the organism ‘Dictyostelium discoideum’, a short description of its life cycle with growth and development/differentiation phases would be useful to provide the right context.

      Corrected as suggested

      (b) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells", it was shown that cell cycle dependent genes are also highly variable (in other words, ‘stochastic’). It would, therefore, be useful to elaborate on the definitions of "stochastically expressed genes, cell cycle-associated genes, and non-variable genes", as used in this section. Admittedly, the distinction does get clearer towards the last section of Results, but some elaboration here would make the reading smoother.

      Corrected as suggested

      (c) If the "cell cycle associated genes" are the same as "cell cycle dependent genes", it would be good to use one term consistently.

      Corrected as suggested

      (d) The developmental index is divided into 10 bins from 0 to 1. Is there a rationale for the choice of a number of bins? Would this choice affect significance tests for "stochastic" vs others? (The same question may apply to the "Cell type index")

      Significance is robust to the number of bins chosen (e.g. 5-25). Of course, if there are too many bins (low number of genes) or too few bins (addition of noisy data) significance falls. In the case of developmental index, our choice of bins is also based on previous analyses (de Oliveira, et al 2019), which developed the index we used, and showed that a threshold of >0.9 can be used to identify ‘developmentally expressed genes’.

      (6) In Figure 5:

      (a) Does the statement "*** binomial test, p<0.01." (as seen in caption for part C) actually refer to part D?

      Corrected as suggested

      (b) Could the authors please specify what "mis-expressed" means in Figure 5D? Are these genes that are upregulated, downregulated, or both? From what set of genes was the random sampling done?

      Corrected as suggested

      (c) In Figure 5F, is the decrease in CV^2 explained entirely by the increase in mean (as shown in Figure 5E)?

      We appreciate the point made by the reviewer and recognise that disentangling changes in gene expression variation from changes in expression levels is extremely difficult (any changes in burst frequency will necessarily affect expression level). However, we do not think this affects our conclusions, which are supported by results with representative Set1 dependent reporter genes (Figure 5G and H). These suggest that, in these cases at least, it is the number of cells expressing the gene that is affected, rather than the level of expression in each cell.

      (7) In Figure 6A: Could the authors please elaborate on the difference between the rows labelled "WT" and "set1-"? Are they two different types of chimera?

      Corrected as suggested

      (8) In Section "Cell cycle position and gene expression variation interact to control cell type proportioning":

      Is there a graph corresponding to the statement "However, the level of GFP expression in each responding cell did not significantly change."?

      Corrected as suggested

      (9) In section "Influence of stochastic variation on sensitivity to cell cycle perturbations" of the Supplementary text:

      (a) The model for cell cycle bias is not entirely clear. For instance, is the quantity N(t) = U(t) + Q_t U(t) also a probability distribution, like U(t) is? If so, there must be a normalization factor. It was difficult to understand the procedure behind this calculation. Perhaps some more elaboration (with words or a small schematic) on this model/method would help.

      The value of U(t) was originally being used to denote the uniform probability density function (for the uniform distribution), but for clarity this has been changed to follow the convention that U[a,b] denotes the uniform distribution over the interval from a to b (which, in this case, would be U[0, 1]), while f(t) is now being used to make it clear that this is the probability density, where f(t) = 1 across the interval. Because the uniform distribution necessarily integrates to 1 over the defined range, it does not need to be normalised. The confusion here is perhaps due to the expression f(t) = 1 being interpreted as defining the probability of sampling a value of t (but in a continuous distribution we can only define the probabilities of sampling over an interval), instead of defining the probability density over the interval from a to b, where f(t) would be 1/(b – a), and hence over the interval of 0 to 1, f(t) would equal 1.

      To help clarify this issue, this section has been rewritten and a new figure (which appears as Supplementary Figure 12) has been added that illustrates the resulting probability density functions for biased sampling from the cell cycle.
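
      The distinction drawn in this response between a probability density and a probability can be written out explicitly. The symbols t_1 and t_2 (the endpoints of a sub-interval of the cell cycle) are introduced here only for illustration.

```latex
f(t) = \frac{1}{b-a} \ \text{ for } t \in [a,b], \qquad
\int_a^b f(t)\,dt = 1, \qquad
P(t_1 \le t \le t_2) = \int_{t_1}^{t_2} f(t)\,dt = \frac{t_2 - t_1}{b-a}
```

      For U[0, 1] this gives f(t) = 1 and P(t_1 \le t \le t_2) = t_2 - t_1, so a density of 1 is already normalised, and the probability of sampling a cell from any window of the cycle equals the width of that window.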

      (b) References to Figure 8A, B seem to be indicating Supplementary Figure 12 instead. 

      Corrected as suggested

      Reviewer #2 (Recommendations for the authors):

      This manuscript seems quite interesting, but many sections are so unclear that I cannot follow what has been done. I would suggest slowly going through the manuscript and carefully explaining things. This will probably considerably increase the size of the manuscript, but many sections are too terse to follow even after many, many readings of the Results and figure legend.

      Corrected as suggested

      Some specific comments (this is not at all comprehensive, but rather illustrative)

      Page 2 - 'genes strongly associated with fate choice' - can you explain this a bit more - genes associated with one cell type or another, or genes that somehow regulate the choice?

      Corrected as suggested

      Page 2 - this abstract is quite vague, I would suggest being more specific to reflect what is in the manuscript.

      Corrected as suggested

      Page 3 - 'exhibit bivalent H3K4me3..' please explain 'bivalent' a bit more.

      Corrected as suggested

      Page 7 - 'Bernoulli process with probability that (meaning that is scaled to the size of the temporal interval)' (non-copying symbols deleted) could be simplified.

      Corrected as suggested

      Page 7 - please define all variables/ equation components. What is N? What is x bar? What is s2? The middle paragraph is very difficult to follow.

      This paragraph has been rewritten and a definition of the distribution added for clarity.

      Page 7 - 'genes might logically vary in the value of pi, such variability does not impact our results'. Trying to decipher this paragraph, it seems that pi is a function of time, so this could affect the results.

      pi is the probability that a stochastically expressed gene is actually expressed in whatever interval is being considered for all genes. pi will necessarily increase if the time interval considered is increased. The key point is we are considering the probability that any given gene is expressed in the same time interval. In this case, genes could vary in pi, and thus some burst more often and others less often.

      Page 9 - '(it is 98.35 times more likely' there may be too many significant figures here.

      Corrected as suggested

      Page 10 - for the Area Under the Receiver Operating Characteristic Curve (AUROC), what are you classifying? AUROC is typically used for diagnostic tests to determine how well the test can discriminate between two completely different outcomes. What is the input, and what are the outcomes?

      Corrected as suggested

      Figures:

      What are the dashed lines in Figure S2A?

      Corrected as suggested

      What are the X-axes in Figure S3?

      Corrected as suggested

      I do not understand what you are showing in Figure S3.

      Corrected as suggested in results

      In Figure 2B, I cannot find in the text or figure legend any description or explanation of 'Group 1', 'Group 2', or 'Group 3'.

      Corrected as suggested

      Figure 3D needs a lot more explanation; I cannot understand this based on the text and the figure legend.

      Corrected as suggested

      The Set1 work should discuss the work in PMID: 39242621

      Corrected as suggested

      Figure 8 D needs a size bar

      Corrected as suggested

    1. eLife Assessment

      This study provides valuable insights into the role of thalamic nuclei in associative threat and extinction learning, underpinned by a large dataset and rigorous, multipronged analyses. The evidence provided is solid, supporting the main conclusions. Minor analytical refinements notwithstanding, the manuscript will be of broad interest to researchers in learning and memory, fear, thalamic circuitry, and related mental health conditions.

    2. Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGn and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions that are scant in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

      Comments on revisions:

      I continue to recommend the plotting of individual data points. While there may be individual variance, it is important to quantify this in publication so that future studies can appreciate the uncertainty surrounding test statistics.

    3. Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample size (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimulus. Additionally this changes with repeated trials which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar sub nuclei which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The data has a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated the thalamus activity during fear conditioning and extinction, particularly with a large sample size. There is an independent replication of the pulvinar network structure (Fig. 3), which suggests that the processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar - potentially a novel discovery but that needs more validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g. Sentence 1 in results "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "the anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      (2) Fig. 1 The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      (3) Fig. 3 Could the observed network structure be due to anatomical proximity? Perhaps the authors should do an analogous analysis to what they did in Fig. 2 for this intra-pulvinar analysis. This analysis doesn't take into account the indirect connections through corticothalamic and thalamocortical connections with visual cortex and the pulvinar. There is an implicit assumption that there are interconnections between the pulvinar sub nuclei, but there are few strong excitatory projections between these sub nuclei to my knowledge. If visual areas are included in the graph, it would make things more complex, but would probably dramatically change the story. In this way, the message is somewhat constructed or arbitrary.

      (4) In the results section describing Fig. 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm differences.

      (5) Fig. 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei, rather the functional connections are probably mediated through interconnections with cortical visual areas.

      (6) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper which may be better suited for a review or perspective piece.

      (7) In the discussion there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large scale changes in overall activity. It's easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      (8) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall most of the claims made in the discussion are better considered possible interpretations rather than proven findings - this is not a criticism, as these experiments and subject matter are extremely complex.

      (9) This study continues to validate the power and utility of this human fear conditioning/extinction paradigm, and extends this paradigm to investigating fear learning beyond the traditional limbic system pathways. It's possible that their models for the pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

      Comments on revisions:

      The authors addressed my major concerns appropriately in the modified manuscript. As long as the MRI analysis concerns of Reviewer 3 are satisfied (MRI analysis is not my expertise), I am satisfied with the modified manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      The present work was aimed at investigating the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, it examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Its goal was to uncover whether distinct thalamic systems support different modes of learning - automatic survival mechanisms versus more deliberate processes - and to propose a hierarchical pulvinar model of fear conditioning. The manuscript also tried to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in it.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) The manuscript has improved methodologically and analytically after the review. Several weaknesses remain, in my opinion, but still findings are valuable and the evidence can be considered as convincing.

      a) fMRI data have low resolution (3 cubic mm), which certainly limits the examination of small nuclei such as the ones investigated here, and especially the examination of the LGN and inferior pulvinar.

      b) fMRI was normalized to standard space. Analyzing the data in individual-subject space would have given you the options of avoiding altering every participant's brain and of using more precise atlases than the normalized AAL for ROI selection.

      c) Motion during scanning was poorly controlled. Including the motion parameters as covariates of no interest in the GLM/analysis does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public review:

      Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGn, and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions that are scant in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      We thank the reviewer for this comment. We have added more detailed information about the methods to clarify our procedure. In addition to the original description of our threat learning paradigm in humans, we added the following to pages 39-40:

      “Experimental procedure

      Threat learning: Please see the original description in the manuscript.

      Shock level: The shock intensity used in the fear learning paradigm was determined during a pre-experiment calibration. Electrodes were attached to the participant’s right hand, and stimulation began at a low level (0.1 mA), gradually increasing in small increments. After each increment, participants verbally rated their discomfort. The procedure continued until the participant identified a level they described as “highly annoying but not painful.” This individualized intensity was then used for that participant throughout the experiment. For safety and ethical reasons, the maximum intensity was capped at 20 mA, and no participant received a shock above this limit.

      Instructions to the participants: Each visual stimulus in our paradigm was first shown to participants for 6 seconds. This initial presentation served as habituation, allowing us to isolate the responses to genuinely new stimuli. Before the experiment began, participants were informed that they would see pictures illuminated with different colored lights, such as red or blue. During the experiment, some pictures might be paired with an electric shock, while others might not. Participants were instructed to pay attention to whether a specific color or pattern was associated with the shock. These instructions were adopted from previous studies in which our group developed this paradigm and found them highly effective for human learning. We therefore used the same approach in the current experiment. These instructions were provided throughout all phases of threat learning, and participants were informed that any shocks delivered would be at the same intensity determined on Day 1.”

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

      We thank the reviewer for this suggestion. We agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results in Figure 7). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across thalamic nuclei, rather than the distribution of individual data points per se. For this purpose, bar plots with standard errors provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample size (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimuli. Additionally, this changes with repeated trials, which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar subnuclei, which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The data has a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated the thalamus activity during fear conditioning and extinction, particularly with a large sample size. There is an independent replication of the pulvinar network structure (Figure 3), which suggests that the processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar - potentially a novel discovery, but that needs more validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g., sentence 1 in the Results section: "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "The anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      We acknowledge the limitations of fMRI studies and agree with the reviewer that causal claims cannot be made based on correlational neuroimaging evidence. Accordingly, we revised the text to reduce causal interpretations. Specifically, we reworded the sentence identified by the reviewer in the Results section and systematically updated language throughout the manuscript.

      Page 9: “At the block level, both the anterior pulvinar and MD showed increased activation to CS+ vs. CS− (anterior pulvinar: t(292) = 4.41, p = 0.00001, d = 0.25; MD: t(292) = 6.41, p = 5.83 × 10^-10, d = 0.37; Fig. 1b–c), suggesting a possible involvement of these regions in early associative threat learning.”

      Throughout the manuscript, we replaced terms such as “reflects” with “likely reflects” and “indicating” with “consistent with,” and introduced explicitly correlational phrasing where appropriate (e.g., “apparently,” “closely align,” and “seems to”). All revisions are highlighted in green in the revised manuscript.
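      As a reading aid for the statistics quoted above, the sketch below shows how a within-subject effect size relates to a paired t statistic. It assumes the reported d is the paired-samples variant d_z = t / sqrt(n) with n = df + 1 = 293; the response does not state which formula was used, so this is an assumption, and the function name `cohens_dz` is illustrative only.

```python
import math


def cohens_dz(t_stat: float, n: int) -> float:
    """Paired-samples effect size d_z = t / sqrt(n).

    Assumption: the quoted d values are d_z; the source does not say so.
    """
    return t_stat / math.sqrt(n)


# Degrees of freedom quoted in the response are 292, so n = 293 participants.
n_participants = 292 + 1

for label, t_stat in [("anterior pulvinar", 4.41), ("MD", 6.41)]:
    print(f"{label}: d_z = {cohens_dz(t_stat, n_participants):.2f}")
```

      Under this assumption the values come out near 0.26 and 0.37, in line with the quoted effect sizes up to rounding of the t statistics.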

      (2) Figure 1: The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      We thank the reviewer for highlighting this important pattern in Trial 3. The CS+ vs. CS− contrast in the third trial in the mediodorsal thalamus remained statistically significant after FDR correction and was correctly reported in the Supplementary Tables. However, we acknowledge that the statistical marker was inadvertently omitted from Figure 1. We have now corrected the figure to include the appropriate significance annotation.

      In addition, we now explicitly describe the attenuation of the CS+ vs. CS− difference by the third trial in the pulvinar but not in the mediodorsal thalamus (page 32):

      “This suggested rapid initial acquisition of the predictive value of the CS+ is thought to be pronounced during the first two trials. The attenuated CS+ vs. CS− differentiation on the third trial specifically in the pulvinar may reflect a decreased requirement for differential thalamic engagement once the initial association has been acquired, or an initial survival fear reaction is expressed. Notably, the MD sustained the BOLD response to the CS+ in the third trial, which may indicate involvement of this nucleus in the consolidation or stabilization of the learned association. This aligns with the well-established MD-PFC circuit involved in cognitive processes (Wolff and Halassa, 2024). Additionally, in a previous study using a similar paradigm, we observed sustained CS+ vs. CS− differentiation on the third trial in the nucleus reuniens as well (Tuna et al., 2025). These findings suggest that trial-dependent learning dynamics may vary across thalamic nuclei rather than reflecting a uniform thalamic learning signal. Together, while our paradigm does not inherently distinguish between different stages of learning, such as early acquisition and stabilization, our findings are consistent with stronger associative learning–related engagement during the first two trials, with a reduced differential response by the third trial that may reflect the involvement of different neural processes.”

      (3) Figure 3: Could the observed network structure be due to anatomical proximity? Perhaps the authors should do an analogous analysis to what they did in Figure 2 for this intra-pulvinar analysis. This analysis doesn't take into account the indirect connections through corticothalamic and thalamocortical connections with the visual cortex and the pulvinar. There is an implicit assumption that there are interconnections between the pulvinar subnuclei, but there are few strong excitatory projections between these subnuclei to my knowledge. If visual areas are included in the graph, it would make things more complex, but would probably dramatically change the story. In this way, the message is somewhat constructed or arbitrary.

      We thank the reviewer for this insightful comment. We agree that the network analysis in Figure 3 does not provide a direct anatomical account of pulvinar connectivity and cannot distinguish between direct inter-nuclear interactions and indirect coupling mediated via corticothalamic and thalamocortical pathways, including visual cortex.

      Our intention with this analysis was to characterize functional statistical dependencies among pulvinar divisions during conditioning, rather than to infer monosynaptic anatomical connectivity. Accordingly, the observed network structure should not be interpreted as evidence for direct excitatory projections between pulvinar subnuclei.

      We agree that including visual cortical regions in the network would substantially increase model complexity and could alter the inferred network structure. However, doing so would require a trial-wise, multiregional modeling framework that goes beyond the scope of the present intra-pulvinar analysis.

      In response to this comment, we have now explicitly clarified the assumptions, interpretational limits, and alternative explanations of the network model in the Discussion (page 33):

      “Yet, these intrapulvinar relationships should be understood as a functional and computational model, reflecting statistical dependencies among pulvinar divisions during threat learning, rather than as evidence of direct monosynaptic anatomical connections. Because detailed inter-nuclear anatomical connectivity within the pulvinar remains incompletely characterized, our analysis does not presuppose strong direct excitatory projections between subnuclei. Instead, our findings are intended to highlight candidate functional relationships within the pulvinar during conditioning with different levels of data processing, rather than to provide a definitive anatomical map.”

      We also included the following in the Limitations and Future Directions section (page 36):

      “The observed relationships among pulvinar divisions during conditioning are purely functional and do not distinguish direct inter-nuclear interactions from indirect coupling mediated by corticothalamic and thalamocortical pathways, including visual cortical regions. Thus, the pulvinar model may reflect indirect cortical loops, weak or currently undocumented inter-nuclear interactions, or a combination of both.”

      Finally, we added this note to the legend of Fig. 3:

      “Note: The functional relationships among pulvinar divisions during threat learning should be interpreted as computational dependencies derived from statistical associations. These effects may reflect indirect interactions mediated by corticothalamic and thalamocortical pathways (e.g., via visual cortex), rather than direct inter-nuclear connectivity. Elucidating the underlying anatomical mechanisms will require future studies.”

      (3) In the results section describing Figures 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm differences.

      We thank the reviewer for this suggestion. In this study, each phase (conditioning, extinction, recall, and renewal) was analyzed separately to characterize thalamic function within that specific phase. Our primary conclusions focus on differences between CS+ and CS− within each phase, rather than comparisons across phases or sessions. Direct statistical comparisons across phases were therefore not performed, as they fall outside the scope of our main hypotheses.

      We have clarified this in the revised manuscript to make the rationale for our analytic approach explicit. Added to page 8:

      “The purpose of this study is to investigate thalamic function during each learning phase separately, focusing on CS+ vs. CS− differences within phases rather than comparing activation across phases. This phase-specific approach allows us to characterize thalamic functional dynamics within each stage of learning and memory, avoiding potential confounds arising from the distinct processes of conditioning, extinction, and recall.”

      (4) Figure 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei; rather, the functional connections are probably mediated through interconnections with cortical visual areas.

      We thank the reviewer for this point. Reciprocal connections between the visual cortex and the pulvinar are established, but the precise projections to specific pulvinar divisions remain unknown. We have added a note to the Figure 8a caption to clarify this (Figure 7a in the original version).

      “Note (panel a): Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      (5) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper, which may be better suited for a review or perspective piece.

      We thank the reviewer for this comment. We aimed to integrate our empirical findings within a broader conceptual framework to provide a complementary narrative, rather than presenting isolated observations without connecting them to theoretical context. This approach is intended to strengthen the interpretive value of the study while remaining grounded in primary data.

      (6) In the discussion, there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large-scale changes in overall activity. It's easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      We thank the reviewer for raising this important point. We fully agree that the fMRI BOLD signal reflects large-scale changes in population activity and may fail to capture more subtle or distributed neural representations. Accordingly, the absence of a CS+ vs. CS− BOLD difference should not be interpreted as evidence that a region is not involved in discriminating these stimuli. Rather, our inferences are limited to differences in aggregate activation at the spatial and temporal resolution of fMRI.

      To partially address this limitation, we analyzed anatomically defined thalamic subregions; however, we acknowledge that finer-scale subdivisions and cell-type–specific processing likely exist that are not currently resolvable in human fMRI. Such distinctions may be better investigated using invasive recordings or circuit-level approaches in rodents or non-human primates. This limitation has now been explicitly acknowledged in the Limitations section of the manuscript (page 36):

      “Pulvinar divisions, MD, and LGN each contain diverse neuron subtypes and finer anatomical subdivisions that may serve distinct functions. Importantly, the absence of CS+ vs. CS− differences in BOLD activity should not be interpreted as a lack of stimulus-specific processing, as such distinctions may occur without changes in overall activation detectable by fMRI…”

      (7) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall, most of the claims made in the discussion are better considered possible interpretations rather than proven findings - this is not a criticism, as these experiments and subject matter are extremely complex.

      We thank the reviewer for this constructive suggestion. In response, we have revised the discussion to present our interpretations as possible or plausible explanations, rather than definitive conclusions, to better reflect the strength of the current evidence. The changes are marked in green throughout the Discussion section.

      This study continues to validate the power and utility of this human fear conditioning/extinction paradigm, and extends this paradigm to investigating fear learning beyond the traditional limbic system pathways. It's possible that their models for the pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

      Reviewer #3 (Public review):

      Summary:

      The present work was aimed at investigating the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, they examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to uncover whether distinct thalamic systems support different modes of learning - automatic survival mechanisms versus more deliberate processes - and to propose a hierarchical pulvinar model of fear conditioning. They also try to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in it.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      (a) fMRI data have low resolution (3 cubic mm), which certainly limits the examination of small nuclei such as the ones investigated here, and especially the examination of the LGN and inferior pulvinar.

      We thank the reviewer for raising this point. While the spatial resolution of fMRI (3 mm isotropic) does limit voxel-wise examination of very small nuclei, our analyses were not performed at the single-voxel level. Instead, signals were extracted using anatomically defined masks for each thalamic nucleus, which is a standard and widely used approach for studying small subcortical structures with fMRI. This strategy increases signal-to-noise ratio and mitigates partial-volume effects by aggregating activity across voxels belonging to the same anatomical region.

      (b) fMRI was normalized to standard space. Analyzing the data in individual-subject space would have given you the options of avoiding altering every participant's brain and of using a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      We thank the reviewer for pointing out the availability of specialized thalamic atlases. In our study we used the Automated Anatomical Labelling Atlas 3 (AAL3 atlas), which includes thalamic subdivisions (including pulvinar and other nuclei) among its 150+ whole-brain regions and is widely used for ROI extraction in normalized fMRI analyses. This choice allowed us to define consistent ROIs across the entire brain, such as the amygdala and hippocampus, within the same parcellation framework and to extract functional signals at the resolution of our preprocessed fMRI data.

      While histology-informed probabilistic atlases offer finer microanatomical segmentation of the thalamus, they are implemented primarily for structural segmentation pipelines (e.g., FreeSurfer) and do not change the fact that AAL3’s thalamic subdivisions are established and anatomically reasonable ROIs for functional studies at standard fMRI resolutions. AAL3 thus provides a practical and valid choice for our whole-brain activation and connectivity analyses.

      (c) On top of the two previous points, the authors decided to smooth the data to 6mm, which means that every single voxel within these small nuclei was blurred/mixed with the 2 immediately contiguous voxels (if they followed the standard SPM12 normalization resampling default which resamples, or upsamples the data in this case, to 2 x 2 x 2mm). Given the strong changes in structural connectivity and function that can occur, especially in the thalamus, on voxels of this size, this and the previous 2 decisions do not favor anatomical precision.

      We thank the reviewer for raising this concern regarding anatomical precision. The data were resampled to 2 × 2 × 2 mm resolution in SPM12, and a 6 mm FWHM Gaussian smoothing kernel was applied. Gaussian smoothing does not uniformly mix immediately adjacent voxels; rather, it applies distance-weighted averaging with a standard deviation of approximately 2.55 mm (FWHM = 2.355σ). At 2 mm resolution, this corresponds to ~1.3 voxels, meaning that signal contribution decreases smoothly with spatial distance rather than reflecting simple voxel averaging. Moreover, all statistical analyses were conducted at the ROI level using anatomically defined masks, rather than voxel-wise inference within nuclei.
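      The arithmetic in this reply can be checked with a short sketch. It uses only the 6 mm FWHM and 2 mm voxel size stated above, plus the standard Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ ≈ 2.355σ; the helper `relative_weight` and the one-dimensional offsets are illustrative assumptions, not part of the authors' pipeline.

```python
import math

FWHM_MM = 6.0   # smoothing kernel width stated in the response
VOXEL_MM = 2.0  # resampled voxel size stated in the response

# Standard Gaussian relation: FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
sigma_mm = FWHM_MM / (2.0 * math.sqrt(2.0 * math.log(2.0)))
sigma_vox = sigma_mm / VOXEL_MM


def relative_weight(distance_mm: float, sigma: float) -> float:
    """Unnormalized Gaussian weight at a given distance, relative to the kernel centre."""
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma ** 2))


print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels")
# Hypothetical offsets along a single axis: 0, 1, 2, and 3 voxels from the centre.
for k in range(4):
    distance = k * VOXEL_MM
    print(f"offset {k} voxel(s) ({distance:.0f} mm): weight {relative_weight(distance, sigma_mm):.2f}")
```

      By construction the relative weight falls to one half at a distance of FWHM/2 (3 mm here), so the sketch quantifies how much neighbouring voxels contribute along one axis without settling whether that amount matters for nuclei of this size.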

      To empirically assess whether smoothing may have introduced boundary-driven spillover effects, we divided the mediodorsal (MD) thalamus into medial and lateral divisions and examined the CS effect separately in each. The CS effect did not differ between subdivisions (MD subdivision × CS interaction: F(1, 292) = 0.50, p = 0.48).

      Additionally, across trials, the CS+ vs. CS− effect was observed in both subdivisions and showed comparable magnitudes (see Author response image 1). The effect sizes were also comparable across MD divisions (see Author response table 1).

      Author response image 1.

      Mean activation in MD subdivisions during threat learning

      Author response table 1.

      Point estimates and 95% confidence intervals of effect sizes (Cohen’s d) for CS+ vs. CS− contrasts in MD, MDm, and MDl During Early Threat Learning

      If smoothing had artificially driven the MD effect via boundary spillover, one would expect consistent asymmetry or substantially larger effects in one subdivision relative to the other. Instead, the CS effect was distributed across both medial and lateral MD, supporting the interpretation that the observed activation reflects intrinsic MD signal rather than smoothing-related contamination.

      (d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

      Our analyses are within-subject, so each participant serves as their own control, minimizing the impact of motion differences across conditions. Functional data were preprocessed with fMRIPrep 20.0.2, which estimates motion parameters. These motion estimates were included as nuisance regressors in the SPM12 GLM to account for residual motion-related variance. The connectivity analyses were conducted in CONN, which also includes these motion parameters as regressors and applies additional denoising steps to further reduce motion-related effects. Together, these procedures make it highly unlikely that motion systematically influenced the observed condition differences.

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      We thank the reviewer for this important comment. We have now explicitly reported the number of participants and trials contributing to each analysis throughout the manuscript, including the main text, figure captions, and supplementary materials.

      Specifically, under Materials and Methods (page 38), we now clarify the sample sizes for each learning phase:

      “We analyzed fMRI data from 293 participants during fear conditioning, 320 during extinction, 412 during extinction recall, and 312 during threat renewal.”

      In addition, all figure captions now report the corresponding sample sizes and trial numbers. For example, the caption to Figure 1 (pages 7–8) states:

      “…Block-level comparisons were assessed using paired t-tests, while trial-level effects were examined using a 2 × 2 repeated-measures ANOVA, followed by post hoc comparisons between CS+ and CS− across four trials. Multiple comparisons were controlled using false discovery rate (FDR) correction. Conditioning sample size: n = 293. Detailed statistical parameters are provided in Supplementary Tables 1–2.”

      (3) It is not clear either, why, given the large sample size, some of the results were not conducted using reproducibility strategies such as dividing the sample into 2 or 3 groups or using further cross-validation strategies.

      Cross-validation strategies were applied to the mediation analyses, which are regression-based and can be sensitive to extreme values or overfitting, ensuring that observed effects generalize beyond the sample. In contrast, the repeated-measures ANOVA tests within-subject condition differences and is inherently robust to between-subject variability. For these inferential tests, cross-validation or sample-splitting is not typically applied.

      However, following the reviewer’s recommendation, we conducted a cross-validation analysis focusing on the anterior pulvinar and the mediodorsal thalamus, the primary regions of interest in this study. The full sample (N = 293) was randomly divided into three subsamples (n₁ = 106, n₂ = 91, n₃ = 96). For each iteration, we conducted a repeated-measures ANOVA (RM-ANOVA) within one subsample and then examined the stability of the CS+ vs. CS− difference in the remaining two subsamples combined. The CS+ vs. CS− difference was statistically significant in most folds for both the mediodorsal thalamus and the anterior pulvinar. Importantly, effect sizes were comparable across folds within each nucleus, indicating stable estimates of the CS effect.

      Finally, we observed a comparable pattern of CS+ vs. CS− differences at the trial level in both the mediodorsal thalamus and the anterior pulvinar. Critically, the effect sizes of these differences were stable across most cross-validation folds.
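
      The sample-splitting logic described in this response can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the data are simulated, the function names are hypothetical, and a paired Cohen's d stands in for the RM-ANOVA effect sizes, so only the fold structure (one subsample versus the other two combined) mirrors the text.

```python
import random
import statistics


def paired_cohens_d(cs_plus, cs_minus):
    """Cohen's d for paired data: mean difference divided by the SD of the differences."""
    diffs = [p - m for p, m in zip(cs_plus, cs_minus)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


def split_indices(n, sizes, seed=0):
    """Randomly partition range(n) into non-overlapping folds of the given sizes."""
    if sum(sizes) != n:
        raise ValueError("fold sizes must sum to n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds, start = [], 0
    for size in sizes:
        folds.append(order[start:start + size])
        start += size
    return folds


# Simulated per-participant activation estimates (assumption: true CS+ > CS- shift of 0.3).
rng = random.Random(42)
N = 293
cs_minus = [rng.gauss(0.0, 1.0) for _ in range(N)]
cs_plus = [m + rng.gauss(0.3, 1.0) for m in cs_minus]

folds = split_indices(N, sizes=(106, 91, 96), seed=1)
fold_effects = []
for k, fold in enumerate(folds):
    # Held-out set: the remaining two subsamples combined.
    held_out = [i for j, other in enumerate(folds) if j != k for i in other]
    d_fold = paired_cohens_d([cs_plus[i] for i in fold], [cs_minus[i] for i in fold])
    d_rest = paired_cohens_d([cs_plus[i] for i in held_out], [cs_minus[i] for i in held_out])
    fold_effects.append((d_fold, d_rest))
    print(f"fold {k + 1}: d = {d_fold:.2f} (n = {len(fold)}), "
          f"held-out d = {d_rest:.2f} (n = {len(held_out)})")
```

      Comparing the effect size within each fold against the effect size in the remaining participants gives a simple view of whether the estimate is stable across independent subsets.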

      (4) Limited testing of alternative hypotheses. The results clearly seem to be a selection of the findings supporting the hypotheses that the authors sought to confirm. (Just one example: in the analysis reported in Figures 1-2, are there other correlations between the activation of the anterior pulvinar and MD with other pulvinar nuclei? Only the MD–anterior pulvinar one is reported.)

      We thank the reviewer for raising this important point. We would like to clarify that the analyses were not limited to a single, selectively reported association. The relationship between the MD and the anterior pulvinar was evaluated while explicitly accounting for other pulvinar subdivisions, as well as for thalamic input outside the pulvinar.

      Specifically, potential contributions from other pulvinar nuclei were controlled by including them in the regression model (Fig. 2 in the manuscript), and the LGN was included as an additional control region. These analyses therefore test whether the MD–anterior pulvinar association is specific, rather than reflecting a more general thalamic or pulvinar-wide effect. With respect to hypothesis testing, the study was explicitly hypothesis-driven, grounded in functional evidence motivating a specific prediction about MD–anterior pulvinar interactions.

      Still, in response to the reviewer’s suggestion, we further examined pairwise relationships among thalamic subregions. Specifically, we assessed the association between the MD and each pulvinar subdivision using partial correlations, controlling for the remaining pulvinar subdivisions in each analysis. For example, the partial correlation between the MD and the lateral pulvinar was computed while controlling for the activation of the anterior, inferior, and medial pulvinar subdivisions.

      The partial correlation between the MD and the anterior pulvinar was consistent across all four trials of threat learning, whereas the other pulvinar subdivisions did not exhibit a consistent pattern. To evaluate the robustness of these effects, we applied a bootstrap procedure (10,000 resamples) to estimate 95% confidence intervals for each partial correlation. As presented in Figure 4b, only the anterior pulvinar–MD association remained robust, with confidence intervals that did not include zero. In contrast, the confidence intervals for most other pulvinar subdivisions included zero, indicating non-robust associations.
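
      A bootstrap confidence interval for a partial correlation of this kind can be sketched in plain Python. Everything below is illustrative: the data are simulated, the function names are hypothetical, the number of resamples is reduced from 10,000 to keep the example fast, and the partial correlation is obtained from the inverse of the correlation matrix (a standard identity), which is an assumption about the computation rather than a description of the software actually used.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_corr(columns, i, j):
    """Partial correlation of columns i and j, controlling for all other columns."""
    k = len(columns)
    corr = [[pearson(columns[a], columns[b]) for b in range(k)] for a in range(k)]
    prec = invert(corr)
    return -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5


def bootstrap_ci(columns, i, j, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling participants (rows) with replacement."""
    rng = random.Random(seed)
    n = len(columns[0])
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(partial_corr([[col[t] for t in idx] for col in columns], i, j))
    estimates.sort()
    return estimates[int(alpha / 2 * n_boot)], estimates[int((1 - alpha / 2) * n_boot) - 1]


# Simulated activations: MD depends on the anterior pulvinar only (assumption).
rng = random.Random(7)
n = 293
anterior = [rng.gauss(0, 1) for _ in range(n)]
lateral = [rng.gauss(0, 1) for _ in range(n)]
inferior = [rng.gauss(0, 1) for _ in range(n)]
medial = [rng.gauss(0, 1) for _ in range(n)]
md = [0.6 * a + rng.gauss(0, 1) for a in anterior]

columns = [md, anterior, lateral, inferior, medial]
point = partial_corr(columns, 0, 1)
ci_low, ci_high = bootstrap_ci(columns, 0, 1)
print(f"partial r(MD, anterior) = {point:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

      An association is treated as robust in the sense used above when the resulting interval excludes zero.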

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is not an exception. Better to tell the limitations to the readers upfront so they can factor them into their evaluation of the relevance of the manuscript and reported evidence.

      We thank the reviewer for this constructive suggestion. While the original manuscript already discussed key limitations in the Discussion section (page 36; e.g., “Although distinct thalamic roles in threat learning have been proposed, fMRI data do not fully capture the complexity of this structure…”), we agree that these considerations would benefit from clearer organization and visibility.

      To address this point directly, we have now added a dedicated “Limitations and Future Directions” subsection to the manuscript. This subsection explicitly summarizes the principal limitations of the study—including methodological constraints of fMRI and anatomical resolution—and outlines specific avenues for future research to address them. This change makes the limitations more transparent and allows readers to more easily incorporate them into their evaluation of the findings.

      (6) Data should be made available to the scientific community. Code too. Even if you just used standard fMRI toolboxes, any code used to run analyses will be helpful to the community, or if someone decides to try to replicate your findings.

      We thank the reviewer for this important suggestion and fully agree with the value of data and code sharing for transparency and reproducibility.

      The data supporting the findings of this study are derived from a larger, actively used database that is currently involved in ongoing projects. For this reason, the full dataset cannot yet be publicly released. However, the data underlying the reported analyses are available upon reasonable request from the corresponding author, subject to standard data-use agreements.

      To facilitate reproducibility, all analysis scripts and pipelines used in this study—including preprocessing and analysis workflows implemented in SPM12, and CONN—are available upon request and can be shared with researchers seeking to replicate or extend the reported findings.

      We have clarified this data and code availability statement in the manuscript (page 46).

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field to start characterizing and conceptualizing the involvement of thalamic nuclei and their interactions with other brain regions in the associative threat learning circuitries. It also paves the road for further testing of the functional dynamics among these regions and circuitries, and modeling testing.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editors for this important note. Full statistical reporting, including test statistics, degrees of freedom, exact (raw and corrected) p-values, effect sizes, and 95% confidence intervals, is provided for all key analyses in Supplementary Tables 1–9. In addition, uncertainty estimates and major statistical tests are now explicitly reported throughout the main text, as recommended by the reviewers, irrespective of statistical significance.

      During this revision process, we conducted a comprehensive internal consistency check of all reported statistics and figure annotations. We identified and corrected minor discrepancies between some statistical annotations in the figures and the corresponding results reported in the Supplementary Tables. All figures have now been updated to ensure full consistency with the reported analyses. These corrections do not alter the results or conclusions of the study.

      Reviewer #1 (Recommendations for the authors):

      (1) What is the significance of using two different head coils? Were the data comparable from each coil? How did the authors determine this?

      We thank the reviewer for this important question. Data were acquired using two different receiver head coils across participants. Receiver coils primarily influence signal-to-noise ratio (SNR) and spatial sensitivity profiles, rather than the physiological basis of the BOLD response itself (Triantafyllou et al., 2011).

      Importantly, all analyses were based on within-subject contrasts (CS+ vs. CS−), which are robust to global signal scaling differences that may arise from coil sensitivity variations. In addition, standard preprocessing procedures—including intensity normalization, spatial normalization, and nuisance regression—further minimized potential coil-related variability.

      To empirically evaluate whether acquisition differences influenced our results, we conducted a repeated-measures ANOVA testing the Trial × CS × Site interaction (where Site reflects acquisition location and associated scanning setup, including receiver coil configuration) during fear conditioning (N = 293). As shown in Author response table 2, none of the thalamic nuclei demonstrated a significant interaction effect, and all effect sizes were negligible (ηp² ≤ .01).

      Author response table 2.

      Repeated-measures ANOVA results for the Trial × CS × Site interaction across all relevant thalamic nuclei during fear conditioning.

      (2) Why were the data smoothed? This could have a negative impact on the specificity of the signals averaged within the pre-defined thalamic ROIs.

      Spatial smoothing was applied to improve signal-to-noise ratio and statistical stability in small, deep thalamic subregions, which are particularly susceptible to noise. We acknowledge that smoothing can reduce spatial specificity. However, our analyses were based on anatomically predefined thalamic ROIs and focused on average activation within each region rather than voxel-wise localization. Under this approach, modest smoothing (i.e., a 6-mm full-width at half-maximum kernel rather than the commonly used 8-mm kernel) primarily increases reliability. Any signal mixing across adjacent regions would be expected to reduce regional specificity and bias effects toward the null, rather than produce spurious or false-positive differences.

      Additionally, we conducted robustness analyses to examine whether spatial smoothing artificially influenced our results. Specifically, we subdivided the mediodorsal thalamus into medial and lateral anatomical regions and compared activation across these subregions. The activation patterns were comparable across both subdivisions, indicating that the observed mediodorsal thalamus effect is unlikely to reflect boundary spillover resulting from smoothing. If smoothing had driven the effect, we would expect differential signal patterns across the subdivisions rather than comparable activation. (See our full response to Weakness (c), Reviewer 3, as well as Author response image 1 and Author response table 1.)

      (3) Did the authors consider using any null models to determine whether the observed PPI results could have been observed by chance? E.g., block-resampling nulls scramble temporal order while preserving temporal autocorrelation, and can determine whether subtle differences in autocorrelation across regions can give rise to the observed signatures.

      We thank the reviewer for this thoughtful suggestion. All PPI analyses were conducted using the default CONN toolbox pipeline. In this framework, PPI effects are estimated within a GLM at the first level following standard denoising procedures that reduce motion- and physiology-related variance and apply temporal filtering. Importantly, PPI effects are modeled as subject-level contrast terms rather than computed from raw timeseries correlations.

      Group-level inference was performed on these subject-level contrast estimates using paired t-tests with FDR correction across regions. To further assess whether the observed effects could arise by chance, we additionally performed 10,000 bootstrap resamples of the CS+ vs. CS− differences to evaluate the stability of the effects. While we did not implement explicit block-resampling null models that preserve temporal autocorrelation, the combination of first-level GLM modeling following denoising, large sample size (N ≈ 300), and convergent inferential and resampling procedures provides a rigorous and standard assessment of PPI effects. We have revised the manuscript to clarify these procedures and their rationale.

      We added this language to directly address the reviewer’s concern and revised the connectivity analyses section to clarify the workflow (page 44):

      “Following standard denoising procedures, including regression of motion- and physiology-related confounds and temporal filtering, condition-dependent connectivity effects were inferred from subject-level generalized psychophysiological interaction (gPPI) contrast estimates rather than from raw timeseries correlations. This GLM-based framework reduces the likelihood that observed PPI effects reflect differences in temporal autocorrelation or spectral properties across regions rather than genuine task-dependent interactions.”
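
      For reference, the block-resampling null raised in this comment, which was not implemented in the reported analyses, could look roughly like the sketch below. It is a simplified stand-in: it tests a plain correlation between two simulated time series rather than a gPPI term, and the function names, block length, and number of permutations are all assumptions chosen for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def block_permute(series, block_len, rng):
    """Shuffle contiguous blocks so that within-block autocorrelation is preserved."""
    blocks = [series[i:i + block_len] for i in range(0, len(series), block_len)]
    rng.shuffle(blocks)
    return [value for block in blocks for value in block]


def block_permutation_pvalue(x, y, block_len=10, n_perm=500, seed=0):
    """Two-sided p-value for corr(x, y) against a block-permutation null."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    exceed = sum(
        abs(pearson(block_permute(x, block_len, rng), y)) >= abs(observed)
        for _ in range(n_perm)
    )
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated autocorrelated (AR(1)) seed series and a noisy copy of it.
rng = random.Random(11)
x = [0.0]
for _ in range(199):
    x.append(0.5 * x[-1] + rng.gauss(0, 1))
y = [value + rng.gauss(0, 0.1) for value in x]

observed_r, p_value = block_permutation_pvalue(x, y)
print(f"observed r = {observed_r:.3f}, block-permutation p = {p_value:.4f}")
```

      Because whole blocks are moved together, each surrogate series keeps most of the short-range autocorrelation of the original while its alignment with the other series is destroyed, which is what distinguishes this null from a simple shuffle of time points.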

      (4) The authors may wish to report results in text, as there are currently many demonstrative statements that are not associated with requisite uncertainty estimates, making inference challenging.

      We thank the reviewer for this helpful suggestion. We have revised the Results section to explicitly report statistical outcomes in the main text for all key findings, including appropriate uncertainty estimates (e.g., test statistics, effect sizes, and p-values) alongside demonstrative statements. This ensures that all inferences in the text are directly supported by quantitative evidence.

      Additionally, the full statistical details, including test statistics, degrees of freedom, effect sizes, 95% confidence intervals, and both raw and FDR-corrected p-values, are provided in Supplementary Tables 1–9. These changes improve clarity and transparency while avoiding redundancy. Newly added text in the Results section is highlighted in green.

      (5) I could not find any information about the EBICglasso model in the Methods section, nor information about how the centrality measures were estimated. Given the lack of transparency, I recommend down-weighting the often overly-strong language regarding the conclusions of this analysis.

      We have revised and added these details along with other details to the Statistical tests section on pages 42-44:

      “Statistical tests

      All statistical tests were conducted using JASP versions 0.18.3 and 0.19.3 (JASP Team, 2024).

      Activation Differences across all phases of threat learning

      In each threat learning phase, we used paired t-tests to examine the differences in activation of the thalamic nuclei in response to CS+ vs. CS− at the block level (average activation across trials), and a 2 × 2 RM-ANOVA to estimate the differences in activation at the trial-wise level. Assumptions of sphericity were checked, and Greenhouse-Geisser corrections were applied where necessary. This model was followed by post hoc tests to estimate the differences at the trial level, and false discovery rate (FDR) correction was applied for each question.

      Network analyses of the within pulvinar relationships during conditioning

      The network analyses examined functional relationships between pulvinar divisions. Nodes corresponded to block-level activation estimates of the CS+ minus CS− contrast for each pulvinar division, yielding four nodes (one per division). Networks were estimated using a Gaussian graphical model with EBICglasso (LASSO regularization) based on Pearson correlation matrices, with the EBIC tuning parameter set to γ = 0.5. Edge weights represent partial correlations.

      Three centrality measures were computed on the estimated weighted partial-correlation network: node strength, defined as the sum of the absolute edge weights directly connected to a node; closeness, defined as the inverse of the average shortest path length from a node to all other nodes; and betweenness, defined as the proportion of shortest paths between all pairs of nodes that pass through a given node. Shortest paths were computed using inverse edge weights, consistent with standard practice for weighted networks. Centrality indices were normalized.

      Network accuracy and centrality stability were assessed using nonparametric bootstrapping (10,000 iterations) to estimate confidence intervals for edge weights and centrality measures. All analyses were conducted in JASP (versions 0.18.3 and 0.19.3) using default settings unless otherwise specified, following the procedures described in Epskamp, Borsboom, and Fried (2018).

      Mediation analyses of within pulvinar relationships during conditioning

      Mediation models of the relationships between the activations in pulvinar divisions were estimated using the lavaan package (Rosseel, 2012) with maximum likelihood estimation. All variables were z-standardized prior to analysis. Block-level activation estimates from the inferior and lateral pulvinar were entered as predictors, activation in the medial pulvinar was specified as the mediator, and activation in the anterior pulvinar was specified as the outcome variable.

      To assess the robustness and generalizability of the mediation effects, we conducted 3-fold cross-validation. The full sample (N = 293) was randomly partitioned into three non-overlapping sub-samples (n = 91, 96, and 106). In each iteration, the mediation model was estimated in one sub-sample, while the remaining sub-samples were used to assess the stability of parameter estimates and indirect effects. This procedure resulted in six cross-validation iterations, allowing evaluation of whether the direction and magnitude of the indirect effect were consistent across independent subsets of the data. Mediation models were estimated using the lavaan package (Rosseel, 2012) with maximum likelihood estimation. Indirect effects were evaluated using bias-corrected percentile bootstrap confidence intervals based on 10,000 resamples, as recommended by Biesanz, Falk, and Savalei (2010). An indirect effect was considered significant when the 95% confidence interval did not include zero (p < 0.05).”
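
      The three centrality definitions quoted above (strength, closeness, betweenness) can be illustrated on a toy four-node network. The edge weights below are invented for illustration and are not the estimated partial correlations. The code is a plain-Python sketch rather than the JASP implementation: it assumes shortest paths are unique (ties are not split), and it omits the final normalization of the indices mentioned in the quoted text.

```python
import math

NODES = ["anterior", "medial", "lateral", "inferior"]

# Hypothetical symmetric partial-correlation weights (0.0 means no edge).
WEIGHTS = [
    [0.0, 0.5, 0.1, 0.0],
    [0.5, 0.0, 0.3, 0.4],
    [0.1, 0.3, 0.0, 0.2],
    [0.0, 0.4, 0.2, 0.0],
]


def strength(weights):
    """Node strength: sum of absolute edge weights attached to each node."""
    return [sum(abs(w) for w in row) for row in weights]


def shortest_path_lengths(weights):
    """All-pairs shortest paths (Floyd-Warshall) using edge length 1 / |weight|."""
    n = len(weights)
    dist = [
        [0.0 if i == j else (1.0 / abs(weights[i][j]) if weights[i][j] else math.inf)
         for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


def closeness(dist):
    """Closeness: inverse of the mean shortest-path length to all other nodes."""
    n = len(dist)
    return [(n - 1) / sum(row[j] for j in range(n) if j != i)
            for i, row in enumerate(dist)]


def betweenness(dist):
    """Share of node pairs (excluding the node itself) whose shortest path passes through it."""
    n = len(dist)
    n_pairs = (n - 1) * (n - 2) / 2
    scores = []
    for v in range(n):
        on_path = sum(
            1
            for s in range(n)
            for t in range(s + 1, n)
            if v not in (s, t)
            and math.isfinite(dist[s][t])
            and math.isclose(dist[s][v] + dist[v][t], dist[s][t])
        )
        scores.append(on_path / n_pairs)
    return scores


dist = shortest_path_lengths(WEIGHTS)
strength_by_node = dict(zip(NODES, strength(WEIGHTS)))
closeness_by_node = dict(zip(NODES, closeness(dist)))
betweenness_by_node = dict(zip(NODES, betweenness(dist)))

for name in NODES:
    print(f"{name:>9}: strength = {strength_by_node[name]:.2f}, "
          f"closeness = {closeness_by_node[name]:.3f}, "
          f"betweenness = {betweenness_by_node[name]:.2f}")
```

      In this toy network the medial node carries the strongest edges, so the inverse-weight shortest paths from the anterior node to the other two nodes run through it, and it scores highest on all three indices.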

      (6) Bar plots are not effective ways to report group-level data. I recommend replacing all bar plots with visualisations that expose the distribution of the data, such as a violin plot or a raincloud plot.

      We thank the reviewer for this suggestion. In general, we agree that visualizations exposing the full data distribution can be highly informative, and we therefore present distribution-based plots for several analyses (e.g., connectivity results). However, for the activation analyses, our primary goal was to highlight trial-to-trial changes and overall patterns across conditions, rather than the distribution of individual data points per se. For this purpose, bar plots provide a clearer representation of the directional effects and facilitate comparison across trials and conditions.

      (7) The thought bubbles are atypical of scientific figures.

      The figure has been revised to remove the thought bubbles.

      (8) Figure 7 - there are many connections not shown in this figure, suggesting that it is sufficiently oversimplified as to be potentially misleading. For instance, the authors offer no anatomical connections between pulvinar and the cortical hierarchy; however, these connections are ample and (likely) highly important for the functionality assessed here. Similarly, there is no room in the figure for the integration of the shock stimuli (presumably via the spinothalamic tract) and the visual stimuli (via the retina/LGN).

      We agree that the pulvinar has extensive cortical and sensory input/output connections that are not depicted in Figure 7. Our intention was not to provide a complete anatomical wiring diagram, but rather a simplified functional model derived from observed statistical dependencies. We have revised the figure and added an explicit note to the legend clarifying that pulvinar–cortical and sensory pathways (e.g., retina/LGN and spinothalamic inputs) are intentionally omitted due to incomplete subnuclear-level anatomical characterization, and that their omission should not be interpreted as a lack of importance. We added this to Figure 7 legend:

      “Note (panel a):

      Known pulvinar–cortical connections, as well as sensory input pathways (e.g., visual inputs via the retina/LGN and nociceptive inputs via the spinothalamic tract), are not explicitly shown. These connections are well established anatomically but were omitted due to their heterogeneity and incomplete characterization at the level of pulvinar subnuclei. Their absence should not be interpreted as a lack of anatomical or functional relevance.”

      Reviewer #2 (Recommendations for the authors):

      (1) It's somewhat confusing that Figures 1,4,5 D and E are not in the text until later in the results section. Perhaps these should be presented in the figures in the same order they are discussed in the text, although this is a stylistic issue.

      We thank the reviewer for this comment. To improve clarity and align the figures with the structure of the Results section, we reorganized the figures. Specifically, we added a new figure (Figure 7) that consolidates all connectivity analyses. Figures 1, 4, and 5 now focus exclusively on activation results, while Figure 7 presents connectivity results only. This reorganization allows the figures to follow the flow of the text more closely and makes the narrative of each figure clearer.

      (2) Stylistic: I would strongly recommend adding n numbers and describing the basics of statistical tests used and how multiple comparisons were accounted for in the legend for Figures 1,4, and 5.

      We thank the reviewer for this recommendation. We have added the sample sizes (n) and brief descriptions of the statistical tests used, including how multiple comparisons were handled, to the legends of Figures 1, 4, and 5. In addition, we direct the reader to the Supplementary Tables, which were submitted with the original manuscript and provide full statistical details, including test statistics (t, F), degrees of freedom, effect sizes, 95% confidence intervals, raw p values, and corrected p values. Finally, we further elaborated on the statistical tests on pages 42–44, as detailed in our response to Recommendation 5 (Reviewer 1).

      Reviewer #3 (Recommendations for the authors):

      As previously indicated, please note that no information is included in the manuscript about data and code availability. Although you mainly use toolboxes for data analyses, any script(s) that you have used to run things would be great to upload for reproducibility purposes.

      Also, it would be good to include a limitations subsection in the manuscript.

      Thank you for these recommendations. We added a limitations subsection to the manuscript. See our responses under Comments 5 and 6 (Reviewer 3, Public Review).

      In terms of data analyses:

      (1) It would be ideal if you quantify in-scanner motion for the different conditions to see if there were no differences in motion due to the task.

      Head motion was estimated at each time point as part of standard preprocessing, and motion parameters were included as nuisance regressors in all first-level models. Because motion estimates are defined per volume rather than per experimental condition, condition-specific motion metrics were not explicitly computed. Importantly, this approach removes motion-related variance uniformly across the time series and therefore controls for potential motion effects across all task conditions. Any residual motion would be expected to increase noise rather than systematically bias condition contrasts.

      (2) You also may want to indicate if normalization followed the SPM 12 default and the data was resampled to 2 x 2 x 2 mm, or kept the same. It is not stated in the data preprocessing subsection of the methods.

      We thank the reviewer for this suggestion. We have now clarified this point in the manuscript (page 41):

      “In addition, spatial normalization was performed with data normalized to Montreal Neurological Institute (MNI) space and resampled to a 2 × 2 × 2 mm³ voxel grid, followed by spatial smoothing with a 6-mm full-width at half-maximum Gaussian kernel.”

      (3) It is important to indicate how many subjects went into each analysis. Also, it is not clear, based on the current methods section, how many observations per condition were used. That can be reported in the text or the figures.

      We thank the reviewer for this comment. This information has now been added to the Methods section and the relevant figure legends, as described in our response to Comment 2 (Reviewer 3, Public Review).

      References

      Triantafyllou C, Polimeni JR, Wald LL. 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55:597–606. DOI: https://doi.org/10.1016/j.neuroimage.2010.11.084, PMID: 21167946

    1. eLife Assessment

      This manuscript reports an important study in which the authors apply smFRET imaging to probe HIV-1 Env conformational dynamics in the presence of antibodies. Previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Through the cutting-edge application of smFRET imaging, the study provides convincing insights into the mechanisms of action of relevant antibodies.

    2. Reviewer #1 (Public review):

      The authors have considered a panel of antibodies that target epitopes at the gp120/gp41 interface (8ANC195 and PGT151), the fusion peptide in the gp41 domain (VRC34), and the MPER region of gp41 (DH511.2_K3 and VRC42). They also investigate 10E8.4/iMab, which is an engineered bispecific antibody that targets the MPER and the CD4 receptor. On a technical note, they have applied a double amber codon-readthrough strategy to incorporate the non-natural TCO*A amino acid, which gets labeled through click chemistry. This approach should result in less disruption of the native Env structure as compared to the peptide insertion previously used for smFRET imaging of Env. Furthermore, previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Altogether, through the cutting-edge application of smFRET imaging, the study provides novel insights into the mechanisms of action of interesting and clinically relevant antibodies.

      In validating the functionality of the S401TAG/R542TAG Env, the authors performed infectivity assays and observed 20% infectivity as compared to wild-type (Figure S2A). However, the text equates this with "20% dual-amber suppression efficiency". This would benefit from some explanation. Why do the authors interpret infectivity as reporting on amber suppression efficiency, and not the functional cost of modifying Env, which is probably unavoidable? Or a combination of both? Is there data to suggest that 100% amber suppression would leave Env 100% functional? If so, this would be valuable to show. If not, the text should be clarified.

      The authors state that the contour plots in Figure 2E reveal "dynamic sampling" of the observed FRET states. Strictly speaking, as presented, the contour plots (and FRET histograms) provide no information on dynamics per se. They indicate only the relative thermodynamic stabilities of the FRET states; transitions between states are a matter of interpretation. The TDPs, shown later in Figure 5A, nicely display the dynamics. More importantly, interpretation of the contour plots is challenging, as some seem to suggest an evolution toward lower FRET states. This is especially evident in Figures 2F and 3D, which suggest that the system evolves into a stable 0.1-FRET state (CO) after about 3 sec. Unless the authors want to conclude something from this, I would suggest that they consider removing the contour plots, since their interpretations are fully supported by the FRET histograms alone.

      The data indicating that Env conformation is manipulated by 10E8.4/iMab is interesting. If I understand correctly, 10E8.4/iMab is an engineered antibody with one Fab targeting MPER and the second Fab targeting CD4. In the absence of CD4, could the difference between 10E8.4/iMab and the other MPER antibodies be due to 10E8.4/iMab being monovalent with respect to MPER binding?

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Xu and co-workers unveil two distinct modes of neutralisation by gp41-targeted broadly neutralizing antibodies on HIV-1 Env. So far, it was unclear as to how the mechanism of neutralisation occurred for this subset of neutralising antibodies (that can target the fusion peptide or the membrane proximal external region of the gp41 subunit). Thanks to single-molecule FRET, the authors show that the majority of broadly neutralizing antibodies stabilize the closed Env conformation (named State 1 since the original work by Munro and colleagues PMID: 25298114). Interestingly, the bivalent 10E8.4/iMab stabilized in turn a CD4-bound open state of Env. The two modes of neutralization described for these antibodies show previously unknown allosteric mechanisms that stabilize closed and open Env conformation, stressing the importance of Env conformational dynamics and its efficiency during the process of fusion.

      Strengths:

      The article is well-written, and the figures fully depict the data in a convincing way. The authors have used smFRET, which is now established in the field as a good tool to assess Env dynamics.

      Weaknesses:

      (1) The limited controls on how click chemistry affects Env (as labelled Env HIV virions were not evaluated).

      (2) Photobleaching of donor and acceptor molecules occurs right after 10 sec of exposure.

      (3) Other limitations are well described in the corresponding section.

    4. Author response:

      eLife Assessment

      This manuscript reports an important study in which the authors apply smFRET imaging to probe HIV-1 Env conformational dynamics in the presence of antibodies. Previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Through the cutting-edge application of smFRET imaging, the study provides convincing insights into the mechanisms of action of relevant antibodies.

      We appreciate this positive assessment and thank the reviewers for their time and constructive comments. We will make the following changes in the revised manuscript.

      (1) Clarify the distinction between suppression efficiency and functional cost.

      (2) Add controls: smFRET experiments in the presence of monovalent 10E8.4 and iMab individually and compare results with the bivalent 10E8.4/iMab that we currently have.

      (3) Increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling.

      (4) Add discussion on conformational populations probed by smFRET versus structural analyses, Env conformational heterogeneity, ligand effects, and how these approaches complement each other.

      (5) Further clarify the assignments of multiple conformational states by smFRET, the heterogeneity of Env spikes and virion morphology by cryoET, and the focus of the current smFRET-focused storyline.

      Please find below our provisional responses to the public reviews. We will provide detailed point-by-point responses upon submission of the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have considered a panel of antibodies that target epitopes at the gp120/gp41 interface (8ANC195 and PGT151), the fusion peptide in the gp41 domain (VRC34), and the MPER region of gp41 (DH511.2_K3 and VRC42). They also investigate 10E8.4/iMab, which is an engineered bispecific antibody that targets the MPER and the CD4 receptor. On a technical note, they have applied a double amber codon-readthrough strategy to incorporate the non-natural TCO*A amino acid, which gets labeled through click chemistry. This approach should result in less disruption of the native Env structure as compared to the peptide insertion previously used for smFRET imaging of Env. Furthermore, previous implementations of smFRET imaging of HIV-1 Env, which focus on gp120 conformation, have yielded limited information on antibodies that target gp41. Altogether, through the cutting-edge application of smFRET imaging, the study provides novel insights into the mechanisms of action of interesting and clinically relevant antibodies.

      Thank you for the positive comments!

      In validating the functionality of the S401TAG/R542TAG Env, the authors performed infectivity assays and observed 20% infectivity as compared to wild-type (Figure S2A). However, the text equates this with "20% dual-amber suppression efficiency". This would benefit from some explanation. Why do the authors interpret infectivity as reporting on amber suppression efficiency, and not the functional cost of modifying Env, which is probably unavoidable? Or a combination of both? Is there data to suggest that 100% amber suppression would leave Env 100% functional? If so, this would be valuable to show. If not, the text should be clarified.

      We acknowledge this concern and will clarify the distinction between suppression efficiency and functional cost in the revision. The observed reduction in infectivity does not indicate a loss of Env function; rather, it primarily reflects the efficiency of suppression (one of the critical limitations of applying genetic code expansion in mammalian cells), as evidenced by reduced Env expression and incorporation on virions (Fig. 1B). In support of the preservation of Env functionality, tag-free and dual-ncAA-incorporated Env virions exhibited similar dose-dependent neutralization sensitivity against trimer-specific neutralizing antibodies (Fig. 1D). We have previously discussed several limitations of amber suppression in mammalian cells combined with smFRET viral systems (PMID: 38232732; PMID: 40716060). In brief, orthogonal tRNA/aaRS pair–mediated amber suppression (reassigning/repurposing amber stop codons to non-canonical amino acids) of the introduced ambers in the target protein (Env in our case) must compete with the cellular translation system, particularly release factors that recognize amber codons and terminate translation. Readthrough of endogenous amber codons in virus-producing cells (in our case, HEK293T) can disrupt normal protein expression and virus production. Similarly, readthrough of preexisting amber codons in HIV-1 ORFs other than the targeted ambers in Env can disrupt virus assembly, which we addressed by generating an amber-free provirus (PMID: 38232732). Introducing two amber codons into Env further reduces efficiency, as dual suppression requires two sequential successful suppression events within the same Env molecule.
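
      The last point can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that readthrough at the two amber sites is independent and that the efficiencies multiply; neither assumption is stated in the manuscript, and the function names below are hypothetical. It also treats the 20% figure as if it reflected suppression alone, which is exactly the interpretation the reviewer asks to be justified.

```python
import math


def dual_suppression_efficiency(p_site1: float, p_site2: float) -> float:
    """Fraction of Env molecules read through at both amber codons.

    Assumption (not from the manuscript): the two suppression events
    are independent, so their probabilities multiply.
    """
    for p in (p_site1, p_site2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("efficiencies must lie between 0 and 1")
    return p_site1 * p_site2


def per_site_efficiency_from_dual(p_dual: float) -> float:
    """Per-site efficiency implied by a dual efficiency.

    Assumption: both sites are suppressed equally well.
    """
    if not 0.0 <= p_dual <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return math.sqrt(p_dual)


if __name__ == "__main__":
    # Hypothetical per-site values chosen only to show the compounding effect.
    print(dual_suppression_efficiency(0.5, 0.4))
    # Per-site efficiency implied if 20% were purely dual suppression.
    print(per_site_efficiency_from_dual(0.2))
```

      Under these assumptions, even moderately efficient single-site suppression compounds into a much lower dual-site yield, which is consistent with the authors' explanation but does not by itself rule out a functional cost of the modification.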

      The authors state that the contour plots in Figure 2E reveal "dynamic sampling" of the observed FRET states. Strictly speaking, as presented, the contour plots (and FRET histograms) provide no information on dynamics per se. They indicate only the relative thermodynamic stabilities of the FRET states; transitions between states are a matter of interpretation. The TDPs, shown later in Figure 5A, nicely display the dynamics. More importantly, interpretation of the contour plots is challenging, as some seem to suggest an evolution toward lower FRET states. This is especially evident in Figures 2F and 3D, which suggest that the system evolves into a stable 0.1-FRET state (CO) after about 3 sec. Unless the authors want to conclude something from this, I would suggest that they consider removing the contour plots, since their interpretations are fully supported by the FRET histograms alone.

      We agree and will remove the contour plots, as they do not add meaningful information beyond what the histograms show.

      The data indicating that Env conformation is manipulated by 10E8.4/iMab is interesting. If I understand correctly, 10E8.4/iMab is an engineered antibody with one Fab targeting MPER and the second Fab targeting CD4. In the absence of CD4, could the difference between 10E8.4/iMab and the other MPER antibodies be due to 10E8.4/iMab being monovalent with respect to MPER binding?

      We appreciate this question. To answer this, we will perform smFRET experiments in the presence of 10E8.4 and iMab individually and compare those with the bivalent 10E8.4/iMab.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Xu and co-workers unveil two distinct modes of neutralisation by gp41-targeted broadly neutralizing antibodies on HIV-1 Env. So far, it was unclear as to how the mechanism of neutralisation occurred for this subset of neutralising antibodies (that can target the fusion peptide or the membrane proximal external region of the gp41 subunit). Thanks to single-molecule FRET, the authors show that the majority of broadly neutralizing antibodies stabilize the closed Env conformation (named State 1 since the original work by Munro and colleagues PMID: 25298114). Interestingly, the bivalent 10E8.4/iMab stabilized in turn a CD4-bound open state of Env. The two modes of neutralization described for these antibodies show previously unknown allosteric mechanisms that stabilize closed and open Env conformation, stressing the importance of Env conformational dynamics and its efficiency during the process of fusion.

      Strengths:

      The article is well-written, and the figures fully depict the data in a convincing way. The authors have used smFRET, which is now established in the field as a good tool to assess Env dynamics.

      We appreciate these positive comments!

      Weaknesses:

      (1) The limited controls on how click chemistry affects Env (as labelled Env HIV virions were not evaluated).

      We agree. Our validation focused on ncAA-incorporated Env HIV-1 virions, not on the fluorescently labeled virions. To address this, we will increase the number of repeats in neutralization experiments to reduce variability and, where feasible, perform infectivity and neutralization assays after click chemistry labeling. However, we expect these experiments to be technically challenging as an additional layer of functional validation in live cells: the additional handling time required for labeling and the centrifugation steps needed to remove free dyes can deform or disrupt viral membranes and degrade virions, and the dual-amber suppression efficiency is low. On a related note, we have previously performed real-time tracking of single click-labeled Env virion internalization and trafficking in live cells (PMID: 38232732), supporting the retained functionality of click-chemistry-labeled Env.

      (2) Photobleaching of donor and acceptor molecules occurs right after 10 sec of exposure.

      We acknowledge this limitation and will include it in the corresponding section.

      (3) Other limitations are well described in the corresponding section.

      We appreciate this comment.

    1. eLife Assessment

      This study provides valuable insights into the cellular dynamics underlying accelerated tooth regeneration in a vertebrate model. Using single-nucleus RNA sequencing across multiple time points, the authors present a well-structured analysis of cell populations, trajectories, and intercellular signaling events associated with this process. The strength of evidence is solid but incomplete, as the conclusions are primarily supported by computational inference, without experimental validation of key findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used single-nucleus RNA sequencing (snRNA-seq) to investigate accelerated tooth replacement following tooth plucking in cichlid fish. They analyzed four stages of regeneration using elegant and well-designed approaches to characterize cellular trajectories and interactions within the dental epithelium and mesenchyme during the accelerated replacement process. Their analyses identified cell-type-specific gene expression profiles and intercellular signaling interactions associated with whole-tooth regeneration.

      Strengths:

      This is a highly interesting and thoughtfully executed study that provides compelling and convincing insights into the mechanisms underlying accelerated tooth regeneration.

      Weaknesses:

      The manuscript currently lacks experimental validation of the single-nucleus RNA-seq data.

    3. Reviewer #2 (Public review):

      Summary:

      Mubeen and colleagues studied the cellular basis of tooth regeneration in cichlid fish. Using an elegant tooth plucking strategy followed by single-nucleus RNA-sequencing, the authors aimed to produce an atlas of cellular and transcriptional changes that occur within and between cells during whole tooth replacement.

      Strengths:

      The major strengths of the methods and results are the high novelty of the approach in a vertebrate with continuous tooth replacement, the temporal analysis at plucking and three later time points, the thorough and sophisticated analysis of the snRNA-seq data, including the inference of trajectories and signaling events, and the robust signal of transcriptional differences induced by tooth plucking.

      Weaknesses:

      The major weaknesses of the methods and results are no validation of any of the inferred cell types, no functional tests of whether any of the changes in signaling pathways affect the plucking-induced tooth replacement process, and perhaps no clear takeaway message for biologists not necessarily interested in tooth replacement.

      Conclusion:

      The authors achieved their aims of identifying the changes in gene expression and cellular composition that occur during whole tooth replacement accelerated by plucking. Overall, the results support their conclusions, although some slight semantic qualifiers should probably be added (e.g., referring to "cell types" as "putative cell types").

      The work should have a high impact in the field of tooth and organ regeneration, and the novel methodological paradigm established here of accelerating tooth replacement three-fold by plucking has great promise for future follow-up studies to further study this process. The work could also have a strong impact through the computational methods used here to infer trajectories and signaling interactions. Specific pathways, genes, and cell types could be tested in other fish, such as zebrafish, to test function during tooth replacement.

      The work is unique and interdisciplinary, and also has significance by establishing that robust phenotypically plastic accelerations in regeneration rates occur upon tooth removal. There are very few studies like this one that combine genetic and environmental studies of regeneration. The result that three different species of cichlid fish that normally have very different tooth patterns all accelerate tooth replacement threefold upon tooth plucking also has significance in revealing a highly conserved plucking response.

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting paper. The process of tooth exfoliation and replacement in vertebrates remains an intriguing and fascinating subject of inquiry. As the scientists noted, there are no mammalian models that can be used to examine signaling pathways in real time.

      Strengths:

      This work integrates in vivo and high-resolution transcriptomics. The study confirms previous findings and emphasizes the need for additional research into the processes that drive the restoration of missing teeth for future therapeutic uses.

      Weaknesses:

      I disagree with the use of the phrase "plucking". Instead, the authors should use tooth extraction or tooth removal, which is clinically more correct for the procedure they are doing.

      The title is rather broad and appears to be more appropriate for a review than an original research work. I would advise specifying the species under research and/or the sort of damage model used in the transcriptome analysis.

      It's uncertain whether the findings are exclusively based on regeneration. The presence of tooth remnants, as well as unintended harm to surrounding tissues, may have triggered repair mechanisms, thereby biasing the current data. How did the authors handle this issue? The oral cavity was under severe manipulation, increasing the inflammatory stimuli, a situation that does not take place in physiological exfoliation.

      The authors indicated the use of microCT analysis; however, no such information appears in the main text. In fact, this manuscript lacks anatomical information. It is required to conduct histological examinations of the regenerated teeth at various time points.

      Although the current findings confirm previously found and verified signaling pathways, the absence of functional data limits the uniqueness of this work.

    5. Author response:

      Many thanks to the three reviewers and the editors for their comments and review. These are fair, consistent (across positives and negatives), and largely expected comments. On behalf of my coauthors, I use this letter as a provisional response to indicate what we can and intend to change in a revised manuscript.

      (1) A major comment from all three referees is that our single-nucleus RNA-seq data should be validated. The reviewers differ in the detail of exactly what they think should be validated, but they refer, individually, to (1) the discovery of ‘cell types’ themselves, (2) pathways inferred from trajectory analysis, (3) differentially expressed genes in plucked vs control condition at four time points and/or (4) inferred ligand-receptor pairs from cell-cell communication analysis, across the same time course. 

      I think we’re actually on pretty good footing for 1-3, because of work we’ve published in the cichlid fish model.

      I tally that in references cited in the manuscript, and highlighted below (References 1, 10, 11, 29, 30, 31), we present 29 figures with 273 individual figure panels of histology, in situ hybridization and immunohistochemistry featuring genes expressed across stages of tooth development and replacement. These genes are markers of dental competency and regenerative potential.

      In addition, in multiple of these papers, we use pharmacology to manipulate the role of key pathways (Hh, BMP, Wnt, Notch) in cichlid tooth development and replacement. Identification and validation of cell types make use of these published data in cichlids (for markers matched to mouse), as well as an unbiased computational approach (SAMap) that draws homology between cichlid and mouse dental cell types, based on shared global patterns of gene expression.

      In short, experiments to validate cell types, gene expression and pathways active in cichlid teeth are published and referenced herein. I noticed that these references (some of which include Gareth Fraser as an author, when he was a postdoc in my group; for Reviewer 2) were cited in the Introduction and not the Rationale/Methods or Results section (such that reviewers may have missed them). We will be clearer about this in the revision. 

      We have neither validated nor functionally analyzed the ligand-receptor pairs inferred from cell-cell communication analysis across four time points of accelerated replacement. This work is beyond the scope of the current paper, and we will include a statement that these computational inferences represent hypotheses to be tested (although many of these ligand-receptor pairs have been noted in other ‘tooth’ publications that we cite).

      (2) The biggest weakness of our manuscript, noted by referees, is that we do not provide serial histology to accompany our snRNA-seq time course after plucking. We describe this as a limitation in the “Study limitations and future direction” section of the Discussion, but we can and will be more explicit about why this is a weakness (for instance, we do not know the degree of damage done to tissue in the plucking paradigm). We do know that the jaw recovers quickly, but we do not know how different the plucked side is from the control side (which is also undergoing active replacement and remodeling). Uniting reviewer comments 1 and 2 here, the best future approach is a spatial transcriptomics reference at distinct stages of the plucking/recovery paradigm, as we framed in the Discussion section, because this simultaneously addresses the state of dental/jaw tissue and the in situ expression of thousands of genes.

      (3) Reviewers asked about the presence of stromal cells in our snRNA-seq data. Because of this and another comment on the posted preprint version of our manuscript, we will take another look at the mesenchymal compartment of the snRNA-seq data and trajectories built from it.

      (4) Multiple (minor) suggestions for clarification in text and figures will be adopted. 

      Generally, I don’t think we’ll require reviewer re-engagement on the revision; editor review should be sufficient.

      References cited in the manuscript, highlighted here:

      (1) Fraser, G. J. et al. An Ancient Gene Network Is Co-opted for Teeth on Old and New Jaws. PLoS Biol. 7, e1000031 (2009).

      (10) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. Common developmental pathways link tooth shape to regeneration. Dev. Biol. 377, 399–414 (2013).

      (11) Bloomquist, R. F. et al. Developmental plasticity of epithelial stem cells in tooth and taste bud renewal. Proc. Natl. Acad. Sci. 116, 17858–17866 (2019).

      (29) Streelman, J. T., Webb, J. F., Albertson, R. C. & Kocher, T. D. The cusp of evolution and development: a model of cichlid tooth shape diversity. Evol. Dev. 5, 600–608 (2003).

      (30) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. A periodic pattern generator for dental diversity. BMC Biol. 6, 32 (2008).

      (31) Bloomquist, R. F. et al. Coevolutionary patterning of teeth and taste buds. Proc. Natl. Acad. Sci. 112, (2015).

    1. eLife Assessment

      This valuable study suggests that capsaicin nanoparticle administration in rats activates the transcription factor Nrf2 by directly binding to its repressor, KEAP1, leading to the induction of cytoprotective genes and preventing alcohol-induced gastric damage, offering a potential avenue for treating alcoholism-related gastric disorders. The authors provide solid evidence through a wealth of biochemical experiments in vitro, in cultured cells as well as in a rat model. The work will be of great interest to researchers studying oxidative damage in a variety of different diseases and the exploitation of molecules for therapeutic approaches.

    2. Reviewer #1 (Public review):

      The paper by Gao et al. describes the effect of capsaicin on the NRF2/KEAP1 pathway. The authors carried out a set of in vitro and in vivo experiments that addressed the mechanisms of the protective effect of capsaicin on ethanol-induced cytotoxicity.

      The authors conclude that capsaicin activates NRF2, which leads to the induction of cytoprotective genes, preventing oxidative damage. The paper shows that capsaicin may directly bind to KEAP1 and that it is a noncovalent modification of the Kelch domain.

      The authors also designed new albumin-coated capsaicin nanoparticles, which were tested for the therapeutic effect in vivo.

      Comments on latest version:

      The manuscript has been substantially improved. I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      The paper by Gao et al. describes that capsaicin (CAP) might act as a novel NRF2 agonist capable of suppressing ethanol (EtOH)-induced oxidative damage in the gastric mucosa by disrupting the KEAP1-NRF2 interaction. Initially the authors established and validated a cell model for EtOH-induced oxidative stress which they used to experiment with different CAP concentrations and to determine changes in a variety of parameters such as cell morphology, ROS production, status of redox balance to mitochondrial function, amongst others.

      The proposed mechanism by which CAP activates NRF2 and mitigates oxidative stress is thought to be via non-covalent binding to the Kelch-domain of KEAP1. A variety of assays such as BLI, CETSA, Pull-down, Co-IP, and HDX-MS were employed to investigate the KEAP1 binding behavior of CAP both in vitro and in GES1 cells. Consequently, the authors developed in vivo nanoparticles harboring CAP and tested those in a rat model. They found that pretreatment with the CAP-nanoparticles led to significant upregulation of NRF2 and subsequent effects on pro- (suppression of IL-1β, TNF-α, IL-6 and CXCL1) and anti-inflammatory (activation of IL-10) cytokines pointing to a resolved state of inflammation and oxidative stress.

      Strengths:

      The work comprises a comprehensive approach with a variety of in vitro assays as well as cell culture experiments to investigate CAP binding behaviour to KEAP1. In addition, the authors employ in vivo validation by establishing an ethanol-induced acute gastric mucosal damage rat model and providing evidence of the potential therapeutic effect of CAP.

      The study further provides novel insights into the mode of CAP action by elucidating the mechanism by which CAP promotes NRF2 expression and downstream antioxidant target gene activation.

      The design of IR-Dye800 modified albumin-coated CAP nanoparticles for enhanced drug solubility and delivery efficiency demonstrates a valuable practical application of the research findings.

      In summary the study's findings suggest that CAP could be a safe and novel NRF2 agonist with implications for the development of lead drugs for oxidative stress-related diseases. Collectively, the data support the significance and potential impact of CAP as a therapeutic agent for oxidative stress-related conditions.

      Weaknesses:

      While the study provides valuable insights into the molecular mechanisms and in vivo effects of CAP, further clinical studies are needed to validate its efficacy and safety in human subjects. The study primarily focuses on the acute effects of CAP on ethanol-induced gastric mucosa damage. Long-term studies are necessary to assess the sustained therapeutic effects and potential side effects of CAP treatment.

      While the design of CAP nanoparticles is innovative, further research is needed to optimize the nanoparticle formulation for enhanced efficacy and targeted delivery to specific tissues.

      Addressing these weaknesses through additional research and clinical trials can strengthen the validity and applicability of CAP as a therapeutic agent for oxidative stress-related conditions.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Gao et al. describes the effect of capsaicin on the NRF2/KEAP1 pathway. The authors carried out a set of in vitro and in vivo experiments that addressed the mechanisms of the protective effect of capsaicin on ethanol-induced cytotoxicity.

      The authors conclude that capsaicin activates NRF2, which leads to the induction of cytoprotective genes, preventing oxidative damage. The paper shows that capsaicin may directly bind to KEAP1 and that it is a noncovalent modification of the Kelch domain.

      The authors also designed new albumin-coated capsaicin nanoparticles, which were tested for the therapeutic effect in vivo.

      I appreciate the authors' experimental efforts to strengthen the study's conclusions. However, in my opinion, the paper is still not fully technically sound, which weakens the strength of the evidence.

      Thank you very much for your constructive review. We are gratified by your recognition of our key findings: that capsaicin activates NRF2 by disrupting the KEAP1–NRF2 interaction, as demonstrated through multiple methods including Pull-down, Co-IP, CETSA, SPR, BLI, deuterium exchange MS, MS simulations and target gene expression assays, and that albumin-coated capsaicin nanoparticles exhibit therapeutic effects in vivo. Your technical suggestions were particularly valuable. In this revised version, we have carefully addressed the points raised by you and the other reviewer by providing additional data, including nuclear-cytoplasmic fractionation assays performed with an alternative NRF2 antibody, to strengthen and clarify the supporting evidence. We believe this revision has significantly enhanced the overall quality and rigor of the manuscript. We have also explained the limitation of the insufficient number of animals used in this study in the main text. We have made this revision with our utmost effort, and we hope it meets your expectations.

      Reviewer #2 (Public review):

      Summary:

      In this paper the authors wanted to show that capsaicin can disrupt the interaction between Keap1 and Nrf2 by directly binding to Keap1 at an allosteric site. The resulting stabilization of Nrf2 would protect CAP-treated gastric cells from alcohol-induced redox stress and damage as well as inflammation (both in vitro and in vivo).

      Strengths:

      One major strength of the study is the use of multiple methods (CoIP, SPR, BLI, deuterium exchange MS, CETSA, MS simulations, target gene expression) that consistently show for the first time that capsaicin can disrupt the Nrf2/Keap1 interaction at an allosteric site and lead to stabilization and nuclear translocation of Nrf2.

      Moreover, the efforts to show causal involvement of the Keap1/Nrf2 axis in the observed cellular effects, as well as to address potential off-target effects of the polypharmacological CAP, are appreciated.

      One point that still somewhat hampers full appreciation of the capsaicin effect in cells is that capsaicin is rarely investigated alone, but mostly in combination with alcohol.

      Moreover, the true add-on value of the developed nanoparticles remains obscure.

      The partly relatively high levels of NRF2 in putatively unstressed cells question the validity of used models.

      The rationale for switching between different CAP concentrations is unclear and not entirely convincing.

      The language and introduction could be improved.

      Overall, the authors are convinced that capsaicin can bind to Keap1 (although weakly) and release Nrf2 from degradation, with relevance for biological settings. With this, the authors provide a significant finding with marked relevance for the redox/Nrf2 as well as natural products/hit discovery communities.

      Thank you very much for your positive assessment of our work and for the constructive suggestions to make it better. We also believe that capsaicin (CAP) offers new insights into the activation of NRF2. In this revision, we have addressed the shortcomings with the following efforts:

      (1) We included a capsaicin (CAP)-only treatment group covering the same doses and time points as the ethanol co-treatment. This revealed that CAP alone can directly inhibit the KEAP1–NRF2 interaction (Figures 3d, 3e and 4g) and promote the entry of NRF2 into the nucleus (Figure 2c), resulting in moderate NRF2 activation (Figures 3h and 4d). However, this effect was significantly enhanced in the presence of ethanol. We attribute this result to the ROS-enriched environment generated by ethanol. Given that KEAP1 is a sensor highly susceptible to oxidative modification, the combination of CAP's allosteric regulation and ethanol-induced oxidative stress promotes a more robust and persistent dissociation of the KEAP1–NRF2 complex. These findings align fully with the established model in which KEAP1–NRF2 dissociation is markedly facilitated under oxidative stress conditions.

      (2) From a translational and industrial perspective, nanoparticle formulations offer improved palatability compared with CAP itself; based on firsthand experience, the nano formulation is more tolerable than CAP. When preparing pure CAP, the pungency often causes irritation, whereas HSA@CAP nanoparticles are milder and demonstrate superior safety in mice following oral gavage. Moreover, ELISA results indicate that HSA@CAP nanoparticles exhibit enhanced anti-inflammatory activity compared with CAP alone (Figure 8d). In light of these findings, we prefer to retain this part of the data.

      (3) Your question is well taken. In GES-1 (Fig. 1i) and UC-MSC (Fig. 1l) cells, NRF2 expression was low under unstressed conditions, and the transcription and translation of its downstream targets indicate no functional activation, supporting the validity of our model. However, the control groups in some experiments were suboptimal. We repeated these experiments with additional biological replicates and early-passage cells; the discrepancies likely relate to high passage numbers and serum batch effects, but they do not affect our main conclusions. We have replaced the relevant data in the revised manuscript (Fig. 2c and Fig. 3h) and hope this addresses your concern.

      (4) In GES-1 cells, 8 μM consistently yielded the optimal effect, and we therefore maintained this concentration in other experiments in the same cells. For other experiments, we needed to co-transfect multiple plasmids. Transfection efficiency was poor in GES-1 cells, so we switched to the commonly used HEK-293T cell line. In 293T cells, 2 and 8 μM were suboptimal, so we ultimately used 32 μM (Figure 3h), consistent with other 293T experiments (Co-IP and Pull-down) that also used 32 μM. Accordingly, 8 μM was insufficient in Fig. 2g, as we observed across many repeats. This likely reflects cell line–specific differences and the experimental context in 293T cells, including transfection and overexpression of NRF2 and Ub-K48-Myc, which necessitated a relatively higher CAP concentration.

      (5) Thank you very much for noting that the language and Introduction could be further improved. We have rechecked the manuscript for grammar and style and revised the Introduction with a more comprehensive, up-to-date description of the NRF2 pathway. The main changes include rewriting the third and fourth paragraphs of the Introduction and consolidating or removing irrelevant sections for greater clarity and concision. We hope these updates meet your expectations.

      Figure 2C: It is still not clear why naïve (unstressed/untreated) cells already show rather high nuclear abundance of Nrf2 (shouldn't Nrf2 be continuously tagged for degradation by Keap1?)

      Thank you for your constructive comments. In response to the concern raised, we repeated the nuclear–cytoplasmic fractionation experiments in early-passage GES-1 cells and performed three independent replicates using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The results showed low nuclear NRF2 levels under basal conditions, consistent with the KEAP1-mediated continuous degradation mechanism. Accordingly, we have updated Figure 2C. Nevertheless, NRF2 could still be detected in the control group, which is broadly consistent with the reported baseline levels of NRF2 in GES-1 cells and other cell lines [1,2,3]. Therefore, this does not indicate a failure of model construction.

      References:

      (1) Wang, R. et al. Costunolide ameliorates MNNG-induced chronic atrophic gastritis through inhibiting oxidative stress and DNA damage via activation of Nrf2. Phytomedicine 130, 155581, doi:10.1016/j.phymed.2024.155581 (2024).

      (2) Li, Y. F. et al. Construction of Magnolol Nanoparticles for Alleviation of Ethanol-Induced Acute Gastric Injury. J Agric Food Chem 72, 7933-7942, doi:10.1021/acs.jafc.3c09902 (2024).

      (3) Li, M., Wang, J., Xu, Z., Lin, Y. & Dong, J. Atraric acid attenuates chronic intermittent hypoxia-induced brain injury via AMPK-mediated Nrf2 and FoxO3a antioxidant pathway activation. Phytomedicine 148, 157261, doi:10.1016/j.phymed.2025.157261 (2025).

      Figure 2G-H: Why switch to rather high concentrations?

      To validate ubiquitin-mediated degradation in Figure 2G-H, we needed to co-transfect multiple plasmids. Transfection efficiency was poor in GES-1 cells, so we switched to the commonly used HEK-293T cell line. In 293T cells, 2 and 8 μM were suboptimal, so we ultimately used 32 μM, consistent with other 293T experiments (Co-IP and Pull-down) that also used 32 μM. These choices reflect intrinsic cell line properties and protein overexpression in 293T, but do not affect our investigation of capsaicin’s mechanism.

      Figure 2I: In the images of mitochondria, the control mitochondria look far more punctate (likely fissed) than the ones treated with EtOH or EtOH + CAP. Wouldn't one expect that EtOH leads to mitochondrial fission and that CAP can prevent it?

      Thank you very much for your comments. We re-acquired and analyzed images of mitochondrial morphology using the Leica STELLARIS 5 Confocal Microscope Platform, which was not available at our institution at that time. The earlier wide-field fluorescence images lacked sufficient magnification and resolution, which obscured details and may have caused confusion. In the revised manuscript, we have replaced them with confocal images showing EtOH-induced mitochondrial abnormalities, whereas CAP treatment restored the reticular network, as expected. We also added a CAP-only group, which shows no discernible effect on mitochondrial morphology.

      Figure 3H: High basal Nrf2 levels in unstressed/untreated HEK WT cells, why?

      Thank you for raising this concern. We repeated the experiments in HEK-293T (WT) cells in better condition, and validated the results using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The data consistently show relatively low NRF2 expression under basal conditions, in line with the KEAP1-mediated continuous degradation mechanism. We have corrected the corresponding figures accordingly.

      Figure 4a: Inclusion of an additional Keap1 binding protein (one with a ETGE motif) would have been desirable (to get information on specificity/risks of off-target (unwanted) effects of CAP).

      Thank you for this valuable suggestion. We have added CETSA experiments for DPP3, which contains an ETGE motif. The results show that endogenous DPP3 expression was low in GES-1 cells. In addition, DPP3 does not appreciably bind CAP in vitro: BLI experiments indicated a KD above 1 mM (Supplementary Figure 4h and 4i). Consistent with this, CAP does not thermally stabilize DPP3 at the cellular level. Therefore, the risk of off-target effects via binding to ETGE-containing proteins like DPP3 appears minimal.

      Figure 4D: Why is there no stabilization of Nrf2 by CAP in lane 2?

      Thank you for raising this concern. We repeated the experiment in GES-1 cells and performed three independent replicates using an alternative, widely recognized NRF2 antibody (Cell Signaling Technology, Cat. No. 12721). The data show that CAP alone increases NRF2 expression to some extent. We have updated Figure 4D accordingly.

      Figure 4f: 5% DMSO is a rather high solvent concentration. Why so high? (The solvent alone seems to have quite marked effects!)

      Thank you for raising this concern. Our original figure legend was misleading and has been corrected. Only the highest CAP concentration (500 μM) contained 5% DMSO as the vehicle; the other CAP concentrations, prepared by serial dilution in complete medium, did not contain such high solvent levels (e.g., 65.5 μM CAP contained 0.625% DMSO). This experiment relied on transient overexpression of NRF2-HA because purified recombinant NRF2 protein is prohibitively expensive (10 μg costs about 900 GBP from Abcam). We therefore conducted a preliminary assay by incubating purified Kelch-Flag protein with lysates of cells overexpressing NRF2-HA and measured NRF2 levels in the supernatant and pellet (Figure 4f). Nevertheless, the conclusion that CAP disrupts the NRF2–KEAP1 interaction is better supported by SPR (Figure 3d), Co-IP (Figure 3e) and Pull-down (Figure 4g).

      Figure 6/7: not expert enough to judge formulations and histology scores. However, the benefit of the encapsulated capsaicin does not become entirely clear to me, as CAP and IRHSA@CAP mostly do not significantly differ in their elicited response.

      Thank you very much for the valuable suggestion. Although histopathology suggests only modest differences between the two treatments, the nanoparticle group showed markedly lower inflammatory cytokine levels than pure CAP: IL-1β, IL-6, TNF-α, and CXCL-1 were significantly reduced, while the anti-inflammatory cytokine IL-10 was significantly increased (Figure 8d). These changes are important for maintaining a healthy gastric environment and may better support digestive function in vivo. In addition, from a translational and industrial perspective, nanoparticle formulations offer improved palatability compared with capsaicin itself. Based on firsthand experience, the nano formulation is more tolerable than CAP. When preparing pure CAP, the pungency often causes irritation, whereas HSA@CAP nanoparticles are milder and demonstrate superior safety in mice following oral gavage.

      Figure 7: Rebamipide was introduced as a positive control in the text with an activating effect on Nrf2, but there is no induction of hmox and nqo in Figure 7f. Why? It does not look as if the positive control was wisely chosen.

      Thank you for your insightful comment. We agree that this result was suboptimal and sincerely apologize for the oversight. We are currently facing significant constraints: the original cDNA is depleted, and funding cuts have severely limited our resources for reagents and animal studies. A full repetition of the rat experiment at the original scale and quality is not feasible in the short term. To ensure the scientific rigor of the paper, we have made the difficult decision to remove Figure 7f. We believe this is necessary to base our conclusions on the most robust evidence. We apologize for any inconvenience and hope this solution is acceptable. We have revised the manuscript accordingly and appreciate your understanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors did not provide data validating the NRF2 antibody for in vitro studies, particularly for IF data where there is no molecular mass indication for NRF2. The IF data suggest that NRF2 is primarily located in the cytoplasm under control conditions (Fig. 2A), whereas the WB data show a strong band in the nucleus (Fig. 2C). What is the reason for this inconsistency?

      We sincerely appreciate your valuable comments. Previously, we used an NRF2 antibody from Proteintech (Cat. No. 16396-1-AP). The vendor’s data show that shRNA knockdown in HeLa cells markedly reduces NRF2 at the expected molecular weight, and that IF in HepG2 cells shows a trace amount of cytoplasmic localization in controls and clear nuclear translocation after MG-132 treatment. This indicates that the antibody is suitable for immunofluorescence (IF) detection of the subcellular localization of NRF2, and our results in Figure 2A are in line with expectations. In addition, to address the reviewer's concern, we purchased another NRF2 antibody (Cat. No. 12721, Cell Signaling Technology), which is also extensively validated. In this version, we repeated the nuclear-cytoplasmic fractionation experiments and other important experiments using this antibody. Together, these data confirm the low basal level of NRF2 in the absence of stress and also show that CAP increases NRF2 expression. We have corrected Figure 2C so that the WB and IF results are now consistent. We wish to reiterate our deep appreciation for the professionalism and rigor of your review.

      (2) Additionally, I could not find Supplementary Figure 4F-I, which concerns TRPV1. These figures are mentioned in the response to reviewers but are missing from the manuscript; please double-check.

      The supplementary figures were initially submitted as a compressed archive, and we recognize that there might have been an issue with the transfer of this file to the reviewers. As shown in Supplementary Figure 4f to 4i, we further examined TRPV1 and DPP3 to assess potential off-target effects of CAP. Capsazepine (CAPZ), a TRPV1 receptor antagonist, did not affect the protective effect of CAP in GES-1 cells (Fig. S4f and S4g), which may indicate that CAP activation of NRF2 does not necessarily depend on TRPV1. The binding of CAP to DPP3, which contains an ETGE motif and can bind KEAP1, was measured by BLI. We found that the KD between CAP and DPP3 was 1.653 mM (>100 μM), whereas CAP bound KEAP1 considerably more strongly, with a KD of about 31.45 μM (Fig. S4h and S4i). This may indicate that the potential for off-target effects of CAP is low.

      (3) I am also somewhat unconvinced by the data obtained from NRF2 KO mice. First, it appears that some NRF2 KO mice respond to CAP treatment well while others do not, resulting in a high standard deviation. To strengthen the conclusions, it would be advisable to use a larger number of animals to confirm or exclude the effect. This is precisely why I still believe that three rats per group are insufficient for the in vivo studies. Please emphasize in the manuscript that a limitation of this study is the use of only three rats per group for the in vivo experiments.

      Thank you very much for your question and suggestions. As for the rat experiments in Figure 7 and Figure 8, there are many other references available, as noted in the introduction: “Recent experiments conducted in rats have demonstrated that red pepper/capsaicin (CAP) possesses significant protective effects on ethanol-induced gastric mucosal damage, and the mechanisms involved may relate to the promotion of vasodilation[6,7], increased mucus secretion[8] and the release of calcitonin gene-related peptide (CGRP)[9,10]. However, it is important to note that the specific role of the antioxidant activity of CAP has not been thoroughly investigated.” Therefore, we conducted extensive literature research and preliminary experiments to ensure that our formal experiment with 8 groups could yield relatively stable results. Of course, we admit that using more rats in vivo would make the conclusion more reliable. Unfortunately, the project was delayed due to funding issues. We are currently facing significant resource constraints: reductions in research funding from the National Natural Science Foundation have severely limited our funding for reagents and animal experiments in this study. As a result, it has become impossible to fully repeat all animal experiments at the original scale and quality in the short term. To supplement the NRF2 knockout animal-related experiments (n=6), we have already spent approximately 70,000 RMB (about 10,000 USD). We have made every effort to ensure the scientific rigor of the paper, and we sincerely apologize for any inconvenience caused. At the same time, we fully recognize the importance of increasing the sample size in animal experiments for this study. We have explicitly acknowledged this as a limitation of our work in the Discussion section and have revised the manuscript accordingly. We greatly appreciate your understanding.

      (4) Furthermore, please double-check the blot in Fig. 9D. Tubulin and P-p65 bands appear very similar, and tubulin disappears in response to EtOH and EtOH/CAP treatment in KO mice. Is it the case? I am not sure the quantitative data reflect the WB bands. Please verify that.

      We sincerely appreciate your valuable feedback on our manuscript. Indeed, in our haste we may have included bands that do not meet the requirements, and we are very grateful to you for pointing this out; it was a significant oversight on our part. We will pay closer attention to careful checking in the future. In response, we have repeated the experiments using the preserved tissue samples and have updated Figure 9d accordingly. Thank you for your insightful suggestions.

      Reviewer #2 (Recommendations for the authors):

      Presentation:

      The data with the encapsulated CAP appear somewhat as a side arm that does not bolster your main message (maybe take them out and elaborate on this topic more extensively in another manuscript).

      We sincerely thank the reviewer for this suggestion. However, based on the ELISA results demonstrating that nano-capsaicin exerts a significantly stronger anti-inflammatory effect than pure capsaicin (CAP), and considering its superior sensory profile for industrial applications (confirmed by our sensory evaluations), we believe these data provide valuable insights. Therefore, we would prefer to retain this section in the manuscript and hope for your understanding.

      Revise the introduction on the Nrf2 signaling pathway. As it is written at the moment, someone outside the Nrf2 field might have trouble understanding it.

      Thank you for the valuable suggestion again. We have rewritten the introduction to the NRF2 signaling pathway to improve accessibility for readers outside the field.

      “The Kelch-like ECH-associated protein 1 (KEAP1)–Nuclear factor erythroid 2–related factor 2 (NRF2)–antioxidant response element (ARE) pathway is a core defense mechanism against oxidative and electrophilic stress[11]. Under homeostatic conditions, KEAP1 acts as a linker protein for the Cul3-E3 ubiquitin ligase complex, continuously promoting the ubiquitination and proteasomal degradation of NRF2, thereby maintaining NRF2 at basal levels[12]. When oxidative or electrophilic stress occurs, critical cysteine residues in KEAP1 are modified, or the interaction between the ETGE/DLG motifs on NRF2 and the Kelch domain of KEAP1 is disrupted, allowing NRF2 to escape degradation, accumulate, and translocate to the nucleus. There, NRF2 forms heterodimers with small Maf proteins and binds to ARE, inducing the expression of antioxidant and cytoprotective genes such as those involved in glutathione metabolism, NADPH regeneration, phase II detoxifying enzymes, and drug efflux transporters, thereby restoring redox balance within the cell and reducing oxidative damage[13].

      Classical NRF2 agonists, such as sulforaphane, are small molecules that bind to KEAP1 and covalently modify its cysteine residues, thereby altering the binding affinity between KEAP1 and NRF2 [14]. However, traditional covalent agonists may induce sustained overactivation of NRF2, leading to adverse side effects and limiting clinical application [15]. Consequently, recent efforts have shifted toward the development of non-covalent NRF2 agonists, which are generally associated with lower toxicity and greater translational potential, enabling more controlled enhancement of NRF2 activity and offering new insights and therapeutic opportunities in antioxidant-related interventions.”

      The authors should check and review extensively for improvements to the use of English to get rid of awkward phrases/wording.

      Thank you very much for this helpful comment. We sincerely appreciate the suggestion and have carefully re‑read and further polished the entire manuscript to remove awkward phrasing and improve the readability of expressions and phrases. We hope these revisions address your concern, and we remain grateful for your guidance.

    1. eLife Assessment

      This important paper presents the discovery of the molecular basis of differential apterous expression during early Drosophila wing disc development. The evidence supporting these conclusions is compelling, ranging from classical genetic approaches to state-of-the-art genetic engineering techniques. By opening new questions, this paper is expected to be of broad interest to developmental biologists and geneticists working on transcriptional regulation.

    2. Reviewer #1 (Public review):

      Summary:

      The Drosophila wing disc is an epithelial tissue whose study has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript the authors used state of the art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address a problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      Weaknesses:

      The previously noted weaknesses (vg expression, P compartment specific effects, early vs late analysis of ap expression in mutants) have been thoroughly and satisfactorily addressed by the authors.

    3. Reviewer #3 (Public review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering approaches to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development. The paper is subdivided into the following chapters/figures:

      (1) In the first couple of figures, the authors describe the methodology to genetically manipulate the apE enhancer (a cartoon summarizing all the previous work with this enhancer might help) and identify two well-conserved domains in the OR463 enhancer required for wing development (the m3 region, whose deletion phenocopies the OR463 deletion: loss of wing, and the m1 region, whose deletion gives rise to AP identity changes in the P compartment).

      (2) In the following three figures, the authors characterize the m1 regulatory region, identify HOX and ETS binding sites, and functionally validate their role in wing development and the activity of the genes/proteins regulating their activity (e.g., Hth and Pointed) by their ability to phenocopy (when depleted) the m1 loss-of-function wing phenotype. The authors conclude that Hth and Pointed regulate apterous expression through the m1 region.

      (3) In the last few figures, the authors perform similar experiments with the m3 regulatory region to conclude that Grn and Antennapedia regulate apterous expression through the m3 enhancer.

      Comments on revised version:

      The authors have adequately addressed my major concerns.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Drosophila wing disc is an epithelial tissue whose study has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript the authors used state of the art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address a problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      Weaknesses:

      The previously noted weaknesses (vg expression, P compartment specific effects, early vs late analysis of ap expression in mutants) have been thoroughly and satisfactorily addressed by the authors.

      We thank the reviewer for the positive assessment of our manuscript as well as for the many constructive comments during its revision.

      Reviewer #3 (Public review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering approaches to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development. The paper is subdivided into the following chapters/figures:

      (1) In the first couple of figures, the authors describe the methodology to genetically manipulate the apE enhancer (a cartoon summarizing all the previous work with this enhancer might help) and identify two well-conserved domains in the OR463 enhancer required for wing development (the m3 region, whose deletion phenocopies the OR463 deletion: loss of wing, and the m1 region, whose deletion gives rise to AP identity changes in the P compartment).

      (2) In the following three figures, the authors characterize the m1 regulatory region, identify HOX and ETS binding sites, and functionally validate their role in wing development and the activity of the genes/proteins regulating their activity (e.g., Hth and Pointed) by their ability to phenocopy (when depleted) the m1 loss-of-function wing phenotype. The authors conclude that Hth and Pointed regulate apterous expression through the m1 region.

      (3) In the last few figures, the authors perform similar experiments with the m3 regulatory region to conclude that Grn and Antennapedia regulate apterous expression through the m3 enhancer.

      My comments:

      Technically sound: As stated in my previous review, the work is technically excellent (the authors use state-of-the-art genetic engineering to manipulate the enhancer and combine it with genetic analysis through RNAi and CRISPR/Cas9 and phenotypic characterization to functionally validate their findings), the figures are nicely done, and the cartoons are self-explanatory.

      We thank the reviewer for these positive comments.

      Poor paper writing: The paper is too long and difficult to read/understand, many grammatical mistakes are found, and formatting is in some cases heterodox.

      We thank the reviewer for this assessment. We have carefully revised the manuscript to improve clarity, readability, and consistency throughout. Specifically:

      (1) Streamlined several sections to improve narrative flow, especially the abstract, model, and dCas9 sections.

      (2) Corrected grammatical issues across the manuscript. As the reviewer pointed out, there were many in the text. We are grateful that the reviewer was insistent on this point.

      (3) Harmonized formatting and terminology. Many small inconsistencies were found in the figure legends, which have largely been corrected.

      We believe these changes substantially improve the accessibility and overall presentation of the work. However, we have not shortened the manuscript, as we want to convey the complexity of dissecting non-coding regions and to avoid oversimplifying the phenotypes obtained.

      Science:

      (1) The question of "who is locating the relative position of the AP and DV boundaries in the developing wing?" is not resolved. I would then change the intro or reduce the tone of this question. Having said that, I agree that these results shed light on the wing phenotypes of some apterous alleles related to AP identity and growth and, as such, I congratulate the authors.

      We appreciate this important point. We agree that our study does not fully resolve the upstream mechanisms that ultimately position the AP and DV boundaries. Our goal was instead to determine how the ap early enhancer (apE) contributes to the correct spatial relationship between these boundaries. To address the reviewer’s concern, we have revised the Introduction and Discussion to soften the framing of this question and to more clearly state the scope of our conclusions. We now emphasize that our work provides mechanistic insight into how apE function impacts DV/AP boundary organization, rather than claiming to fully resolve the upstream positioning mechanism.

      (2) Identification of two TFs (Grain and Antp) mediating the regulation of apterous expression is interesting but some contextualization might be required. Data on Antp is not as convincing as data on Grn. I wonder whether Antp data can be removed at all.

      We thank the reviewer for this thoughtful evaluation. We agree that the genetic evidence for Grain (Grn) is stronger and more direct than for Antennapedia (Antp). In response, we have revised the manuscript to more carefully calibrate the strength of our conclusions regarding Antp.

      Specifically, we have:

      Softened the language throughout to describe Antp as a candidate HOX input,

      Explicitly stated that direct binding to the m3 site remains to be demonstrated biochemically, and

      Clarified in the Discussion that our data support an early contributory role for Antp rather than establishing it as the definitive HOX factor acting at apE.

      We believe retaining the Antp data is important because:

      (1) The m3 site shows strong HOX dependency in vivo,

      (2) Early Antp depletion produces clear defects in ap expression, and

      (3) Recent literature supports an early requirement for Antp in wing development.

      Together, these observations provide a coherent working model while appropriately acknowledging current limitations. We hope the reviewer agrees that the revised framing now appropriately reflects the strength of the evidence.

      (3) I am not sure whether the term hemizygous is used properly

      We use the term hemizygous as in classical genetics, in which an individual carrying an allele opposite a chromosomal deletion is considered hemizygous at that locus. See, for example, the entry for the ap<sup>4</sup> mutant in the Red Book (Lindsley and Zimm, The Genome of Drosophila melanogaster):

      “… ap4 /Df(2L) M4IA-54 hemizygote has nearly normal complement of bristles but otherwise resembles ap4 homozygote (Butterworth and King, 1965).”

    1. eLife Assessment

      This important work provides a new method to extract cfDNA from residual plasma from heparin separators for molecular testing. The evidence supporting the authors' claims is convincing, although some further metrics should also be evaluated. This finding will be interesting to people working in epigenomics and infectious disease diagnostics.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, and likely to benefit the research community.

      Comments on previous revisions:

      The concerns raised have been addressed. The heparin separator-based cfDNA method described in this study is likely to benefit the research community. I have no further scientific concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Adapting Clinical Chemistry Plasma as a Source for Liquid Biopsies" addresses a timely and practical question: whether residual plasma from heparin separator tubes can serve as a source of cfDNA for molecular profiling. This idea is attractive, since such samples are routinely generated in clinical chemistry labs and would represent a vast and accessible resource for liquid biopsy applications. The preliminary results are encouraging, and likely to benefit the research community.

      Comments on revisions:

      The concerns raised have been addressed. The heparin separator-based cfDNA method described in this study is likely to benefit the research community. I have no further scientific concerns.

      We appreciate the encouragement and recognition.

      Reviewer #2 (Public review):

      Summary:

      The authors propose that leftover heparin plasma can serve as a source for cfDNA extraction, which could then be used for downstream genomic analyses such as methylation profiling, CNV detection, metagenomics, and fragmentomics. While the study is potentially of interest, several major limitations reduce its impact; for example, the study does not adequately address key methodological concerns, particularly cfDNA degradation, sequencing depth limitations, statistical rigor, and the breadth of relevant applications.

      Strengths:

      The paper provides a cheap method to extract cfDNA, which has broad application if the method is solid.

      Weaknesses:

      (1) The introduction lacks a sufficient review of prior work. The authors do not adequately summarize existing studies on cfDNA extraction, particularly those comparing heparin plasma and EDTA plasma. This omission weakens the rationale for their study and overlooks important context.

      (2) The evaluation of cfDNA degradation from heparin plasma is incomplete. The authors did not compare cfDNA integrity with that extracted from EDTA plasma under realistic sample handling conditions. Their analysis (lines 90-93) focuses only on immediate extraction, which is not representative of clinical workflows where delays are common. This is in direct conflict with findings from Barra et al. (2025, LabMed), who showed that cfDNA from heparin plasma is substantially more degraded than that from EDTA plasma. A systematic comparison of cfDNA yields and fragment sizes under delayed extraction conditions would be necessary to validate the feasibility of their proposed approach.

      (3) The comparison of methylation profiles suffers from the same limitation. The authors do not account for cfDNA degradation and the resulting reduced input material, which in turn affects sequencing depth and data quality. As shown by Barra et al., quantifying cfDNA yield and displaying these data in a figure would strengthen the analysis. Moreover, the statistical method applied is inappropriate: the authors use Pearson correlation when Spearman correlation would be more robust to outliers and thus more suitable for methylation and other genomic comparisons.

      (4) The CNV analysis also raises concerns. With low-coverage WGS (~5X) from heparin-derived cfDNA, only large CNVs (>100 kb) are reliably detectable. The authors used a 500 kb bin size for CNV calling, but they did not acknowledge this as a limitation. Evaluating CNV detection at multiple bin sizes (e.g., 1 kb, 10 kb, 50 kb, 100 kb, 250 kb) would provide a more complete picture. In addition, Figure 3 presents CNV results from only one sample, which risks bias. Similar bias would exist for illustrations of CNVs from other samples in the supplementary figures provided by the authors. Again, Spearman correlation should be applied in Figure 3c, where clear outliers are visible.

      (5) It is important to point out that depth-based CNV calling is just one of the CNV calling methods. Other CNV callers that combine SNVs, paired reads, split reads, and coverage depth, such as CONSERTING, would be severely affected by the low-quality WGS data. The authors need to evaluate at least two different software tools with distinct CNV-calling algorithms on the current WGS data.

      (6) The authors omit an important application of cfDNA: somatic mutation detection. Degraded cfDNA and reduced sequencing depth could substantially impact SNV calling accuracy in terms of both recall and precision. Assessing this aspect with their current dataset would provide a more comprehensive evaluation of heparin plasma-derived cfDNA for genomic analyses.

      Comments on revisions:

      As suggested previously, the Pearson correlation analysis tends to be overstated; please replace it with Spearman correlation in the whole manuscript. Currently, the authors include both of them in the abstract, method, results, and graphics, all of which are required to be updated to only use Spearman correlation results.

      I don't have other concerns about the manuscript.

      We entirely agree and have removed all instances of Pearson correlation from the paper, including the abstract, methods, results, and graphics. Only Spearman’s correlation is now used.
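
      The reviewer's concern that Pearson correlation can be overstated in the presence of outliers can be illustrated with a minimal sketch. The values below are hypothetical and are not taken from the study; the helper names (`average_ranks`, `pearson`, `spearman`) are illustrative only and do not reproduce the authors' analysis code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements: five unrelated points plus one extreme
# outlier shared by both samples (assumed values, for illustration only).
sample_a = [1, 2, 3, 4, 5, 100]
sample_b = [5, 3, 4, 1, 2, 100]

print(f"Pearson:  {pearson(sample_a, sample_b):.3f}")
print(f"Spearman: {spearman(sample_a, sample_b):.3f}")
```

      In this toy example the single outlier drives the Pearson coefficient close to 1, whereas the rank-based Spearman coefficient stays close to 0, which is the behavior the reviewer describes.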

      We appreciate your efforts and helpful comments.

    1. eLife Assessment

      This study provides a valuable contribution to understanding the functional and molecular organization of the medial nucleus accumbens shell in feeding behavior. Through a multimodal approach that integrates in vivo imaging, optogenetic manipulation, and genetic strategies, the authors present convincing evidence for rostro-caudal differences in D1-SPN activity, advancing and refining earlier pharmacological frameworks. The discovery of Stard5 and Peg10 as regionally informative markers, together with the introduction of a Stard5-Flp driver line, establishes a foundation for more targeted circuit dissection. While an expanded characterization of other Stard5-positive cell populations (e.g., D2-SPNs, interneurons) would strengthen the work, the experimental rigor and internal consistency of the findings are clear. Overall, this is a technically strong and conceptually meaningful study with broad relevance for those investigating neural mechanisms of reward, affect, and feeding.

    2. Reviewer #2 (Public review):

      Summary:

      Marinescu et al. combine in vivo imaging with circuit-specific optogenetic manipulation to characterize the anatomic heterogeneity of the medial nucleus accumbens shell in the control of food intake. They demonstrate that the inhibitory influence of dopamine D1 receptor-expressing neurons of the medial shell on food intake decreases along a rostro-caudal gradient while both rostral and caudal subpopulations similarly control aversion. They then identify Stard5 and Peg10 as molecular markers of the rostral and caudal subregions, respectively. Through the development of a new mouse line expressing the flippase under the promoter of Stard5, they demonstrate that Stard5-positive neurons recapitulate the activity of D1-positive neurons of the rostral shell in response to food consumption and aversive stimuli.

      Strengths:

      This study brings important findings for the anatomical and functional characterization of the brain reward system and its implication in physiological and pathological feeding behavior. In the revision, the authors provided additional data that strengthen the specificity of their behavioral effects. It is a well-designed study, technically sound, with clear and reliable effects. The generation of the new Stard5-Flp line will be a valuable tool for further investigations. The paper is very well written, the discussion is very interesting, addresses limitations of the findings and proposes relevant future directions.

      Weaknesses:

      The activity of Stard5-positive neurons will require further characterization, as this population encompasses both D1- and D2-positive neurons as well as interneurons. While they display a response pattern similar to that of D1-neurons, it remains to be determined whether their manipulation would result in comparable behavioral outcomes.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines how different parts of the brain's reward system regulate eating behavior. The authors focus on the medial shell of the nucleus accumbens, a region known to influence pleasure and motivation. They find that nerve cells in the front (rostral) portion of this region are inhibited during eating, and when artificially activated, they reduce food intake. In contrast, similar cells at the back (caudal) are excited during eating but do not suppress feeding. The team also identifies a molecular marker, Stard5, that selectively labels the rostral hotspot and enables new genetic tools to study it. These findings clarify how specific circuits in the brain control hedonic feeding, providing new entry points to understand and potentially treat conditions such as overeating and obesity.

      We thank Reviewer 1 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      (1) Conceptual advance: The work convincingly establishes a rostro-caudal gradient within the medNAcSh, clarifying earlier pharmacological studies with modern circuit-level and genetic approaches.

      (2) Methodological rigor: The combination of fiber photometry, optogenetics, CRISPR-Cas9 genetic engineering, histology, FISH, scRNA-seq, and novel mouse genetics adds robustness, with complementary approaches converging on the central claim.

      (3) Innovation: The generation of a Stard5-Flp line is a valuable resource that will enable precise interrogation of the rostral hotspot in future studies.

      (4) Specificity of findings: The dissociation between appetitive and aversive conditions strengthens the interpretation that the observed gradient is restricted to feeding.

      We thank Reviewer #1 for their supportive feedback.

      Weaknesses and points for clarification

      (1) Role of D2-SPNs: Since D1 and D2 pathways often show opposing roles in feeding, testing or discussing D2-SPN contributions would provide an important control and context. Since the claim is that Stard5 is expressed in both D1- and D2-MSNs, it seems to contradict the exclusive role of D1R MSNs in authorizing food intake.

      We agree that D2-SPNs represent an important and relevant cell population in the context of our study. The Stard5-Flp line labels a mixed population of D1- and D2-SPNs, and we agree that dissecting the distinct contributions of Stard5<sup>+</sup> D1-SPNs and Stard5<sup>+</sup> D2-SPNs to feeding behavior would be both interesting and informative.

      Although we understand the point raised by the Reviewer, we do not entirely agree that the expression of Stard5 in both D1- and D2-SPNs contradicts the established role of D1-SPNs in authorizing food intake. In the medNAcSh, D1- and D2-SPNs do not exert opposing functions. D2-SPNs project densely to the ventral pallidum and more sparsely to the lateral hypothalamus and, like D1-SPNs, are predominantly reward-inhibited at the population level (Domingues et al. 2025; Pedersen et al. 2022).

      We added the following in the discussion: “Additionally, a new study showed that manipulation of D2-SPN cell bodies in the medNAcSh modulates reward preference, self-stimulation, and palatable food intake in a frequency- and context-dependent manner (Requejo-Mendoza et al., 2025). Together, these findings suggest that D1- and D2-SPNs within the medNAcSh play complementary rather than opposing roles in reward processing. Hence, the potential role of rostral and caudal medNAcSh D1- and D2-SPNs in food-related behaviors beyond the act of consumption could be addressed in future work.” We also acknowledge that not investigating rostro-caudal gradients of D2-SPN in reward and aversion processing “represents a limitation of this work”.

      We fully agree that disentangling the specific contributions of Stard5<sup>+</sup> D1- and Stard5<sup>+</sup> D2-SPNs is an important next step. We have now crossed the Stard5-Flp line with Drd1-Cre and A2a-Cre lines. In a pilot experiment (not shown), we injected Flp+/Cre+, Flp+/Cre-, and Flp-/Cre+ mice with four different FlpOn-CreOn AAVs to determine whether any of them showed specific expression. However, all AAVs exhibited moderate to strong leaky expression of the Cre, preventing reliable cell-type-specific targeting. This was not seen with Flp-only or Cre-only AAVs. This leakiness is a known challenge of FlpOn-CreOn AAVs and requires additional troubleshooting (e.g., reducing the titer). As this proved to be more challenging than anticipated, this work is ongoing and will be addressed in a future study rather than in the present revisions.

      (2) Behavioral analyses:

      (a) In Figure 2, group differences in consumption appear uneven; additional analyses (e.g., lick counts across blocks and session totals) would strengthen interpretation.

      The group differences in consumption that appear uneven likely reflect an overall lower total lick count per session in the Control group. We have now added analyses of average lick counts per block and session totals in the newly included Supplementary Figure S7, which support the results shown in Figure 2.

      Although we observe a difference in total lick count across the entire session between Control and Rostral ChrimsonR mice (Supplementary Figure S7d), we consider the comparison of total session lick counts less informative here. Instead, we would argue that the laser-on epoch is the most meaningful comparison. During this period, optogenetic activation had no effect on licking behavior in control mice, showed a nonsignificant trend toward reduced consumption in caudal ChrimsonR mice, and produced a significant reduction in lick counts when rostral medNAcSh D1-SPNs were activated (Figure 2g-i and Supplementary Figure S7c).

      We added in the discussion the following explanation:

      “In addition, comparison of licking behavior during the laser-off blocks revealed an interesting effect: following cessation of opto-stimulation, Rostral ChrimsonR mice licked more than Caudal ChrimsonR and Control mice, suggesting a possible compensatory overconsumption. One possible interpretation is that the optogenetic parameters used suppressed consummatory behavior without reducing the motivation to obtain the reward. Furthermore, consistent with the RTPPA results, activation of rostral D1-SPNs may be experienced as aversive and termination of the optogenetic stimulation could produce relief, which in turn reinforces the licking behavior. Further investigations are required to test these possibilities.”

      (b) The design and contribution of aversive assays to the main conclusions remain somewhat unclear and could be better justified.

      We appreciate the Reviewer’s comment regarding the design and contribution of the aversive assays. The rationale for including these experiments was to determine whether the rostro–caudal functional segregation observed for reward-related feeding also applies to aversive processing.

      First, using foot shock, we tested whether D1-SPNs in the rostral versus caudal medNAcSh respond differently to an aversive stimulus. In contrast to reward-related responses, both populations responded similarly, exhibiting excitation. Second, to ensure that this effect was not specific to a single stressor, we tested a second aversive stimulus (tail lift) and again observed comparable excitatory responses in rostral and caudal D1-SPNs. Third, we assessed whether optogenetic activation of these neurons is perceived as rewarding or aversive. Using a real-time place preference/aversion assay, we found that optogenetic stimulation of D1-SPNs in both subregions induced place aversion.

      Together, these experiments show that while D1-SPNs display region-specific effects on reward-related feeding behavior, their activity responses to aversive stimuli and the avoidance response to optogenetic activation are similar across rostral and caudal medNAcSh. This contrast strengthens our conclusion that the D1-SPN rostro-caudal gradient is specific to appetitive contexts.

      We added the following in the discussion:

      “Here, we further tested the existence of rostro-caudal gradients for aversion, asking whether D1-SPNs in the rostral vs. caudal medNAcSh respond differently to aversive stimuli. To ensure that any observed effects were not specific to a single stressor, we tested two distinct aversive stimuli (foot shock and tail lift). In both cases, we found no rostro-caudal differences, as D1-SPNs in both subregions responded with excitation. We also asked whether optogenetic activation of these neurons is perceived as aversive. Stimulation of D1-SPNs in both rostral and caudal medNAcSh promoted aversive behavioral responses in the RTPPA experiment. Hence, in contrast to the pharmacological inhibitions mentioned above, we did not detect differences in aversive behaviors according to the rostro-caudal medNAcSh site.”

      (c) The scope of behavior is mainly limited to consumption; testing related domains (motivation, reward valuation, and extinction) could broaden the significance.

      We thank the Reviewer for the suggestion to examine additional behavioral domains such as motivation, reward valuation, and extinction. We focused our efforts on consumption given the large body of literature demonstrating an important role of the medNAcSh in reward consumption. However, we fully agree that feeding encompasses multiple phases, from appetitive and goal-directed behaviors to consummatory behavior, and that the NAc in general, and to some extent the NAcSh, is involved in behaviors across this spectrum. For instance, prior work has shown that the medNAcSh is involved in reward preference and that this follows a rostro-caudal gradient (e.g., Pedersen et al. 2022).

      While it would be informative to directly test motivational processes using operant paradigms (e.g., nosepoke or lever-press tasks), our current experimental setup did not allow for these assays. Instead, we performed exploratory experiments manipulating the animals’ internal state with food deprivation. As expected, under food deprivation, total licking increased robustly in control mCherry and Rostral ChrimsonR medNAcSh mice as compared to ad libitum feeding (25 min session with 5 alternating on-off blocks: ad libitum Control = 692 and Rostral ChrimsonR = 1280 average total licks per session, see Figure 2g-h and Supplementary Figure S7d; food-deprived Control = 2428 and Rostral ChrimsonR = 2390 total licks, averaged for N = 9 Control and N = 12 Rostral). Moreover, similar to ad libitum feeding, optogenetic activation of rostral D1-SPNs suppressed licking in food-deprived mice, albeit to a lesser extent than under ad libitum feeding conditions (Figure 2).

      These preliminary observations suggest that internal state modulates the role of rostral D1-SPNs in reward consumption, potentially reflecting an interaction between homeostatic and hedonic feeding circuits. However, as this line of investigation was exploratory and not pursued further in the present study, these data are not included in the main manuscript.

      Author response image 1.

      In vivo optogenetic stimulation of rostral medNAcSh inhibits reward consumption to a lesser extent after overnight food deprivation. a. Quantification of the average lick count per 5 min block in mCherry control mice vs. ChrimsonR (rostral) mice, showing a lower lick count in rostral medNAcSh ChrimsonR mice during the opto-stimulation epoch. Blocks of 5 min with or without opto-stimulation were alternated (on/off/on/off/on) for a total of 5 blocks. b. Quantification of mean lick counts in the opto-stimulation vs. non-opto-stimulation epochs shows a significant decrease in lick counts following stimulation of rostral medNAcSh D1-SPNs and no significant difference in the control mice. 2-way RM-ANOVA (group x epoch). Main effects: epoch F(1, 28) = 6.027, p = 0.0206; group F(2, 28) = 1.448, p = 0.2520; group x epoch F(2, 28) = 8.123, p = 0.0017. Sidak post-hoc opto-stimulation vs. non-opto-stimulation: Control on vs. off t(28) = 1.856, p = 0.2061; Rostral medNAcSh on vs. off t(28) = 3.054, p = 0.0147. N = 9 for Control mCherry; N = 12 for Rostral medNAcSh ChrimsonR. c. Pie charts showing % of mice showing food intake inhibition (mean Δlick counts non-opto/opto > 0) in each group: 42% of ChrimsonR rostral medNAcSh mice, 20% of controls. Data are mean ± SEM. *p<0.05; **p<0.01; ***p<0.001.

      (3) Molecular profiling:

      (a) Stard5 expression is present in both D1- and D2-SPNs; comparisons to bulk calcium signals and quantification of percentages across rostral and caudal cells would be helpful. The authors should establish whether these cells also express SerpinB2, an established marker of LH projecting neurons.

      We thank the Reviewer for this relevant point. In the photometry experiments (Figure 7) using Stard5-Flp mice, we acknowledge that the recorded signals reflect a mixed population of D1- and D2-SPNs. Based on quantification in a separate set of brains, we estimate that Stard5 is expressed in a variety of cell types, of which 35% are D1-SPNs and 30% are D2-SPNs (Supplementary Figure S3). While Liu et al. 2024 reported no overlap between Stard5 and Drd2, the canonical marker for D2-SPNs, available transcriptomic data (Chen et al. 2021) and our own histological and RNA-based analyses (Figure 6 and Supplementary Figure S3) found Stard5 to be expressed in both D1-SPNs and D2-SPNs. Hence, Stard5 indeed labels a mixed population.

      We provide here the percentages of D1- and D2-SPNs expressing Stard5 across rostral and caudal subregions: in the dorsal rostral medNAcSh, 79% of D1-SPNs and 76% of D2-SPNs express Stard5; in the ventral rostral medNAcSh, the percentages are 47% and 55%; they drop to 39% and 31% in the dorsal caudal medNAcSh and to 15% and 20% in the ventral caudal medNAcSh.

      As suggested by the Reviewer, we also performed further analysis of the publicly available scRNA-seq dataset from Chen et al. 2021, which shows that 4.4% of all Stard5-expressing cells are also Serpinb2+, while 1.8% of all sequenced NAc cells are Stard5+/Drd1+/Serpinb2+ and 0.21% are Stard5+/Drd2+/Serpinb2+.

      (b) Verification of the Stard5-2A-Flp line (specificity, overlap with immunomarkers) should be documented more thoroughly.

      We agree with the Reviewer that a more detailed characterization of the Stard5-2A-Flp mouse line would be relevant for the validation of the line.

      In our study, we identified Stard5 as a marker gene that enables selective targeting of the rostral medNAcSh, as it is strongly enriched in the rostral medNAcSh (Figure 5-7). Stard5-Flp mice injected with Flp-dependent AAV in rostral medNAcSh, NAc core and dorsal striatum show specific AAV expression only in the rostral medNAcSh (Figure 7).

      Moreover, we show that the line is specific as injection of a Flp-dependent AAV in a Stard5-Flp negative line does not lead to expression (Figure 7c).

      However, re-analysis of the published scRNA-seq dataset (Chen et al. 2021) indicates that Stard5<sup>+</sup> cells comprise a heterogeneous population, including D1-SPNs (~35%), D2-SPNs (~30%), local interneurons (~18%), glial cells (~12%), and other cell types (Suppl. Fig. S3).

      Together, these data validate the Stard5-2A-Flp line as a spatially specific genetic entry point for the rostral medNAcSh, while highlighting the cellular heterogeneity of Stard5-expressing cells. Given the limited brain material left, we were not able to add additional colocalization analyses with immunomarkers, but agree this would be important to include in future studies.

      (c) The molecular analysis is restricted to a small set of genes; broader spatial transcriptomics could uncover additional candidate markers. See also above.

      We thank the Reviewer for this suggestion. Broader spatial transcriptomic analyses would indeed be highly valuable for identifying additional candidate markers. Our aim for the present study was to identify molecular landmarks to selectively target the rostral medNAcSh, but in a future study, we would be highly interested in building on our initial findings and providing an exhaustive molecular characterization of the region using spatial transcriptomics. We would be particularly motivated to do so, given the important functional specificity of the rostral NAcSh identified in the present publication.

      Reviewer #2 (Public review):

      Summary:

      Marinescu et al. combine in vivo imaging with circuit-specific optogenetic manipulation to characterize the anatomic heterogeneity of the medial nucleus accumbens shell in the control of food intake. They demonstrate that the inhibitory influence of dopamine D1 receptor-expressing neurons of the medial shell on food intake decreases along a rostro-caudal gradient, while both rostral and caudal subpopulations similarly control aversion. They then identify Stard5 and Peg10 as molecular markers of the rostral and caudal subregions, respectively. Through the development of a new mouse line expressing the flippase under the promoter of Stard5, they demonstrate that Stard5-positive neurons recapitulate the activity of D1-positive neurons of the rostral shell in response to food consumption and aversive stimuli.

      We thank Reviewer 2 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      This study brings important findings for the anatomical and functional characterization of the brain reward system and its implications in physiological and pathological feeding behavior. It is a well-designed study, technically sound, with clear and reliable effects. The generation of the new Stard5-Flp line will be a valuable tool for further investigations. The paper is very well written, the discussion is very interesting, addresses limitations of the findings, and proposes relevant future directions.

      We thank Reviewer #2 for their supportive feedback.

      Weaknesses:

      At this stage, identification and characterization of the activity of Stard5-positive neurons is a bit disconnected from the rest of the paper, as this population encompasses both D1- and D2-positive neurons as well as interneurons. While they display a similar response pattern as D1-neurons, it remains to be determined whether their manipulation would result in comparable behavioral outcomes.

      We agree that this represents an important limitation of the current study. In our search for molecular markers of the rostral feeding hotspot, we identified Stard5 as a marker enriched in the rostral medNAcSh; however, Stard5 labels a heterogeneous population that includes D1- and D2-SPNs as well as other cell types. While Stard5<sup>+</sup> neurons display activity patterns similar to D1-SPNs, we acknowledge that whether their direct manipulation would produce comparable behavioral effects to D1-SPNs remains to be determined. Moreover, it remains to be determined how the activity and function of Stard5<sup>+</sup> neurons compare to those of D2-SPNs.

      To specifically isolate Stard5<sup>+</sup> D1-SPNs, we generated a Stard5-Flp;Drd1-Cre mouse line via breeding. However, the four CreON/FlpON AAVs that we tested exhibited leaky expression, including ectopic expression in Cre-positive but Flp-negative cells. This prevented reliable, cell-type-specific manipulation. We are actively working to overcome this common technical limitation of Flp/Cre AAVs, and these experiments will be addressed in a future study.

      Recommendations for the authors:

      Editor's note:

      Readers would also benefit from coding individual data points by sex and noting N/sex in the figure legends.

      We thank the editor for the note. We have now reported the N and sex of the mice in each figure legend.

      Reviewer #1 (Recommendations for the authors):

      (1) Integration of results: The manuscript reads as two partly disconnected halves (functional gradient vs. molecular profiling). A more precise articulation of how the molecular findings (Stard5, Peg10) directly relate to the functional data would improve coherence.

      We thank the Reviewer for raising this important point. We agree that clearer integration between the functional gradient and the molecular findings would strengthen the manuscript. In the present study, Stard5 and Peg10 are not introduced as mechanistic drivers of behavior, but as molecular landmarks that map onto the functional rostro-caudal organization of the medNAcSh.

      Stard5 expression is enriched in the rostral medNAcSh, where we identify a functional hotspot for reward-related feeding, whereas Peg10 marks more caudal territories. Thus, the molecular profiling provides an independent axis that aligns with and supports the functional gradient revealed by photometry and optogenetic experiments. Whether these genes themselves contribute causally to feeding or aversive behaviors remains an open and interesting question for future studies.

      To improve clarity, we have explicitly articulated this link in the Discussion:

      “Importantly, our results indicate that spatial organization also defines functional specialization in the medNAcSh, and that molecular markers such as Stard5 provide access to these spatially defined subterritories rather than labeling a single, homogenous neuronal subtype.”

      “Having established a robust functional dichotomy of D1-SPNs along the rostro-caudal axis in reward consumption, we next asked whether this functional organization is mirrored by differences in molecular composition across the medNAcSh. Using multiple anatomical techniques, we find strong differences in the molecular composition of the rostral vs. caudal medNAcSh, which in turn could explain behavioral differences between these brain subregions.”

      “This makes Stard5 a spatial molecular landmark that captures the cellular ensemble of the rostral feeding hotspot, rather than a marker defining a single functional cell class. It is interesting that Stard5, a START-domain protein implicated in cholesterol metabolism and cellular stress responses (Alpy and Tomasetto, 2005; Rodriguez-Agudo et al., 2012; Calderon-Dominguez et al., 2014), and Peg10, an imprinted gene with roles in embryonic development and cancer (Mou et al. 2025), mark distinct rostro-caudal domains of the medNAcSh. Whether these genes themselves causally contribute to appetitive and consummatory behaviors, or aversive processing in this region remains an important question for future studies.”

      (2) Injection site specificity: Given prior work on NAc manipulations, it is essential to ensure precise targeting. Representative images from both rostral and caudal placements, including verification of fiber/injection confinement, would increase confidence.

      We thank the Reviewer for this important point regarding injection site specificity. Optic fiber placement was validated by identifying the coronal section in which the fiber tip was centered and aligning it to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). To date, we have validated a total of 14 brains, shown in the newly added Supplementary Figure S10.

      The primary sources of variability across animals could be the extent of the viral spread and the size of the optic implants, which were 400 μm for photometry experiments and 200 μm for the optogenetic studies. We acknowledge that this limits the spatial precision with which the individual subregions can be isolated. This limitation is explicitly discussed in the manuscript.

      Importantly, despite this limitation, we detected robust and reproducible differences between rostral and caudal medNAcSh in reward-consumption photometry and optogenetic assays. This argues against injection site proximity or fiber misplacement being a major confounding factor for the main conclusions. Nonetheless, the Reviewer raises a valid point, and in future studies we plan to establish targeting methods with reduced viral volumes and/or tapered optic fibers (Pisanello et al. 2017). This will allow finer spatial restriction and more precise dissection of medNAcSh subregions.

      (3) Minor clarifications:

      (a) Provide explicit definitions of "rostral" and "caudal" coordinates.

      We adjusted Figure 1 and added the coordinates.

      (b) Consider alternative wording to "gradient" since only two rostro-caudal positions are tested.

      RNA-seq and MERFISH data indicate that molecular markers in the NAcSh are organized along a continuous rostro–caudal gradient rather than discrete boundaries (Chen et al. 2021; Stanley et al. 2020). Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled two representative positions along this continuum.

      We added the following sentence in the discussion for clarification:

      “Of note, in this paper we decided to use the term “rostro-caudal gradient”, motivated by converging evidence from prior pharmacological studies (see below) and scRNA sequencing data (Chen et al., 2021; Stanley et al., 2020), which show continuous molecular and functional changes along the rostro-caudal axis of the medNAcSh rather than sharply defined boundaries. Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled only two representative positions along this continuum.”

      (c) Enhance representative images (e.g., stronger DAPI, zoom-ins, bregma coordinates).

      To improve clarity, we have adjusted Figure 1 by adding schematic representations including stereotaxic surgery coordinates, which facilitate interpretation of rostro–caudal targeting.

      (d) Report trial numbers in figure legends, injection site details (e.g., S1 mouse), learning curves, and rationale for low-pass filtering in photometry.

      We thank the Reviewer for these suggestions. The average number of successful trials is now reported in the figure legends (Figure 1 and Figure 7). Injection site details are described in the Methods and are now also illustrated in Figure 1a and validated in Supplementary Figure S10. In addition, we have added Supplementary Figure S8 showing the learning curves of the Drd1-Cre and Stard5-Flp mice included in this study.

      Regarding the low-pass filtering in photometry analysis: low-pass filtering (1 Hz) was applied to the signal to remove high-frequency noise and isolate slow calcium-dependent fluorescence fluctuations that reflect population-level neural activity as we have done before (Labouesse et al. 2023, 2024). Low-pass filtering is a commonly-used analysis in fiber photometry and often shows a better artifact-corrected signal (Zhang et al. 2023; Keevers and Jean-Richard-dit-Bressel 2025).
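      As an illustration of the 1 Hz low-pass step described above, the following is a minimal sketch only. The response does not state which filter design was used, so a single-pole (RC-style) filter is assumed here; the function name `low_pass`, the 20 Hz sampling rate, and the synthetic signal are all hypothetical and are not taken from the manuscript.

```python
import math


def low_pass(signal, fs_hz, cutoff_hz=1.0):
    """Single-pole IIR low-pass filter (an assumed design, not the authors')."""
    dt = 1.0 / fs_hz                          # time between samples, in seconds
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)    # time constant for the chosen cutoff
    alpha = dt / (rc + dt)                    # smoothing factor, between 0 and 1
    out = []
    prev = signal[0]                          # start from the first sample
    for x in signal:
        prev = prev + alpha * (x - prev)      # move a fraction alpha toward x
        out.append(prev)
    return out


# Synthetic example: a slow 0.1 Hz component plus fast 8 Hz "noise".
fs = 20.0                                     # hypothetical sampling rate (Hz)
t = [i / fs for i in range(200)]
slow = [math.sin(2 * math.pi * 0.1 * ti) for ti in t]
noise = [0.5 * math.sin(2 * math.pi * 8.0 * ti) for ti in t]
raw = [s + n for s, n in zip(slow, noise)]
filtered = low_pass(raw, fs)
```

      In this sketch the 8 Hz component is strongly attenuated while the 0.1 Hz component passes largely unchanged, which is the qualitative behavior the response relies on. A zero-phase filter would be a common alternative where timing relative to behavioral events matters.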

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) As mentioned, I find the part on Stard5-positive neurons a bit disconnected. Ideally, as mentioned in the discussion, the author could cross Stard5-Flp mice with D1-cre to selectively monitor and/or manipulate these neurons. Alternatively, do they have any data regarding D2-positive neurons of the rostral part to show whether they behave differently from D1-positive neurons?

      We thank the Reviewer for this suggestion and agree that selectively monitoring or manipulating Stard5<sup>+</sup> D1-SPNs using an intersectional approach would strengthen the link between the molecular and functional findings. We are pursuing this strategy by crossing Stard5-Flp mice with Drd1-Cre mice; however, as noted above, currently available CreON/FlpON viral tools exhibited leaky expression (a commonly known problem for such AAVs), preventing reliable cell-type–specific targeting. As a result, these experiments are ongoing (including reducing the titers) and will be addressed in a future study.

      At present, we do not have equivalent functional data for D2-SPNs in the rostral medNAcSh. Investigating whether rostral D2-SPNs behave differently from caudal D2-SPNs is an important and interesting question, which we hope to address in a future study. This limitation is acknowledged in the discussion.

      (2) Do the authors have any data on locomotor activity when they manipulate D1-expressing neurons? Lower food consumption as well as lower activity in the stimulated compartment - interpreted as aversion - could be related to diminished locomotor activity.

      We thank the reviewer for the relevant point about locomotion. We ran new analyses of locomotor activity during the feeding task (operant boxes) using a machine-learning model. A small subset of frames (136 frames from 10 video recordings) was manually annotated to define the animal’s body center and nose, as well as the four corners of the operant box. These annotations were used to train a YOLO-based pose estimation model (Redmon et al. 2015). Locomotion metrics, such as total distance moved, were subsequently derived from the temporal integration of positional data and aligned to opto-on and opto-off epochs of the feeding task. During licking periods, the animal’s body center remains largely stationary, which could lead to an overestimation of immobility. Nevertheless, we quantified the total distance traveled in the entire operant box across epochs, shown in Supplementary Figure S9a-b. In our proof-of-concept experiment (Figure 2c-e), locomotion was increased in rostral ChrimsonR mice compared to controls (Supplementary Figure S9a), an effect similar to that seen with chemogenetic activation of D1-SPNs (Zhu, Ottenheimer, and DiLeone 2016). In our full experimental cohort, locomotion did not differ between control, rostral and caudal ChrimsonR mice across laser-on and laser-off epochs. These results indicate that reduced reward consumption during stimulation of rostral D1-SPNs is not due to decreased locomotor activity. Notably, whereas the inhibitory effect on consumption is specific to rostral D1-SPN activation, locomotor effects are similar for rostral and caudal D1-SPN stimulation, indicating that the two are at least partly dissociated.
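      The derivation of total distance from tracked positions, split by stimulation epoch, can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code used for the study: the function name `distance_per_epoch`, the epoch labels, the frame-index epoch format, and the example coordinates are hypothetical, and distances are left in the tracker's units (conversion to cm would need a calibration not given in the response).

```python
import math


def distance_per_epoch(positions, epochs):
    """Sum frame-to-frame displacement of the body center within each epoch.

    positions: list of (x, y) body-center coordinates, one per video frame.
    epochs:    list of (label, start_frame, stop_frame), stop exclusive
               (hypothetical format).
    """
    totals = {}
    for label, start, stop in epochs:
        dist = 0.0
        # Only steps between two frames of the same epoch are counted, so the
        # step that crosses an epoch boundary is deliberately left out.
        for i in range(start + 1, stop):
            x0, y0 = positions[i - 1]
            x1, y1 = positions[i]
            dist += math.hypot(x1 - x0, y1 - y0)
        totals[label] = totals.get(label, 0.0) + dist
    return totals


# Hypothetical six-frame recording: three frames laser off, three frames laser on.
positions = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0), (6.0, 8.0), (6.0, 9.0)]
epochs = [("opto_off", 0, 3), ("opto_on", 3, 6)]
totals = distance_per_epoch(positions, epochs)
print(totals)
```

      Repeated epochs with the same label are accumulated into one total, which matches the idea of comparing all laser-on time against all laser-off time. Because epochs may differ in duration, dividing each total by its epoch length would be needed before comparing them directly.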

      Moreover, in the RTPPA task, it is accepted that the percentage of time spent in the light-paired chamber reflects the preference or aversiveness to optogenetic stimulation. We additionally quantified total distance traveled (Supplementary Figure S9c). While optogenetic stimulation of both rostral and caudal D1-SPNs reduced time spent in the light-paired chamber (Figure 4), total distance traveled was unchanged, indicating that the observed aversion is not due to reduced locomotion.
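      The RTPPA readout mentioned above, the percentage of time spent in the light-paired chamber, reduces to a simple count over tracked frames. The sketch below assumes a two-chamber arena divided at a single x coordinate and a constant frame rate; the function name `percent_time_in_paired`, the boundary value, and the example positions are hypothetical.

```python
def percent_time_in_paired(xs, boundary_x, paired_side="right"):
    """Percentage of frames in which the animal is in the light-paired chamber.

    xs:          body-center x coordinate for each frame (constant frame rate
                 assumed, so frame counts are proportional to time).
    boundary_x:  x coordinate separating the two chambers (hypothetical).
    paired_side: which side of the boundary is paired with the light.
    """
    if not xs:
        raise ValueError("no tracked frames")
    if paired_side == "right":
        inside = sum(1 for x in xs if x >= boundary_x)
    else:
        inside = sum(1 for x in xs if x < boundary_x)
    return 100.0 * inside / len(xs)


# Hypothetical eight-frame track with the chamber boundary at x = 25.
xs = [10.0, 12.0, 30.0, 35.0, 11.0, 9.0, 8.0, 40.0]
pct = percent_time_in_paired(xs, boundary_x=25.0)
print(pct)
```

      A value below 50% under this convention would indicate avoidance of the light-paired side. Total distance traveled is a separate measure computed from the same tracking data, which is why the two can dissociate.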

      We added the following to the Results section: “To determine whether the reduced reward consumption observed in Rostral ChrimsonR mice could be explained by changes in locomotion, we quantified the total distance traveled during this task. Optogenetic stimulation led to an increase in locomotion in the small cohort of Rostral ChrimsonR mice in the reward consumption experiment shown in Figure 2d-e (Supplementary Figure S9a), while no change in locomotion was observed across epochs in mCherry controls, ChrimsonR Rostral and Caudal mice (Supplementary Figure S9b, related to Figure 2g-i)”

      And

      “Quantification of locomotion showed no reduction in distance traveled in the light-paired chamber (Supplementary Figure S9c), indicating that the avoidance was not driven by impaired locomotion. These data indicate that medNAcSh D1-SPNs generally promote aversion without affecting locomotion and without major differences along the rostro-caudal axis”

      Additionally, we added the following sentence to the Discussion: “Importantly, our behavioral effects of rostral D1-SPNs in the reward consumption and RTPPA assays could not be explained by reduced locomotor activity. Indeed, optogenetic stimulation of D1-SPNs during the reward consumption task did not reduce locomotion; instead, locomotion was either unchanged or increased in a small cohort of Rostral ChrimsonR mice. The increased locomotion likely reflected appetitive behavior and is consistent with past chemogenetic studies (Zhu et al., 2016). In the RTPPA no locomotion differences were detected.”

      (3) It would be useful to provide a schematic (or pictures) for the location of fiber implantation in all animals for both photometry and optogenetics.

      We validated optic fiber placement in 14 animals by identifying the coronal section in which the fiber tip was centered and aligning this section to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). Representative optic fiber placement and viral spread are shown in the newly added Supplementary Figure S10.

      Minor Comments:

      (1) Figure 6e and g seem mislabeled: "Drd1+ (D2-SPNs)".

      Yes, thank you. We corrected it.

      (2) Line 395-397: the authors mention minimal Flp leakage, but could it be that low activity of the Stard5 promoter in the core and dorsal striatum allows a low level of flippase expression that is sufficient for recombination?

      We thank the Reviewer for this insightful point. We cannot fully distinguish between these possibilities in the current study; however, the overall recombination outside the target region remains minimal, supporting the utility of the Stard5-Flp line for selective targeting of the rostral medNAcSh. Injection of a Flp-dependent AAV into the lateral shell, core and dorsal striatum showed no expression, therefore we think this is unlikely. Moreover, this aligns with Stard5 expression patterns derived from the scRNAseq data (Chen et al. 2021), Allen Brain Atlas quantifications (Figure 5) and our RNAscope analysis (Figure 6). Nevertheless, we acknowledge that histology alone cannot definitively exclude this possibility, and quantitative approaches such as qPCR would be required.

      References

      Alpy, Fabien, and Catherine Tomasetto. 2005. “Give Lipids a START: The StAR-Related Lipid Transfer (START) Domain in Mammals.” Journal of Cell Science 118(13):2791–2801. doi:10.1242/jcs.02485.

      Calderon-Dominguez, Maria, Gregorio Gil, Miguel Angel Medina, William M. Pandak, and Daniel Rodríguez-Agudo. 2014. “The StarD4 Subfamily of Steroidogenic Acute Regulatory-Related Lipid Transfer (START) Domain Proteins: New Players in Cholesterol Metabolism.” The International Journal of Biochemistry & Cell Biology 49:64–68. doi:10.1016/j.biocel.2014.01.002.

      Chen, Renchao, Timothy R. Blosser, Mohamed N. Djekidel, Junjie Hao, Aritra Bhattacherjee, Wenqiang Chen, Luis M. Tuesta, Xiaowei Zhuang, and Yi Zhang. 2021. “Decoding Molecular and Cellular Heterogeneity of Mouse Nucleus Accumbens.” Nature Neuroscience 24(12):1757–71. doi:10.1038/s41593-021-00938-x.

      Domingues, Ana Verónica, Tawan T. A. Carvalho, Gabriela J. Martins, Raquel Correia, Bárbara Coimbra, Ricardo Bastos-Gonçalves, Marcelina Wezik, Rita Gaspar, Luísa Pinto, Nuno Sousa, Rui M. Costa, Carina Soares-Cunha, and Ana João Rodrigues. 2025. “Dynamic Representation of Appetitive and Aversive Stimuli in Nucleus Accumbens Shell D1- and D2-Medium Spiny Neurons.” Nature Communications 16(1):59. doi:10.1038/s41467-024-55269-9.

      Keevers, Luke J., and Philip Jean-Richard-dit-Bressel. 2025. “Obtaining Artifact-Corrected Signals in Fiber Photometry via Isosbestic Signals, Robust Regression, and DF/F Calculations.” Neurophotonics 12(02). doi:10.1117/1.NPh.12.2.025003.

      Labouesse, Marie A., Arturo Torres-Herraez, Muhammad O. Chohan, Joseph M. Villarin, Julia Greenwald, Xiaoxiao Sun, Mysarah Zahran, Alice Tang, Sherry Lam, Jeremy Veenstra-VanderWeele, Clay O. Lacefield, Jordi Bonaventura, Michael Michaelides, C. Savio Chan, Ofer Yizhar, and Christoph Kellendonk. 2023. “A Non-Canonical Striatopallidal Go Pathway That Supports Motor Control.” Nature Communications 14(1):6712. doi:10.1038/s41467-023-42288-1.

      Labouesse, Marie A., Maria Wilhelm, Zacharoula Kagiampaki, Andrew G. Yee, Raphaelle Denis, Masaya Harada, Andrea Gresch, Alina-Măriuca Marinescu, Kanako Otomo, Sebastiano Curreli, Laia Serratosa Capdevila, Xuehan Zhou, Reto B. Cola, Luca Ravotto, Chaim Glück, Stanislav Cherepanov, Bruno Weber, Xin Zhou, Jason Katner, Kjell A. Svensson, Tommaso Fellin, Louis-Eric Trudeau, Christopher P. Ford, Yaroslav Sych, and Tommaso Patriarchi. 2024. “A Chemogenetic Approach for Dopamine Imaging with Tunable Sensitivity.” Nature Communications 15(1):5551. doi:10.1038/s41467-024-49442-3.

      Liu, Yiqiong, Ying Wang, Zheng-dong Zhao, Guoguang Xie, Chao Zhang, Renchao Chen, and Yi Zhang. 2024. “A Subset of Dopamine Receptor-Expressing Neurons in the Nucleus Accumbens Controls Feeding and Energy Homeostasis.” Nature Metabolism 6(8):1616–31. doi:10.1038/s42255-024-01100-0.

      Mou, Dachao, Shasha Wu, Yanqiong Chen, Yun Wang, Yufang Dai, Min Tang, Xiu Teng, Shijun Bai, and Xiufeng Bai. 2025. “Roles of PEG10 in Cancer and Neurodegenerative Disorder (Review).” Oncology Reports 53(5):1–9. doi:10.3892/or.2025.8893.

      O’Connor, Eoin C., Yves Kremer, Sandrine Lefort, Masaya Harada, Vincent Pascoli, Clément Rohner, and Christian Lüscher. 2015. “Accumbal D1R Neurons Projecting to Lateral Hypothalamus Authorize Feeding.” Neuron 88(3):553–64. doi:10.1016/j.neuron.2015.09.038.

      Pedersen, Christian E., Raajaram Gowrishankar, Sean C. Piantadosi, Daniel C. Castro, Madelyn M. Gray, Zhe C. Zhou, Shane A. Kan, Patrick J. Murphy, Patrick R. O’Neill, and Michael R. Bruchas. 2022. “Medial Accumbens Shell Spiny Projection Neurons Encode Relative Reward Preference.”

      Pisanello, Ferruccio, Gil Mandelbaum, Marco Pisanello, Ian A. Oldenburg, Leonardo Sileo, Jeffrey E. Markowitz, Ralph E. Peterson, Andrea Della Patria, Trevor M. Haynes, Mohamed S. Emara, Barbara Spagnolo, Sandeep Robert Datta, Massimo De Vittorio, and Bernardo L. Sabatini. 2017. “Dynamic Illumination of Spatially Restricted or Large Brain Volumes via a Single Tapered Optical Fiber.” Nature Neuroscience 20(8):1180–88. doi:10.1038/nn.4591.

      Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2015. “You Only Look Once: Unified, Real-Time Object Detection.”

      Requejo-Mendoza, Nikte, José-Antonio Arias-Montaño, and Ranier Gutierrez. 2025. “Nucleus Accumbens D2-Expressing Neurons: Balancing Reward and Licking Disruption through Rhythmic Optogenetic Stimulation” edited by J. M. Dominguez. PLOS ONE 20(2):e0317605. doi:10.1371/journal.pone.0317605.

      Rodriguez-Agudo, Daniel, Maria Calderon-Dominguez, Miguel Angel Medina, Shunlin Ren, Gregorio Gil, and William M. Pandak. 2012. “ER Stress Increases StarD5 Expression by Stabilizing Its MRNA and Leads to Relocalization of Its Protein from the Nucleus to the Membranes.” Journal of Lipid Research 53(12):2708–15. doi:10.1194/jlr.M031997.

      Stanley, Geoffrey, Ozgun Gokce, Robert C. Malenka, Thomas C. Südhof, and Stephen R. Quake. 2020. “Continuous and Discrete Neuron Types of the Adult Murine Striatum.” Neuron 105(4):688-699.e8. doi:10.1016/j.neuron.2019.11.004.

      Zhang, Yan, Márton Rózsa, Yajie Liang, Daniel Bushey, Ziqiang Wei, Jihong Zheng, Daniel Reep, Gerard Joey Broussard, Arthur Tsang, Getahun Tsegaye, Sujatha Narayan, Christopher J. Obara, JingXuan Lim, Ronak Patel, Rongwei Zhang, Misha B. Ahrens, Glenn C. Turner, Samuel S. H. Wang, Wyatt L. Korff, Eric R. Schreiter, Karel Svoboda, Jeremy P. Hasseman, Ilya Kolb, and Loren L. Looger. 2023. “Fast and Sensitive GCaMP Calcium Indicators for Imaging Neural Populations.” Nature 615(7954):884–91. doi:10.1038/s41586-023-05828-9.

      Zhu, Xianglong, David Ottenheimer, and Ralph J. DiLeone. 2016. “Activity of D1/2 Receptor Expressing Neurons in the Nucleus Accumbens Regulates Running, Locomotion, and Food Intake.” Frontiers in Behavioral Neuroscience 10. doi:10.3389/fnbeh.2016.00066.

    1. eLife Assessment

      This fundamental work advances our understanding of the role of kisspeptin neurons in regulating the luteinizing hormone (LH) surge in females. The study uses cutting-edge techniques to provide compelling and rigorous data supporting a critical role of RP3V kisspeptin neurons in the neuroendocrine LH surge process. This research will be of interest to reproductive biologists and neuroscientists studying the female ovarian cycle. Continuing to examine the complexities of the LH surge and the neuronal populations involved, as done in this study, is critical for developing therapeutic treatments for women's reproductive disorders.

    2. Joint Public Review:

      Summary:

      This is an excellent, timely study investigating and characterizing the underlying neural activity that generates the neuroendocrine GnRH and LH surges that are responsible for triggering ovulation. Abundant evidence accumulated over the past 20 years implicated the population of kisspeptin neurons in the hypothalamic RP3V region (also referred to as the POA or AVPV/PeN kisspeptin neurons) as being involved in driving the GnRH surge in response to elevated estradiol (E2), also known as the estrogen positive feedback. However, while former studies used cfos coexpression as a marker of RP3V kisspeptin neuron activation at specific times and found that this correlates with the timing of the LH surge, detailed examination of the live in vivo activity of these neurons before, during, and after the LH surge, remained elusive due to technical challenges. In this exciting study, Zhou and colleagues use fiber photometry to measure the long-term synchronous activity of RP3V kisspeptin neurons across different stages of the mouse estrous cycle, including on proestrus when the LH surge occurs, as well as in a well-established OVX+E2 mouse model of the LH surge. For this they used kiss-Cre female mice that were injected with a Cre-dependent AAV injection containing GCaMP6, in order to measure the neuronal activation of RP3V Kiss1 cells.

      The authors report that RP3V kisspeptin neuronal activity is low on estrus and diestrus, but increases on proestrus several hours before the late afternoon LH surge, mirroring prior reports of rising GnRH neuron activity in proestrus female mice. The measured increase in RP3V kisspeptin activation is long, spanning ~13 hours in proestrus females and extending well beyond the end of the LH secretion, and is shown by the authors to be E2 dependent. In addition, an intriguing cyclical oscillation in kisspeptin neural activity every 90 minutes exists, which may offer critical insight into how the RP3V kisspeptin system operates.

      The compelling methodology allowed the authors to measure RP3V neuronal activation across multiple ovarian cycles in the same mouse, which demonstrated that the timing of the LH surge is variable across cycles, even within the same mouse. In addition, the authors demonstrated using the same females, that ovariectomy resulted in very little neuronal activity in RP3V kisspeptin neurons. When these ovariectomized females were treated with estradiol benzoate (EB) and an LH surge was induced, there was an increase in RP3V kisspeptin neuronal activation, as was seen during proestrus. However, the magnitude of the change in activity was greater during proestrus than during the EB-induced LH surge. Interestingly, the authors noted a consistent peak in activity about 90 minutes prior to lights out on each day of the ovarian cycle and during EB treatment, but not in ovariectomized females. The functional significance of this consistent neuronal activity at this time remains to be determined. In summary, the data from these experiments is compelling and supports the hypothesis in the field that the RP3V kisspeptin neurons regulate the LH surge.

      Strengths:

      - The study is well designed, uses proper controls and analyses, has robust data, and the paper is nicely organized and written.

      - The study is well done and complete, looking at neuronal activation at each stage of the ovarian cycle and then additionally, how neuronal activation in ovariectomized and ovariectomized + EB females compares to that of gonad-intact females. Though not part of this study, the comparison of neuronal activation of GnRH neurons during the LH surge to the current data was convincing, demonstrating a similar pattern of increased activation that precedes the LH surge.

      - The authors provide a technical advance for the field in the ability to accurately measure RP3V kisspeptin neuron activity in actively awake, live mice for long periods of time, spanning different cycle stages. This approach offers novel and useful insights into the impact of E2 and circadian cues on the electrical activity of RP3V kisspeptin neurons.

      - The within-subjects design used in these experiments is a major strength because it allowed the authors to collect data across multiple ovarian cycles, following ovariectomy, and then with EB treatment. The variability in neuronal activity surrounding the LH surge across ovarian cycles in the same animals is interesting and could not be achieved without this within-subjects design.

      - The inclusion and comparison of ovary-intact females and OVX+E2 females is valuable to help test mechanisms under these two valuable LH surge conditions, and allows for further future studies to tease apart minor differences in the LH surge pattern between these 2 conditions.

      - The discovery of cyclical oscillation in RP3V kisspeptin neural activity every 90 minutes is intriguing and interesting, and may offer critical insight into how the RP3V kisspeptin system operates, which can be further tested in future studies.

      Weaknesses:

      - LH levels were not measured in many mice or in robust temporal detail, to allow a more detailed comparison between the fine-scale timing of RP3V neuron activation with onset and timing of LH surge dynamics. While the "peak LH" occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation, it is likely that LH values start to increase several hours before the peak LH, closer to when RP3V kisspeptin neuron activity first occurs. Therefore, the onset of the LH surge is likely to be closer to the beginning of the RP3V kisspeptin activity, but future studies are needed to study this timing.

      - One minor concern is that LH levels were not measured in the ovariectomized females during the expected time of the LH surge. The authors suggest that the lower magnitude of activation during the LH surge in these females, in comparison to proestrus females, may be the result of lower LH levels. It's hard to interpret the difference in magnitude of neuronal activation between EB-treated and proestrus females without knowing LH levels. In addition, it's possible that an LH surge did not occur in all EB-treated females, and thus, having LH levels would confirm the success of the EB treatment.

      - The authors nicely show that there is some variation (~2 hours) in the peak of the first oscillation in cycling proestrus females. By contrast, the small sample size for OVX+E2 females did not permit a similar rigorous analysis of temporal variability under such estrogen-controlled conditions, which will need to be studied in future projects.

      Comments on revisions:

      The authors have revised the manuscript adequately. There are no further recommended edits or revisions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      (1) LH levels were not measured in many mice or in robust temporal detail, such as every 30 or 60 min, to allow a more detailed comparison between the fine-scale timing of RP3V neuron activation with onset and timing of LH surge dynamics.

      Please see “Recommendations for Authors” below.

      (2) The authors report that the peak LH value occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation. However, it is likely, and indeed evident from the 2 example LH patterns shown in Figures 3A-B, that LH values start to increase several hours before the peak LH. This earlier rise in LH levels ("onset" of the surge) occurs much closer in time to the first RP3V kisspeptin neuron oscillatory activation, and as such, the ensuing LH secretion may not be as delayed as the authors suggest.

      Please see “Recommendations for Authors” below.

      (3) The authors nicely show that there is some variation (~2 hours) in the peak of the first oscillation in proestrus females. Was this same variability present in OVX+E2 females, or was the variability smaller or absent in OVX+E2 versus proestrus? It is possible that the variability in proestrus mice is due to variability in the timing and magnitude of rising E2 levels, which would, in theory, be more tightly controlled and similar among mice in the OVX+E2 model. If so, the OVX+E2 mice may have less variability between mice for the onset of RP3V kisspeptin activity.

      Please see “Recommendations for Authors” below.

      (4) One concern regarding this study is the lack of data showing the specificity of the AAV and the GCaMP6s signals. There are no data showing that GCaMP6s is limited to the RP3V and is not expressed in other Kiss1 populations in the brain. Given that 2ul of the AAV was injected, which seems like a lot considering it was close to the ventricle, it is important to show that the signal and measured activity are specific to the RP3V region. Though the authors discuss potential reasons for the low co-expression of GCaMP6 and kisspeptin immunoreactivity, it does raise some concern regarding the interpretation of these results. The low co-expression makes it difficult to confirm the Kiss1 cell-specificity of the Cre-dependent AAV injections. In addition, if GFP (GCaMP6s) and kisspeptin protein co-localization is low, it is possible that the activation of these neurons does not coincide with changes in kisspeptin or that these neurons are even expressing Kiss1 or kisspeptin at the time of activation. It is important to remember that the study measures activation of the kisspeptin neuron, and it does not reveal anything specific about the activity of the kisspeptin protein.

      Please see “Recommendations for Authors” below.

      (5) One additional minor concern is that LH levels were not measured in the ovariectomized females during the expected time of the LH surge. The authors suggest that the lower magnitude of activation during the LH surge in these females, in comparison to proestrus females, may be the result of lower LH levels. It's hard to interpret the difference in magnitude of neuronal activation between EB-treated and proestrus females without knowing LH levels. In addition, it's possible that an LH surge did not occur in all EB-treated females, and thus, having LH levels would confirm the success of the EB treatment.

      Please see “Recommendations for Authors” below.

      (6) This kisspeptin neuron peak activity is abolished in ovariectomized mice, and estradiol replacement restored this activity, but only partially. Circulating levels of estradiol were not measured in these different setups, but the authors hypothesize that the lack of full restoration may be due to the absence of other ovarian signals, possibly progesterone.

      Please see “Recommendations for Authors” below.

      (7) Recordings in several mice show inter- and intra-variability in the time of peak onset. It is not shown whether this variability is associated with a similar variability in the timing of the LH surge onset in the recorded mice. The authors hypothesized that this variability indicates a poor involvement of the circadian input. However, no experiments were done to investigate the role of the (vasopressinergic-driven) circadian input on the kisspeptin neuron activation at the light/dark transition. Thus, we suggest that the authors be more tentative about this hypothesis.

      Please see “Recommendations for Authors” below.

      Recommendations for the authors:

      (1) The study measured LH levels over time in just 5 female mice, a small sample size given the variability between mice. Having said that, n=5 is an OK starting point but the LH values are only shown for 2 mice, and there are no graphs or presentation of mean LH levels over time for all 5 mice. Figure 3 would greatly benefit from graphing and statistical analyses of the LH levels for all 5 mice (mean line graphs over time or similar). The authors report the mean "peak LH" level in the text, but it would be important to show and graph all the LH values over time (either by clock time or time relative to start of first RP3V oscillation or both), to allow the reader to compare the LH pattern to the RP3V kisspeptin neuron activity over time.

      We share the Reviewer’s frustration regarding the lack of detailed LH time points to correlate with the changes in GCaMP signal. Certainly, it was our intention to do better. However, with the benefit of actually being able to monitor surge progress through RP3V neuron activity in real time, we found that frequent blood sampling could often interfere with the normal dynamic of surge activity. On some occasions, the RP3V kisspeptin neuron oscillations would stop abruptly mid- or early-surge, while on others they would stop and then start again. Knowing that this was not the normal profile, we resorted to taking as few blood samples as possible, trying primarily to get what we thought might be the “peak” LH surge level. We acknowledge that this is not ideal, and leaves open the important question around the precise relationship of the beginning of RP3V kisspeptin oscillations with LH secretion. Although not answering the question directly, this was part of the motivation for the last figure which emphasizes how the RP3V kisspeptin neuron activity and GnRH neuron dendron activity are essentially identical at the time of the surge. We have re-written the relevant section of the Discussion to be more circumspect.

      (2) The authors report and discuss that the peak LH value occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation but it is likely, and indeed evident from the 2 example LH patterns shown in Figs 3A-B, that LH values start to increase several hours earlier, well before the peak LH. Thus, the rise in LH levels during the surge starts much closer in time to the first RP3V kisspeptin neuron oscillatory activation, which the authors don't analyze. For example, the 2nd LH value for the 2 representative mice shown in Figure 3 is notably higher than the 1st LH value of those mice, even though the peak value has not yet been attained. Even with the LH levels only being measured here every couple of hours, this "first detected rise in LH" should at least be graphed and/or analyzed relative to the timing of kisspeptin neuron activity, and commented on in the Discussion.

      As above.

      (3) It is unclear whether the variation (~2 hours) in the peak of the first oscillation in proestrus females is the same in OVX+E2 females, or whether the variability was smaller or absent in OVX+E2 females. The variability observed in proestrus mice is likely due to variability in timing and magnitude of rising E2 levels, which may be more tightly controlled and similar among mice in the OVX+E2 model. If so, the OVX+E2 mice might display less variability for the timing of the RP3V kisspeptin activity "onset". This measure would be important to analyze here and to discuss, given that many labs around the world often use an OVX+E2 model.

      This is an interesting point given the dogma surrounding the role of the SCN in initiating the surge. Three of the five OVX+E2 mice exhibited clearly discernible GCaMP oscillations that started at approximately noon, 1pm and 2pm. While this sample is very small, it does suggest that the onset of RP3V kisspeptin neuron activity is variable as found in proestrous mice. We have indicated this cautiously given the sample size.

      (4) If looking at kisspeptin immunoreactivity is problematic, is it possible to look at Kiss1 RNA levels or to look at Cre-recombinase protein levels? While the Cre-recombinase would just be a proxy for Kiss1/kisspeptin, it may result in higher expression and better co-localization with the GCaMP6s.

      Yes, RNAscope would likely be the ideal method to settle this long-running issue of apparently poor Kiss-cre targeting in the RP3V. Unexpectedly, however, we found that the mCherry probe bound to Kiss1 in our attempts at an RNAscope evaluation. The use of Cre as a proxy for identifying kisspeptin neurons would almost certainly generate better co-localization as Cre is being used to target GCaMP.

      Minor

      (1) It was not clear in the manuscript how many cells were counted or contributed to the neuronal activation data. Is it the entire population of RP3V Kiss1 cells? Just a subset? How much variability is there in the number of cells measured/counted between animals? Presumably, the brains were extracted to confirm the placement of the optic fiber. Were there neuroanatomical studies also done on these animals to confirm how many cells express GFP (GCaMP6) and the correct placement and specificity of the AAV? Is there any potential that cells in the BnST or even the ARC took up the virus and were included in these measurements?

      It is very difficult if not impossible to establish just how many RP3V kisspeptin neurons contribute to the GCaMP population signal using fibre photometry. This will depend on levels of AAV transfection, distance from the optic fibre, and the numbers of RP3V kisspeptin neurons actually involved in the surge mechanism. Of note, C-Fos data suggest that only around one-third of RP3V kisspeptin neurons are activated at the time of the surge. All fibre placements were subsequently shown to be running alongside GCaMP-expressing AVPV/anterior periventricular nucleus cells (now noted), but the numbers of transfected cells were not quantified. As shown in Fig.4, the GCaMP signal was very similar across all mice suggesting little variation in the relationship between transduction, fibre placement and distance.

      The RP3V region is approximately 4-5 mm from the ARN. We felt that the possibility that an AAV injection in the RP3V would spill over into the ARN was so remote that we did not assess GCaMP expression in ARN kisspeptin neurons. We have previously determined for the ARN that recordable GCaMP fluorescence only occurs if the optic fibre is within 0.5 mm from GCaMP-expressing neurons. Ultimately, proof that we are not recording from ARN kisspeptin neurons comes from the very different activity patterns reported here for RP3V neurons compared to the kisspeptin pulse generator. We did not see any GCaMP expressed in the BNST.

      (2) If it is possible to measure LH levels in the EB surge animals, it would be helpful, at least to confirm that they did surge and to support the proposed idea that LH surge levels are lower in that model.

      Unfortunately, as acknowledged in the original text we did not take blood samples from these mice so do not have the data. However, as noted, other studies undertaken by us using the same EB surge paradigm show that peak LH levels are much lower compared to proestrus. In retrospect we do agree that this would have been useful and particularly to establish whether each mouse did show a surge as two of the OVX+EB mice failed to show typical surge-associated oscillations. We have noted this in the Discussion.

      (3) For Figure 4F, please add a gray shaded box to the graph to denote the "dark" period (lights off), as was done for Figures 2 and 3. This is important because Figure 4F is making the point that there is a consistent 90-minute oscillation event right before lights off, so it would be helpful to denote the period of lights off on the graph.

      There was in fact a very light grey shade, but we have now added a grey bar to make the dark period clearer.

      (4) The Title of the paper should include the brain region because this is specifically the RP3V (or preoptic area "POA") kisspeptin neurons that are studied, not other kisspeptin cell populations.

      We have added “preoptic area” to clarify.

      (5) The graphs in Figure 3C-D are from different mice and address a different question than the graphs in Figure 3A-B. This was a bit confusing, and it is recommended that the LH + RP3V kisspeptin activity experiment (Figures 3A-B) be its own figure, and the graphs looking at the detailed oscillatory patterns in Figures 3C-D be their own figure, as the latter are addressing a different question and don't have any LH data.

      We have split the figure as requested.

      (6) The font size of the X and Y axes of Figures 2 and 3 is very small and hard to read. Can this text please be increased in size a little? By comparison, the font size of the X and Y axes of Figure 4 is bigger and more legible.

      Changed.

      (7) In the methods for fiber photometry, there is a sentence saying "Twenty two-hour recordings were made..." This was confusing, as it read as if there were twenty 2-hour recordings, when in fact it was one 22-hour recording. The authors should reword or use "22-hour" in this sentence.

      Changed.

      (8) It's a bit hard to see the difference in color between proestrus 1 and proestrus 2 (both blues) in Figure 6, especially when they overlap. It might be helpful to select a different color for one of them.

      Changed.

      (9) Is the virus from Addgene or just the plasmid? Did Addgene insert the plasmid into the virus, or was that done elsewhere? For purposes of replication, it might be helpful to state the plasmid that was used and the virus that was used, and their origins (e.g., if made by Addgene or donated by another investigator). I was not able to find the virus based on the Addgene number in the manuscript and was getting plasmids with different Addgene #s.

      Apologies, the numbering was incorrect. We have now amended to 100842-AAV9 that was packaged by Addgene.

    1. eLife Assessment

      This important study tackles an interesting aspect of fungal physiology: how a mitochondria-associated gene influences production of the secondary metabolite DON and fungicide sensitivity. The authors have improved the manuscript and the supporting evidence is convincing.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control.

      Comments on revised version:

      I have no further comments on the revision.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their study the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely and the manuscript is well written, albeit in some cases details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease of the DON content associated with deletion of FgDML1: Although some growth data are shown in figure 6 - indicating a severe growth defect - the DON production presented in figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to a decreased growth and specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details of the DON measurement and growth conditions and to relate the biomass formed under the same conditions to the amount of DON detected. Only then can a conclusion as to an altered production in the mutant strains be drawn.

      We greatly appreciate the time you spent on our paper and your helpful suggestions, and we have tried our best to revise the manuscript. Our point-by-point responses to the reviewer’s comments are listed below.

      Comments to the revised manuscript:

      The authors carefully revised the manuscript and provided explanations for methods in several cases. However, there are still some problems - probably due to misunderstanding - that need revision.

      (1) A major problem of the first version of the manuscript was the lack of appropriate description of biomass analysis and the consideration of the respective results for evaluation of production of DON and other metabolites. Although the authors provide some explanation in the response to reviews, I could not find a corresponding explanation or description in the manuscript. It is not sufficient to explain the problem to me, but a detailed explanation and description of the method has to be provided in the manuscript along with the definition of one "unit of mycelium". It is still not entirely clear to me what such a "unit of mycelium" is.

      Please clarify this and any other uncertainties that were commented on by me and other reviewers in the manuscript, not only in the response to reviews. Also adjust the reference list accordingly.

      Thank you very much for your advice. We appreciate the reviewer’s continued attention to the potential impact of biomass differences on DON production, particularly in light of the reduced growth rate observed in the mutant strain.

      We acknowledge that the mutant exhibits slower growth compared to the wild-type strain. However, it is important to emphasize that the reduction in DON levels reported in this study cannot be attributed to decreased fungal biomass. In our experimental design, DON production was normalized to mycelial dry weight, and toxin levels are expressed as μg DON per g dry mycelium. Therefore, differences in total mycelial accumulation among strains were explicitly accounted for and eliminated during data analysis.

      By expressing DON production on a per-unit-biomass basis, the measured values reflect the intrinsic DON biosynthetic capacity of the mycelium rather than the overall growth rate or total biomass. Consequently, the observed reduction in DON content in the mutant indicates a genuine impairment in DON biosynthesis per unit of fungal biomass, rather than a secondary effect resulting from reduced mycelial growth.

      To avoid ambiguity, we have clarified this point in the revised manuscript by explicitly stating the normalization strategy and the definition of the mycelial unit in the Materials and Methods section, and by emphasizing in the Results/Discussion section that DON levels were compared on a biomass-normalized basis.

      We hope that this clarification adequately addresses the reviewer’s concern and clearly distinguishes growth-related effects from alterations in toxin biosynthesis.

      “DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018). Under toxin-producing conditions (28 °C, 145 rpm), fungal strains were cultured in TBI medium for 7 days. Cultures were initiated using freshly grown mycelia. After incubation, mycelia and culture filtrates were separated by filtration. The culture filtrates were collected for DON determination, while the mycelia were harvested for biomass analysis. The collected mycelia were washed with sterile distilled water and dried at 60 °C to constant weight. The dry weight of mycelia was recorded and used for normalization of DON production. One mycelial unit was defined as 1 g of dry mycelial biomass. DON concentration in the culture filtrates was quantified using an enzyme-linked immunosorbent assay (ELISA). Briefly, 50 μL of culture filtrate or DON standard solution was added to wells of a 96-well microplate pre-coated with DON antigen, followed by the addition of enzyme conjugate and antibody working solution according to the manufacturer’s instructions. After incubation and washing, color development was achieved using substrate solution and terminated by stop solution. Absorbance was measured at 450 nm using a microplate reader. A standard curve was generated using log10-transformed DON concentrations of the standards and the corresponding percentage absorbance values. DON concentrations in the samples were calculated based on the standard curve. Total DON production was calculated according to the culture volume (30 mL) and subsequently normalized to mycelial dry weight. DON production was expressed as μg DON per g dry mycelium. Each treatment group contains three biological replicates and three technical replicates.”
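
      As an illustration only, the normalization described in the quoted Methods text can be sketched in Python. The function names, the straight-line fit of percentage absorbance against log10 concentration, the µg/mL unit for the standards, and every number in the example are assumptions made for this sketch; none of them is taken from the manuscript or from the kit protocol.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def don_per_g_dry_mycelium(std_conc_ug_per_ml, std_pct_abs,
                           sample_pct_abs, culture_volume_ml, dry_weight_g):
    """Return biomass-normalized DON (ug DON per g dry mycelium).

    Assumes (hypothetically) that percentage absorbance is linear in
    log10(concentration) over the range of the standards.
    """
    # Standard curve: percentage absorbance vs log10(DON concentration).
    log_conc = [math.log10(c) for c in std_conc_ug_per_ml]
    slope, intercept = fit_line(log_conc, std_pct_abs)

    # Invert the curve to get the DON concentration in the culture filtrate.
    sample_conc = 10 ** ((sample_pct_abs - intercept) / slope)

    # Total DON in the culture, then normalization to mycelial dry weight.
    total_don_ug = sample_conc * culture_volume_ml
    return total_don_ug / dry_weight_g


if __name__ == "__main__":
    # Made-up standards and sample values, for illustration only.
    value = don_per_g_dry_mycelium(
        std_conc_ug_per_ml=[0.1, 1.0, 10.0],
        std_pct_abs=[80.0, 50.0, 20.0],
        sample_pct_abs=50.0,
        culture_volume_ml=30.0,
        dry_weight_g=0.5,
    )
    print(f"{value:.1f} ug DON per g dry mycelium")
```

      With these made-up inputs the filtrate concentration interpolates to 1 µg/mL, so a 30 mL culture holds 30 µg DON and 0.5 g of dry mycelium gives about 60 µg DON per g dry mycelium. Dividing by dry weight is the step that separates a slow-growing strain from a low-producing one, which is the distinction the reviewer asked for.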

      (2) Another problem was that the authors considered FgDML1 a regulator of DON production. As mentioned by me and reviewer 3, FgDML1 is crucial to numerous functions in F. graminearum and its lack causes a plethora of problems for fungal physiology. Hence, although it is clear that the lack of FgDML1 causes alterations in DON production, it is not appropriate to designate this factor as a "regulator".

      It seems to me that the authors are afraid that the value of their study would decrease if FgDML1 were not a "regulator", which is not the case. This is a matter of correct wording. Therefore, please revise the wording accordingly, starting with the title:

      ...FgDML1 impacts DON toxin biosynthesis...

      Moreover, the manuscript would certainly benefit from a more detailed description of the whole cascade leading from FgDML1 to DON biosynthesis and production of the other metabolites that change upon deletion. Such an explanation can help the reader grasp the relevance of FgDML1 for regulatory processes as well as the distinction between more general and more specific effects.

      Thank you very much for your advice. We fully agree that, given the pleiotropic functions of FgDML1 in F. graminearum and the broad physiological defects caused by its deletion, it is not appropriate to designate FgDML1 as a direct or specific “regulator” of DON biosynthesis.

      We acknowledge that the use of the term “regulator” in the previous version was imprecise. Following the reviewer’s suggestion, we have revised the wording throughout the manuscript to more accurately reflect the role of FgDML1. Specifically, we now describe FgDML1 as a factor that impacts or affects DON toxin biosynthesis rather than directly regulating it. The title has been revised accordingly to read:

      “Mitochondrial protein FgDML1 impacts DON toxin biosynthesis and cyazofamid sensitivity in F. graminearum by affecting mitochondrial homeostasis”

      Importantly, we would like to emphasize that our intention was not to overstate the specificity of FgDML1 in DON regulation, but rather to highlight its influence on secondary metabolism in the context of its broader biological functions. To address this more clearly, we have expanded the Discussion section to provide a more detailed and cautious interpretation of the potential cascade linking FgDML1 deletion to altered DON biosynthesis and changes in other metabolites.

      “Secondary metabolite biosynthesis is generally regarded as an energy-intensive process that is tightly coupled to cellular energy metabolism. ATP serves as the primary energy currency supporting enzymatic reactions, macromolecule synthesis, and subcellular organization required for secondary metabolism. Disruption of ATP generation has been shown to directly impair toxin biosynthesis: for example, silencing of ATP synthase subunit α (AtpA) significantly reduces ATP synthesis and inhibits the production of the TcdA and TcdB toxins (Marreddy et al., 2024). Similarly, in plants, ATP depletion leads to a metabolic shift in which growth and basic physiological processes are prioritized at the expense of energetically costly secondary metabolites, including toxins (Xiao et al., 2024). Together, these findings highlight ATP availability as a key determinant of secondary metabolite production across biological systems.

      In filamentous fungi, mitochondria play a central role in sustaining cellular ATP levels through oxidative phosphorylation and are therefore critical for biosynthetic and stress-adaptive processes. In F. graminearum, mutants defective in mitochondrial components, such as the voltage-dependent anion channel (mitochondrial porin), exhibit aberrant mitochondrial morphology, reduced ATP production, and markedly decreased DON accumulation and virulence (Han et al., 2022). These observations establish a direct link between mitochondrial energy metabolism and secondary metabolite output, supporting the notion that intact mitochondrial function and adequate ATP supply are prerequisites for robust DON production.

      Consistent with this energy-dependent framework, biosynthesis of the mycotoxin DON in F. graminearum requires substantial ATP input. In the present study, ATP content in the ΔFgDML1 mutant was significantly lower than in the wild-type PH-1 and the complemented strain ΔFgDML1-C, and DON production was concomitantly reduced (Fig. 4A). Importantly, DON levels were normalized to mycelial dry weight, indicating that the observed reduction reflects a decreased biosynthetic capacity per unit biomass rather than a secondary consequence of reduced fungal growth. This distinction demonstrates that impaired DON production in the ΔFgDML1 mutant arises primarily from metabolic limitations.

      At the cellular level, ATP depletion compromises multiple energy-dependent steps required for DON biosynthesis. The formation of toxisomes, which are specialized subcellular structures responsible for the spatial organization of DON biosynthetic enzymes, is essential for efficient mycotoxin production and is an ATP-dependent process. Reduced ATP levels disrupt toxisome assembly, and accordingly, the ΔFgDML1 mutant was unable to form functional toxisomes (Fig. 4C). In parallel, western blot analysis revealed a marked reduction in the abundance of the DON biosynthetic enzyme FgTri1 (Fig. 4D). In addition, ATP-dependent processes are directly involved in the biogenesis of the DON biosynthetic machinery: the ATPase activity of myosin I (FgMyo1) is required for efficient translation of key DON biosynthetic enzymes, and disruption of its ATPase function results in reduced DON production (Tang et al., 2018). These findings further underscore the dependence of DON biosynthesis on cellular energy status.

      DON production is also regulated at the transcriptional level by the TRI gene cluster, with Tri5 and Tri6 serving as core components of the biosynthetic pathway. Tri5 encodes trichodiene synthase, which catalyzes the first committed step of DON biosynthesis. In the ΔFgDML1 mutant, expression levels of FgTri5 and FgTri6 were significantly downregulated (Fig. 4B), suggesting that impaired energy metabolism indirectly affects transcription of DON biosynthetic genes. Although no direct regulatory role of DML family proteins in gene expression has been reported in Saccharomyces cerevisiae or Drosophila melanogaster, their established functions in cell division and microtubule organization raise the possibility that FgDML1 indirectly influences gene expression through effects on chromatin organization or cell-cycle progression (Schulze and Wallrath, 2007).

      In addition to reduced ATP levels, deletion of FgDML1 resulted in a significant decrease in acetyl-CoA content (Fig. 5C), a key precursor for trichothecene biosynthesis. Acetyl-CoA links central carbon metabolism with secondary metabolite production, and its depletion further constrains DON biosynthesis by limiting substrate availability. Broader metabolomic studies support this relationship, showing that perturbations in TCA cycle intermediates and central carbon metabolism are closely associated with altered DON production, reinforcing a mechanistic linkage between energy generation and toxin biosynthesis (Atanasova-Penichon et al., 2018).

      “Taken together, these results support a model in which FgDML1 influences DON production indirectly by maintaining mitochondrial energy metabolism. Reduced ATP availability in the ΔFgDML1 mutant restricts energy-dependent biosynthetic processes, disrupts toxisome formation, diminishes DON biosynthetic enzyme abundance and gene expression, and limits precursor supply, ultimately leading to a substantial reduction in DON biosynthesis that is independent of fungal biomass effects.” (in L284-350). In this revised discussion, we explicitly distinguish between general physiological effects caused by the loss of FgDML1 and more specific consequences on secondary metabolic pathways.

      We believe that this revised wording and the expanded mechanistic discussion more accurately reflect the biological role of FgDML1 and improve the conceptual clarity of the manuscript, without overstating its function as a dedicated regulator of DON production.

      Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be added and corrected.

      Comments on revisions:

      The author has addressed my questions.

      We greatly appreciate the time you spent on our paper and your helpful suggestions.

    1. eLife Assessment

      This important study provides convincing data suggesting that subcellular localization of the spatial regulator of cell division, MinD, is an intrinsic feature of the protein's ability to associate with the membrane as both a dimer and a monomer. These findings distinguish the behavior of MinD in B. subtilis from its counterpart in E. coli and suggest that there is not a need to invoke additional localization factors. The reviewers felt that the revisions, particularly the additional experiments and changes to the text to make the experimental design and conclusions clearer, improve the quality of the manuscript and will increase its impact.

    2. Reviewer #1 (Public review):

      Summary:

      In this work the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole to pole oscillation whereby a time average minimum of the Min proteins at mid cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites, and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing mechanistic insight into the mechanism of dynamic localization.

      Strengths:

      In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations was nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.

      - In the revised manuscript, the authors now demonstrate localization and tracking data for minC and minJ deletion strains, which suggest that MinJ impacts MinD membrane cycling, but MinC does not. Additional in vitro work showed that the PDZ domain of MinJ modifies MinD ATP hydrolysis rates, and the authors propose that MinJ may promote MinD dimer formation.

      Weaknesses of the revised version: No major weaknesses.

    3. Reviewer #2 (Public review):

      Summary:

      Feddersen & Bramkamp determined important characteristics of how MinD protein binds/dissociates to/from the membrane, and dimerizes in relation to its ATPase activity. The presented data clearly shows the differences in function of MinD homologs from B. subtilis and E. coli.

      Strengths:

      The work presents well-executed experiments that lead to interesting conclusions and a new model of how the Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Outstandingly, most of the in vitro data are confirmed by single-molecule localization microscopy.

      Weaknesses:

      The authors immobilized liposomes, for which they used E. coli total lipids, to measure ATPase activity and liposome association and dissociation of B. subtilis MinD. For these experiments, it would be more suitable to use B. subtilis total lipids, as more biologically relevant data could be gained.

      Although the work is detailed and nicely compares the function of the B. subtilis Min system with the E. coli Min system, it lacks a comparison of Min system function in other rod-shaped Gram-positive bacteria. I would suggest including in the Discussion the complexity of other Min systems. This complexity is seen especially in other rod-shaped spore-formers such as clostridial species, in which one or both of these Min systems are present: an oscillating E. coli-type Min system and a more static one, as in B. subtilis.

      Comments on revisions:

      I'm satisfied with the authors' response to my private recommendation points. However, I thought that they would also respond to the weaknesses raised in my Public Review, as shown above, and update the revised version accordingly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the molecular dynamics of MinD, a component of the Bacillus subtilis Min system, in vitro and in vivo. In Escherichia coli the Min system is highly dynamic and displays rapid pole-to-pole oscillation whereby a time average minimum of the Min proteins at mid-cell is established. However, in B. subtilis, this is not the case, and there is no MinE present. MinD in B. subtilis dynamically relocalizes from the poles to division sites and binds to MinC and MinJ, which mediates its interaction with DivIVA. This paper reports the biochemical characterization of B. subtilis MinD in vitro and dynamics of MinD variants in vivo, providing mechanistic insight into the mechanism of dynamic localization.

      Strengths:

      In the current study, the authors perform a detailed biochemical characterization of the in vitro ATPase activity of MinD and demonstrate that rapid hydrolysis is elicited by adding phospholipids. They further show using a collection of substitution mutants of MinD that both monomers and dimers bind to the membrane, and ATP occupancy changes the on and off rates. Identification, quantification, and tracking of discrete Halo-MinD populations were nicely done and showed that mutations in MinD alter dynamic localization, correlating with PL binding on and off rates in vitro.

      Weaknesses:

      While the study shows that MinD in B. subtilis utilizes a different (MinE-independent) activation mechanism, the extent to which MinJ and/or MinC play a role remains to be determined.

      Reviewer #2 (Public review):

      Summary:

      Feddersen & Bramkamp determined important characteristics of how the MinD protein binds to and dissociates from the membrane, and how it dimerizes in relation to its ATPase activity. The presented data clearly show the differences in function of MinD homologs from B. subtilis and E. coli.

      Strengths:

      The work presents well-executed experiments that lead to interesting conclusions and a new model of how the Min system works during B. subtilis mid-cell division. Importantly, this model is supported by in vitro characterization of well-chosen mutants in the functional domains of MinD. Outstandingly, most of the in vitro data are confirmed by single-molecule localization microscopy.

      Weaknesses:

      The authors immobilized liposomes, for which they used E. coli total lipids, to measure ATPase activity and liposome association and dissociation of B. subtilis MinD. For these experiments, it would be more suitable to use B. subtilis total lipids, as more biologically relevant data could be gained. Although the work is detailed and nicely compares the function of the B. subtilis Min system with that of the E. coli Min system, it lacks a comparison with Min system function in other rod-shaped Gram-positive bacteria. I would suggest including in the Discussion the complexity of other Min systems. This complexity is especially evident in other rod-shaped spore formers such as Clostridial species, in which one or both of these Min systems are present: an oscillating E. coli-type Min system and a more static one as in B. subtilis.

      Reviewer #3 (Public review):

      Experimentally, this study provides sufficient data to support the authors' conclusion that MinD dimerization, but not ATPase activity, is both necessary and sufficient for concentrating it and its binding partner, the division inhibitor MinC, at cell poles. Biochemical data appear to be rigorously acquired and include proper controls. Although cytological data are consistent with the authors' model, quantitative information on MinD localization in a statistically relevant set of cells is missing (e.g. Figure 2B).

      The study's other major conclusion, as outlined in their discussion, that a reaction-diffusion model explains MinD localization in wild-type cells, is unsubstantiated. If they would like to make this a major conclusion of the final manuscript, they will need to include modeling that takes into account biochemical and cytological data. From a presentation perspective, the manuscript is challenging to read and will require substantial rewriting and revision prior to publication.

      We thank the reviewers for their detailed and constructive comments on our work. We particularly acknowledge that the initial version of our manuscript was difficult to read and might have given the impression that the aim was to formulate a new mathematical model of Min dynamics in B. subtilis. However, our work aimed at providing solid (and first) biochemical evidence for the MinD ATPase cycle and the nature of the ATPase stimulation. Furthermore, we aimed at corroborating the in vitro findings with single-molecule microscopy data that provide a detailed in vivo picture of Min dynamics in living cells. Together, this work combines in vitro and single-molecule in vivo data for the first time. During the revision, we generated a wealth of new data aimed at unraveling the potential effects of MinC and MinJ on MinD dynamics. A major problem during the revision was the purification of MinJ. The integral membrane protein MinJ has been shown to be highly susceptible to proteolytic decay during purification attempts. Despite various attempts, we did not succeed in purifying full-length MinJ. These efforts also led to the unusually long revision time. We therefore turned to the purification of the soluble part of MinJ, namely the PDZ domain. The revised work now contains in vitro data showing the impact of MinC and MinJ-PDZ on MinD ATPase activity and membrane binding. Furthermore, we now provide single-molecule tracking data of MinD in minC and minJ deletion mutant backgrounds. Importantly, the new data show that MinC has no effect on MinD activities, while the PDZ domain has a mild stimulating effect on MinD's ATPase activity. In summary, a detailed picture of how MinD dynamics function mechanistically in B. subtilis emerges.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is important to evaluate MinD ATPase activity, PL binding, and release in the presence of MinC and MinJ. In E. coli, MinD recruits MinC to phospholipids. The presence of MinC could change the on/off rates. It is unknown if MinC or MinJ could alter the ATPase rates or dynamics. Presuming that MinD alone drives the complete dynamic story because stimulation is observed in vitro with phospholipids, it follows that Michaelis Menten kinetics is insufficient. It is acknowledged that MinJ is difficult to purify, but one could test a small cytoplasmic subdomain or MinJ-enriched membranes for MinD recruitment and release.

      Indeed, it is unknown whether MinC or MinJ has an impact on the ATPase rates or protein dynamics of MinD in B. subtilis. To address the potential influence of MinC and MinJ on MinD's ATPase activity and dynamics, we conducted a series of experiments. MinC was successfully purified, and subsequent BLI and ATPase assays revealed no significant impact on MinD activity in our system, except for a modestly reduced ATPase activity (Figure S5).

      With regard to MinJ, multiple constructs and purification strategies were attempted. While full-length MinJ could not be purified, we isolated the C-terminal PDZ domain to probe potential interactions. In ATPase assays, the PDZ domain reproducibly increased MinD ATP hydrolysis rates, whereas BLI measurements did not reveal detectable changes in MinD membrane-binding kinetics under these conditions. We agree with the reviewer that membrane-integrated MinJ could exert additional effects on MinD recruitment or release that are not captured by the isolated PDZ domain, and we now discuss this limitation in the revised Discussion.

      Furthermore, we performed single-molecule localization and tracking analyses of MinD in ∆minC and ∆minJ backgrounds. These experiments, found in a newly added Results section and summarized in Fig. S12, demonstrate that MinJ appears to play a role in maintaining dynamic MinD membrane cycling and preventing excessive confinement or aggregation, whereas MinC has no obvious effect on MinD dynamics.

      (2) It is important to show the reduced ATP hydrolysis by MinD mutant proteins (line 243). Stating that they are catalytically inactive without showing the data is presumptuous, and there may be differences between the mutants. Although I am sure that the authors evaluated activity with phospholipids, it should be shown.

      We have now quantified the ATPase activity for all MinD mutants from the respective EnzChek assay data. These experiments confirm that the G12V, K16A, and D40A mutations effectively abolish catalytic activity, yielding phosphate release rates that are essentially at the background detection limit of the assay. We have included these data in Figure S7C and updated the text to reflect these findings.

      (3) The shoulder on MinD-K16A suggests that it is capable of forming a dimer at low equilibrium. The suggestion that it is due to interaction with the inert SEC matrix (line 242) raises more concerns, although this is highly unlikely, given that G12V elutes as a single peak. The possibility of a dimer here also demonstrates the necessity of reporting precise ATPase rates for the mutants.

      Thank you for this comment. Since we shared some of your concerns, we made sure to gather enough evidence before making the respective claims. We conducted both in vivo (single-particle tracking, widefield microscopy) and in vitro experiments with the respective K16A mutant of MinD. Most convincingly, K16A is completely catalytically inactive (see previous answer), while both positive and negative controls behave as expected. Both in vivo and in vitro experiments suggest that the protein still binds the membrane despite not being able to form dimers. Similar observations were made in a study conducted by colleagues in parallel (Bohorquez et al., 2024). Furthermore, K16A exchanges in other Walker motif-containing proteins, including E. coli MinD and RecA, and B. subtilis ParA/Soj, abolish dimer formation completely.

      There are many possible explanations for the shoulder observed during elution, which we did not spell out in the Results section. These include conformational heterogeneity, as the protein may adopt multiple stable or semi-stable conformations that differ slightly in hydrodynamic volume. It is also possible that the shoulder represents small protein aggregates from degradation products or proteolysis, which we indeed observe in the respective SDS-PAGE/blot (Fig. S6). As written in the text, interactions with the SEC column, e.g. through exposed hydrophobic patches, are not uncommon, as the surface charges of the mutant protein differ from those of the wild-type version. On the same note, the buffer may subtly affect surface properties such as charge and hydrophobicity differently than in the wild-type protein, and thus alter its interaction with the column. In conclusion, we are confident that the orthogonal methods used point towards dimer abolishment in the K16A mutant of MinD, despite the small shoulder seen during SEC elution.

      (4) BLI data - were the kon and koff rates also determined without ATP, since it is assumed that MinD-K16A does not bind ATP, but has a strong Kd (Table 1). Does ATP modify Kd of wt MinD for PLs?

      Without ATP, MinD neither interacted properly with the sensor-bound liposomes nor followed regular binding kinetics. Therefore, kinetic constants could not be determined, as fitting of the curves is not possible. In addition to the respective figure (Fig. S8), we attached the graph of the raw/unfitted data in the supplement (Fig. S13, MinD2 dataset).

      (5) Local MinJ interactions are proposed to alter the dynamic localization of MinD wt and variants in vivo (line 349-358), which could occur through regulation of ATP hydrolysis, PL binding, or release by MinJ or MinC. Localization dynamics should be measured in minC and minJ mutant strains.

      We thank the reviewer for this important suggestion. In response, we have now directly measured MinD localization dynamics in both ∆minC and ∆minJ backgrounds. We performed single-molecule localization microscopy (SMLM) and single-molecule tracking (SMT) of Halo-MinD expressed from its native locus in these mutant strains, using the same experimental and analytical pipeline applied throughout the study. These new experiments are presented in a newly added Results section and summarized in Figure S12, where we quantitatively compare MinD localization, mobility, diffusion states, and confinement between wild type, ∆minC and ∆minJ cells. The data show that deletion of minJ leads to a pronounced increase in the confined/static MinD fraction and reduced dynamic cycling, whereas deletion of minC causes only subtle changes in MinD dynamics. These findings support a specific role of MinJ in maintaining dynamic MinD membrane cycling in vivo, while MinC has a more modest modulatory effect. We have integrated these results into the Discussion to refine our model of how MinJ and MinC differentially influence MinD dynamics and localization.

      (6) Considering the single-molecule population counting and the lack of error presented for the binning of tracks (confined/slow/fast), it is difficult to rationalize why G12V and K16A are defective. The relative proportions of confined/slow/fast between wt, G12V, and K16A seem quite similar (i.e., bubble plot). And the static localization in Fig. 2B does not seem dramatically perturbed. This seems to invoke other cellular regulators as critical for the system's operation in the cell, further pointing to important regulatory roles by MinJ and/or MinC.

      First, regarding the apparent lack of error estimates for the population binning, the uncertainties associated with the SMT-based population fitting are intrinsically very small and fall below the graphical resolution of the plots. This reflects the large number of tracks analyzed and the robustness of the fitting procedure, rather than an omission of error analysis.
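
      As a rough illustration of why population-fraction uncertainties shrink with the number of tracks, the sketch below uses the simple binomial standard error of a proportion. This is an assumption made for illustration only and is not the fitting procedure used in the study; the function name, the 30% fraction, and the track counts are hypothetical.

```python
import math


def proportion_standard_error(fraction: float, n_tracks: int) -> float:
    """Binomial standard error of a population fraction estimated from n_tracks tracks.

    Illustrative simplification: real SMT population fractions come from fitting
    diffusion distributions, whose uncertainties are computed differently.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if n_tracks <= 0:
        raise ValueError("n_tracks must be positive")
    return math.sqrt(fraction * (1.0 - fraction) / n_tracks)


if __name__ == "__main__":
    # Hypothetical example: a 30% "confined" population at increasing track counts.
    for n_tracks in (100, 1_000, 10_000):
        se = proportion_standard_error(0.30, n_tracks)
        print(f"n = {n_tracks:>6}: standard error = {se:.4f}")
```

      Under this simplified model, the standard error scales with the inverse square root of the track count, so a tenfold increase in tracks reduces it by roughly a factor of three.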

      Second, we respectfully disagree that the diffusion-state distributions and static localization patterns of G12V and K16A are similar to those of the wild type. In the context of SMT data, the observed shifts in population sizes are substantial and biologically meaningful. Moreover, the static localization of these mutants is markedly altered: instead of forming a graded enrichment at poles and septa, both mutants display a uniform membrane distribution, similar to, e.g., a membrane stain (also see Fig. 2B). This indicates a loss of regulated recruitment, consistent with impaired interaction with MinJ. Importantly, our biochemical analyses, together with extensive data on conserved Walker-type ATPases carrying analogous G12V and K16A mutations, strongly support the conclusion that these variants are functionally defective despite retaining membrane association.

      Third, we agree about the importance of MinC and MinJ, and have now directly tested the contribution of these interactors by analyzing MinD dynamics in ∆minC and ∆minJ backgrounds. These new data, presented in a newly added Results section and summarized in Fig. S12, support our interpretation by showing that MinJ has a pronounced effect on MinD confinement and dynamic cycling in vivo, whereas MinC has a more modest influence. Together, these findings reinforce the conclusion that the defects of G12V and K16A arise not only from impaired regulatory cycling caused by the mutations, but also from impaired interaction with MinJ.

      (7) Interesting that they stored the His-MinD protein at 4 °C for up to one week and not at -80 °C as it was in 10% glycerol. Was MinD inactivated by freezing? Did this contribute to the observed aggregation (line 695)?

      We thank the reviewer for raising this point. Prior to this comment, we routinely worked with freshly purified MinD and therefore had not systematically compared storage at 4 °C and -80 °C. In response to the suggestion, we have now directly compared the activity of MinD stored at 4 °C for one week with that of MinD stored at -80 °C for four weeks. We did not observe any significant difference in ATPase activity or overall biochemical behavior between the two storage conditions. These results indicate that freezing does not inactivate MinD and that the aggregation observed in some preparations is unlikely to be caused by storage at 4 °C. We have clarified this point in the Materials and Methods section of the manuscript and thank the reviewer for prompting this.

      (8) Line 109 - Typo. Change "component" to "components".

      (9) Page 4, line 52: change 'machinery' to 'machine'.

      (10) Page 13, line 248: change 'manifested' to 'displayed'.

      Thank you for pointing out these typos, which have all been corrected.

      Reviewer #2 (Recommendations for the authors):

      I suggest making changes to the sentence in lines 60-62: "In rod-shaped model bacteria like Escherichia coli and Bacillus subtilis, division site selection is governed by two protein systems (15-17): nucleoid occlusion and the Min system." However, it was shown previously that upon deletion of both systems in B. subtilis, division site selection was not disturbed, and another mechanism was suggested to be involved.

      We agree that this information should be part of the introduction. Therefore, we included the following sentence at the indicated position:

      “However, it was previously shown that simultaneous deletion of both systems in B. subtilis did not disturb division site selection, suggesting additional mechanisms to be involved (Rodrigues and Harry, 2012).”

      I suggest changing the sentence in lines 85-86: "Dimerized MinD recruits MinC and activates it to prevent FtsZ dynamics (46)". It would be more precise to say: "Dimerized MinD recruits MinC and activates it to inhibit FtsZ oligomerization (46)."

      Thank you, we agree and changed the sentence accordingly.

      In Figure S2, mark the elution volumes to which the two mentioned peaks (31 and 62 kDa) correspond.

      We thank the reviewer for this point. We ran the standards for this column again and fitted them to our peaks (see updated Fig. S2), now demonstrating that the shoulders are indeed not at a size where dimers would elute but rather around ~44.3 kDa. We note that both the Ni-NTA eluate and SEC fractions contain multiple His-tagged degradation products (see revised Fig. S2 and His-MinD blot in Fig. S1). Because the SEC run was performed with excess ADP to suppress ATP-dependent dimerization, we interpret the minor shoulder at ~44.3 kDa as arising from sample heterogeneity due to these degradation products, either by co-elution of fragments or by transient fragment:full-length MinD assemblies, rather than full-length MinD dimerization. This is now also described in the respective Results section.

      Reviewer #3 (Recommendations for the authors):

      The quality of the written manuscript is poor, making it difficult to read and appreciate. Specifically: The introduction is quite long. It takes almost three pages until the primary objective of the paper, identifying determinants of MinD localization in B. subtilis, is clearly stated. The introduction should be shortened to focus specifically on Min system function across species, i.e., preventing aberrant polar septation events. Three or four paragraphs should be sufficient. E.g. 1. Introduction to Min systems generally, 2. A summary of the mechanism underlying MinD oscillation in E. coli, 3. An explanation of similarities and differences between E. coli and B. subtilis, and 4. A paragraph outlining the specific questions to be addressed in this study.

      We have substantially revised the Introduction to address this concern. The revised version is considerably shorter and more focused, and now follows the structure proposed by the reviewer. As a result, the main objective of the manuscript is now stated much earlier, and the overall readability and clarity of the Introduction have been substantially improved.

      The results section is challenging to read, in part due to the inclusion of methods as well as some issues with organization. For example, this section begins with a single sentence describing the need to investigate MinD's ATPase cycle in vitro. This sentence is followed by a header and an entirely new section describing the methods used to purify MinD for biochemical analysis. These details should be in the methods section. Similarly, the first paragraph of the following section, which focuses on the ATPase activity of MinD in the presence and absence of liposomes, describes how the commercially available EnzChek phosphate assay works. This is, again, something that belongs in methods, not results.

      We have revised the Results section extensively in response to this comment. In the revised manuscript, we have removed or relocated substantial methodological detail from the Results to the Methods section and streamlined the overall organization. Descriptions of protein purification procedures and standard assay principles, including details of the EnzChek phosphate assay, have been condensed or moved to the Methods where appropriate.

      At the same time, we have retained limited methodological information in the Results where it is essential for understanding the interpretation of non-standard experimental setups or key controls, like SMLM. In these cases, brief methodological context is provided to ensure clarity without requiring frequent cross-referencing to the Methods section.

      Overall, the Results section has been substantially condensed and reorganized to improve readability, while additional experiments added in response to reviewer comments necessarily increase the scope of the section. We believe the revised structure now clearly separates experimental outcomes from methodological detail and improves the flow of the Results.

      The discussion section, at 7 pages, is overly long and includes substantial extraneous information. For example, it begins with a 2.5 page long paragraph that includes a summary of pattern formation during embryogenesis in animals, followed by a brief description of Turing's reaction-diffusion model, and finally, repeating parts of the introduction, a summary of the mechanism underlying MinCDE localization in E. coli. It is only in the middle of this paragraph - near the end of the second page - that the authors turn their attention back to MinD localization in B. subtilis, albeit with a focus on reaction-diffusion-based behaviors of other ParA homologues. A revised discussion section should focus on the primary conclusion of the authors, based on data presented in the results. If the authors would like to make the case that their data fit the Turing reaction-diffusion model, they will need to include mathematically based modeling that demonstrates this point in their results.

      We have substantially revised and condensed the Discussion in response to this comment. In the revised manuscript, we removed the extended introductory material on general pattern formation, embryogenesis, and Turing reaction-diffusion theory, as these topics extended beyond the scope of the present study. We also eliminated redundant summaries of the E. coli MinCDE system that overlapped with the Introduction. The revised Discussion now focuses on the primary conclusions supported by our experimental data, namely the biochemical and in vivo mechanisms governing MinD membrane binding, ATPase activity, and dynamic localization in B. subtilis, as well as the regulatory roles of MinJ and MinC. Importantly, we would like to clarify that we did not intend to claim that the B. subtilis Min system follows a Turing-type reaction-diffusion mechanism. References to general reaction-diffusion concepts were meant to provide contextual background and not to imply a specific mathematical framework for the system studied here. To avoid any possible ambiguity, we have removed these references from the Discussion.

      While the overall length of the Discussion is now comparable to the previous version, this reflects the inclusion of substantial new experimental data added during revision. Importantly, the structure and content of the Discussion have been streamlined to prioritize interpretation of the results rather than general background, resulting in a more focused and cohesive narrative.

      Experimental comments:

      Line 213: Please provide a rationale for the ATPase experiments. What is the expected result for each mutant and why?

      We have clarified the rationale for the ATPase experiments in the revised manuscript by briefly outlining the expected behavior of each MinD mutant. The anticipated ATPase properties of G12V, K16A, and D40A are based on well-established studies of conserved Walker-type ATPases and were implicit in the original experimental design, as they should all be catalytically inactive. To avoid any ambiguity, we now state these expectations explicitly in the manuscript.

      Line 243: ATPase data for the mutant proteins should be included in the supplement.

      We have now quantified the ATPase activity for all MinD mutants from the respective EnzChek assay data. These experiments confirm that the G12V, K16A, and D40A mutations effectively abolish catalytic activity, yielding phosphate release rates that are essentially at the background detection limit of the assay. We have included these data in Figure S7C and updated the text to reflect these findings.

      Figure 2B: Please include transverse section fluorescence data for all variants as well as quantitative data on average MinD positioning.

      The quantitative information requested is already provided by our single-molecule localization and tracking (SMLM/SMT) analysis of Halo-MinD and its variants (Fig. 4A and now Fig. S12A). This approach represents the averaged spatial distribution of individual MinD localizations collected from dozens of cells per condition and provides substantially higher spatial resolution and quantitative precision than transverse fluorescence profiles obtained by conventional widefield microscopy.

      We therefore believe that the SMLM-based analysis is superior to transverse section fluorescence measurements and more accurately captures average MinD positioning across the cell population. To avoid redundancy, we have retained the SMLM analysis as the quantitative framework for MinD localization.

      Figure 2B: I am not convinced that punctate and membrane-associated are mutually exclusive. Quantitative data on protein localization from transverse fluorescent sections is necessary to make this point.

      Please see the answer above and Fig. 4A.

      Figure 2B: It is impossible to assess the functionality of individual mutants without quantitative data on minicell frequency and cell length.

      We have addressed this point by quantitatively measuring both cell length and minicell frequency for all relevant strains. These analyses were performed on a minimum of n = 430 cells per strain and are now presented in Table S5. The added data provide a quantitative assessment of mutant functionality, support the phenotypic interpretations shown in Fig. 2B, and are also integrated in the Results section.

      Other comments:

      Line 109: should read "components".

      Thank you, corrected.

      Line 135: Why is this sentence outside the major section of the results?

      It now has been integrated into the major section.

      Line 197: I am not sure I understand this sentence.

      We have revised this sentence to improve clarity and readability.

      Line 218: I do not understand this paragraph.

      We have also rephrased and rewritten this paragraph for clarity and readability.

      Line 223: To make this section focused on the results rather than the method, the authors could simply say "To determine the role of ATP mediated dimerization, we...." (If I am understanding this section correctly).

      We followed this suggestion and revised the text accordingly to focus on the experimental outcome rather than methodological detail.

      Line 273: "depicted" not depictured.

      Thank you, corrected.

      Figure 4: The single-cell data look good in the figure, however, the description of these results and their meaning are nearly impossible to follow in the text.

      We acknowledge that the single-molecule data presented in Fig. 4 are complex. While we have made minor clarifications to improve the flow and wording of the text, we did not substantially reduce the level of detail, as the description of the analytical framework is required for correct interpretation of the results.

      At the same time, we aimed to avoid repeating extensive methodological explanations that are already described in the Materials and Methods section, in line with other reviewer comments. We therefore retained a concise but technically accurate description in the Results to ensure that the biological conclusions drawn from Fig. 4 can be properly understood.

    1. eLife Assessment

      This study provides important insights into how immune cells in the brain's protective layers behave under normal and disease-like conditions, revealing location-specific activity patterns that may shape inflammation and disorders such as migraine. The evidence is compelling and supported by advanced imaging approaches and rigorous analyses, although some conceptual and interpretational limitations temper the mechanistic depth. Overall, the work will be of broad interest and represents an invaluable contribution to the growing field linking immune and nervous system function.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a technically sophisticated intravital two-photon calcium imaging approach to characterize meningeal macrophage Ca<sup>2+</sup> dynamics in awake mice. The development of a Pf4Cre:GCaMP6s reporter line and the integration of event-based Ca<sup>2+</sup> analysis represent clear methodological strengths. The findings reveal niche-specific Ca<sup>2+</sup> signaling patterns and heterogeneous macrophage responses to cortical spreading depolarization (CSD), with potential relevance to migraine and neuroinflammatory conditions. Despite these strengths, several conceptual, technical, and interpretational issues limit the impact and mechanistic depth of the study. Addressing the points below would substantially strengthen the manuscript.

      Strengths:

      The use of chronic two-photon Ca<sup>2+</sup> imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca<sup>2+</sup> dynamics in the meninges.

      The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca<sup>2+</sup> signaling properties. The identification of macrophage Ca<sup>2+</sup> activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      By linking macrophage Ca<sup>2+</sup> responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Weaknesses:

      The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca<sup>2+</sup> signal interpretation.

      The manuscript offers an extensive characterization of Ca<sup>2+</sup> event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca<sup>2+</sup> activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca<sup>2+</sup> dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      The GLM analysis revealing coupling between dural perivascular macrophage Ca<sup>2+</sup> activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca<sup>2+</sup> manipulation).

      The authors conclude that synchronous Ca<sup>2+</sup> events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved.

      A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca<sup>2+</sup> activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca<sup>2+</sup> suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. At a minimum, the discussion should engage more deeply with possible interpretations and implications of this finding.

      The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca<sup>2+</sup> increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions.

      Comments on revisions:

      The authors have answered the questions well.

    3. Reviewer #2 (Public review):

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within meninges in vivo. The paper is well written and clearly presented.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this report wish to show that distinct populations of meningeal macrophages respond to cortical spreading depolarization (CSD) via unique calcium activity patterns depending on their location in the meningeal subcompartments. Perivascular macrophages display calcium signaling properties that are sometimes in opposition to those of non-perivascular macrophages. Many of the meningeal macrophages also displayed synchronous activity at variable distances from one another. Other macrophages were found to display calcium signals in response to dural vasomotion. CSD could induce variable calcium responses in both perivascular and non-perivascular macrophages in the meninges, in part due to RAMP1-dependent effects. Results will inform future research on the calcium responses displayed by macrophages in the meninges under both normal and pathological conditions.

      Strengths:

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Weaknesses:

      Specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. A discussion section on potential pitfalls is included to address this.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Strengths:

      (1) The use of chronic two-photon Ca<sup>2+</sup> imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca<sup>2+</sup> dynamics in the meninges.

      (2) The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca<sup>2+</sup> signaling properties. The identification of macrophage Ca<sup>2+</sup> activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      (3) By linking macrophage Ca<sup>2+</sup> responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Thank you for recognizing the strengths in our work.

      Weaknesses:

      (1) The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca<sup>2+</sup> signal interpretation.

      We acknowledge that PF4 is not an exclusively macrophage-restricted marker. Yet, among meningeal immunocytes, it is almost exclusively expressed in macrophages (1, 2). Furthermore, in the adult mouse meninges, PF4<sup>Cre</sup>-based reporter lines label nearly all dural and leptomeningeal macrophages and almost no other cells (3, 4). This Cre line has also been used to target border-associated macrophages (2, 4). Moreover, a recent study suggests that the bacterial artificial chromosome used to generate the PF4<sup>Cre</sup> line does not affect meningeal macrophage activity (4). Nonetheless, in the revised version, we discuss a potential limitation of the Pf4Cre-based labeling approach for studying meningeal macrophages’ Ca<sup>2+</sup> signaling, namely that a very small population of other meningeal immune cells may also be labeled.

      (2) The manuscript offers an extensive characterization of Ca<sup>2+</sup> event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca<sup>2+</sup> activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca<sup>2+</sup> dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      In our discussion, we indicated that “the exact link between the distinct Ca<sup>2+</sup> signal properties of meningeal macrophage subsets observed herein and their homeostatic function remains to be established”. In the revised discussion part, we acknowledge that this is primarily a descriptive study that provides a foundational landscape of Ca<sup>2+</sup> dynamics in meningeal macrophages.

      (3) The GLM analysis revealing coupling between dural perivascular macrophage Ca<sup>2+</sup> activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca<sup>2+</sup> manipulation).

      In the results section, we indicate that our data suggest that dural perivascular macrophages are functionally coupled to locomotion-driven dural vasomotion, either responding to it or mediating it. Furthermore, we discussed the possibilities that 1) macrophages sense vascular-related mechanical changes and 2) macrophage Ca<sup>2+</sup> signaling regulates dural vasomotion. Moreover, we explicitly state that studying causality will require an experimental approach that has yet to be developed, enabling selective manipulation of dural perivascular macrophages.

      (4) The authors conclude that synchronous Ca<sup>2+</sup> events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved.

      Thank you for this suggestion. In the revision, we indicate that further studies are required to resolve the exact source of synchrony.

      (5) A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca<sup>2+</sup> activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca<sup>2+</sup> suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. At a minimum, the discussion should engage more deeply with possible interpretations and implications of this finding.

      While we propose that the decrease in macrophage Ca<sup>2+</sup> signaling following CSD could indicate that a hyperexcitable cortex dampens meningeal immunity, in the revised discussion, we indicate that further studies are needed to determine whether this reduction in meningeal macrophage Ca<sup>2+</sup> activity reflects altered viability or reduced immune function that could interfere with the macrophage’s ability to restore homeostasis and dampen local inflammation.

      (6) The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca<sup>2+</sup> increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions.

      Although n=3 is common in intravital imaging of the meninges, including experiments employing pharmacological manipulations, such as RAMP1 inhibition (5-7), a larger sample size will increase confidence in the results. We further acknowledge that our pharmacological data indicate only a potential role for RAMP1 signaling in meningeal macrophages and that CGRP/RAMP1 signaling in other meningeal immune or vascular cells may also play a role.

      Reviewer #2 (Public review):

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      Thank you.

      I have only minor comments.

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text.

      We did not measure the exact distance of the perivascular macrophages from the blood vessels, but defined them as such based on previous data showing that these cells reside along the abluminal surface and maintain tight interactions with mural cells (8). We now provide this information in the revised manuscript, including their labeling approach with a dextran tracer.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

      We have added more background and the method for inducing CSD (i.e., a pinprick in the frontal cortex) in the Results section.

      Reviewer #3 (Public review):

      Strengths:

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Thank you for recognizing the strengths of our paper.

      Weaknesses:

      (1) The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included.

      Please see previous responses regarding the specificity of the PF4Cre line for targeting macrophages. The specificity of the RAMP1 antagonist we used (BIBN4096, Olcegepant) has been confirmed by its developer Boehringer Ingelheim, and has been used to target CGRP signaling in numerous studies, including those targeting meningeal macrophage and vascular signaling (2, 7). A section on the study’s limitations has been added.

      References:

      (1) H. Van Hove et al., A single-cell atlas of mouse brain macrophages reveals unique transcriptional identities shaped by ontogeny and tissue environment. Nat Neurosci 22, 1021-1035 (2019).

      (2) F. A. Pinho-Ribeiro et al., Bacteria hijack a meningeal neuroimmune axis to facilitate brain invasion. Nature 615, 472-481 (2023).

      (3) G. L. McKinsey et al., A new genetic strategy for targeting microglia in development and disease. eLife 9, (2020).

      (4) H. J. Barr et al., The circadian clock regulates scavenging of fluid-borne substrates by brain border-associated macrophages. bioRxiv, (2025).

      (5) T. L. Roth et al., Transcranial amelioration of inflammation and cell death after brain injury. Nature 505, 223-228 (2014).

      (6) M. V. Russo, L. L. Latour, D. B. McGavern, Distinct myeloid cell subsets promote meningeal remodeling and vascular repair after mild traumatic brain injury. Nat Immunol 19, 442-452 (2018).

      (7) K. L. Monaghan et al., Highly dynamic dural sinuses support meningeal immunity. Nature, (2026).

      (8) H. Min et al., Mural cells interact with macrophages in the dura mater to regulate CNS immune surveillance. J Exp Med 221, (2024).

    1. eLife Assessment

      This valuable study shows that locomotion-related modulations in the mouse visual cortex are not uniform but primarily affect neurons in muscarinic receptor-negative patches, which receive projections from specific cortical areas. While the evidence is mostly solid, some uncertainties remain regarding the link between anatomical data and functional measurements. The study should be of interest to neuroscientists interested in state modulation of cortical function.

    2. Reviewer #1 (Public review):

      Processing in the primary visual cortex (V1) of mice is not only based on sensory inputs but also strongly modulated by locomotion. In this study, Meier et al. ask whether neurons that are modulated by locomotion form clusters in V1. Their work is based on previous studies from their lab establishing a modularity in the organization of primary visual cortex based on M2-muscarinic-acetylcholine-receptor-positive patches and interpatches (Ji et al. 2015, D'Souza et al. 2019). In these studies, they have highlighted the clustering of specific visual pathways and inhibition. In the current study, they extend this modularity to motor inputs, confirming a clustering of locomotion-modulated neurons and also showing that these clusters overlap with the M2-negative interpatches of layer 1. Finally, they establish a blueprint for visual processing streams in V1, segregating projections to and from lateral visual areas (LM, AL, and RL) from projections to and from the medial areas, including the visual area PM, the retrosplenial cortex (RSP), and the secondary motor area (MOs).

      Conceptually, this study provides an important finding in the organization of locomotion-related signaling in primary visual cortex, which clearly has substantial implications for sensory processing in visual cortex. While the anatomical data are solid, the link to physiology is incomplete. In conclusion, there are numerous issues that leave the main findings in some doubt, so the authors have some work to do before I find this story convincing.

      Major issues:

      (1) The major results in this study rely on proper quantification of neuronal responses during resting and running. Recently, it has been reported that hemodynamic occlusion can strongly influence measurements of fluorescent changes using two-photon imaging (Yogesh et al. 2025, doi.org/10.1101/2024.10.29.620650). Since it is unclear whether there is an inherent bias in vasculature and hemodynamic occlusion in M2 patches and interpatches, a quantification of the effect of hemodynamic occlusion would be necessary. This control would ideally be done using mice with GFP expression to test if there is still a clustering of locomotion-modulated neurons that overlaps with M2-negative interpatches. Alternatively, the authors should at the very least quantify the vascularization in M2 patches and interpatches.

      (2) To assess the effects, the authors use a correlation analysis for many of their findings (e.g., Figures 2b,c, 4j,k, ...). This, however, is inappropriate to assess the significance of the results. I suggest redoing all statistics with hierarchical bootstrap sampling (Saravanan et al. 2020, PMID: 33644783) or similar.

      (3) The authors use two different measures to assess whether and to what extent a neuron is locomotion sensitive, the LMI and "locomotion-responsive". While the LMI is defined based on recording in the light and dark (Figure 2), the "locomotion-responsiveness" is defined only in the dark (Figure 3a,c,d). The link between the two measures should be clarified.

      a) Additionally, Figure 2b shows higher average LMI for interpatches, but the locomotion-responsive fraction is similar in interpatches and patches (relative number of pairs in Figure 3c and Figure 3d). How do the authors explain this discrepancy?

      b) How is the LMI calculated - based on the average or the maximum response over stimuli? One particular stimulus? If the LMI is defined for each stimulus separately, what is plotted in Figure 2b?

      (4) In the last panels of Figures 4-7, the authors analyze the alignment of cell bodies with the M2 patches. While in superficial layers it might be straightforward to align the cell body locations with the M2 patches and interpatches in layer 1, this alignment does not appear to be trivial for deeper layers. The authors should provide additional material to convince the reader of the proper alignment.

      (5) Related to point 4 above - Given the importance of a proper alignment of M2 patches with the in vivo imaging, the in vivo - ex vivo alignment should be more convincing than Figure 1 C-E. Measuring M2 patches in vivo (as the authors have tried to do) would have provided more solid evidence. Have the authors tried to remove the dura for their in vivo imaging to increase signal-to-noise? In any case, more examples of proper alignment are necessary.

      (6) The authors state that locomotion selectively affects M2-/M2- pairs based on Figure 3c. However, to make this claim, there should be a significant difference between the correlation of stimulus-driven noise of M2-/M2- locomotion-responsive pairs and M2-/M2- locomotion-unresponsive pairs, AND no significant difference in the same analysis for M2+/M2+ pairs (i.e., testing the differences between the bars in Figure 3c and Figure 3d).
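
      The hierarchical bootstrap suggested in major issue (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' or Saravanan et al.'s implementation: the data layout (one dict per mouse with hypothetical keys "patch" and "interpatch", each holding per-neuron values such as LMI) and the function names are invented for the example. Resampling mice first and neurons second keeps neurons from the same animal from being treated as independent samples.

```python
import random
from statistics import fmean


def hierarchical_bootstrap_diff(mice, n_boot=1000, seed=0):
    """Bootstrap distribution of mean(interpatch) - mean(patch).

    `mice` is a list with one dict per animal (hypothetical layout):
    {"patch": [per-neuron values], "interpatch": [per-neuron values]}.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        patch, inter = [], []
        # Level 1: resample animals with replacement.
        for mouse in rng.choices(mice, k=len(mice)):
            # Level 2: resample neurons within each resampled animal.
            patch += rng.choices(mouse["patch"], k=len(mouse["patch"]))
            inter += rng.choices(mouse["interpatch"], k=len(mouse["interpatch"]))
        diffs.append(fmean(inter) - fmean(patch))
    return diffs


def fraction_at_or_below_zero(diffs):
    """Share of bootstrap samples in which interpatch does not exceed patch."""
    return sum(d <= 0 for d in diffs) / len(diffs)
```

      A small value of `fraction_at_or_below_zero` would indicate that the interpatch mean exceeds the patch mean in nearly all resampled datasets; the same scheme extends to more levels (e.g., imaging fields within mice) by nesting another resampling step.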

    3. Reviewer #2 (Public review):

      Summary:

      Meier et al. explore the variability of locomotion-related modulations in mouse area V1. They present 4 major findings: V1 L2/3 neurons beneath M2- interpatches are more strongly locomotion-modulated than those beneath M2+ patches, while V1 L2/3 neurons are more strongly orientation tuned. They then use viral tracing to examine the relationship of M2- interpatches and M2+ patches with inputs from and outputs to HVOs, MO, RSP, and LP, and find evidence for different closed-loop subnetworks within L1; these relationships, however, are more complicated for cell bodies in L2/3. Finally, they also describe an overlap between M2- interpatches and SOM+ dendrites/axons.

      Strengths:

      The strength of the manuscript is the detailed anatomical quantification of closed-loop connectivity, and the description of the organizing principles of M2- interpatches and M2+ patches.

      Weaknesses:

      The major weakness of the manuscript is the lack of a direct connection between the functional and the anatomical data, and the somewhat puzzling effects observed in the analysis of noise correlations. The former issue might be alleviated by modelling, where the authors could explore the space of possibilities that could explain the functional data based on the anatomical connectivity. Some control analyses could also be done for the comparison of noise correlations.

    4. Reviewer #3 (Public review):

      The authors build on the large body of their previous research, which showed that the mouse primary visual cortex is organised into two types of clusters, M2+ and M2-, which exhibit distinct input patterns from thalamus and higher visual cortical areas and distinct visual tuning preferences. The current study reveals a like-to-like projection from within-cluster neurons to the areas that provide feedback projections and, furthermore, shows that neurons in the M2- clusters are more strongly affected by non-visual signals about the locomotion of the animal.

      The study adds fundamental insights to our understanding of the principles of cortical organisation and computation, specifically how the cortex integrates sensory and action-related signals.

      While the tracing data are very convincing, data analysis should be strengthened to support the claims:

      (1) The locomotion modulation index (LMI) compares the mean activity during running and not running but does not seem to account for differences between visual stimuli, so that the LMI could be influenced by the neuron's visual tuning rather than its sensitivity to locomotion, e.g. if the mouse was running more when the neuron's preferred stimulus was presented. Trials should first be averaged per stimulus, and then across stimuli. Alternatively, only the preferred stimulus could be considered.

      The significance test (unpaired t-test) suffers from the same flaw. Instead, an ANOVA (with stimulus parameter as factor) would resolve the problem, as would testing whether fitting the data with two tuning curves (one per locomotion state) or a single curve results in a lower error (using cross-validation).

      Given that there is evidence that specific visual stimuli can induce more or less running in mice, this issue is very important to account for behavioural differences across stimuli.

      (2) All bars in Figure 2b show a lower LMI than the reported mean LMI of 0.19. This should be checked.

      (3) Correlation tests: Pearson correlation is only meaningful when applied to continuous data. A more suitable test for discrete data like the M2 patch quantile is a rank test like Kendall's coefficient of rank correlation. This applies to data in Figure 2b,c, 4j,k, Figure 2 - Supplement 2,1a, etc.

      (4) How OSI was determined should be clarified. Specifically, were R_pref and R_ortho the mean responses to the two opposite movement directions? Similarly, how was the half-width at half-maximum of orientation determined? From the fits in Figure 2a, it looks like the widths of both Gaussians can be different.

      (5) The correlation measures in Figure 3 would greatly benefit from additional analyses to help interpretation of the results.

      a) Correlations between neurons typically increase with increasing firing rates (e.g., de la Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A. 2007. Correlation between neural spike trains increases with firing rate. Nature 448:802-6. doi:10.1038/nature06028). Could the higher correlations in M2+ pairs (Figure 3a) be explained by higher firing rates in M2+ compared to M2- neurons?

      b) To determine correlations in Figure 3a, trials during locomotion and stationarity were pooled. As locomotion impacts the firing rate of the neurons, it would be helpful to separate correlations between the two states, locomotion vs stationarity, so the measures reflect something closer to "noise correlations" rather than tuning to locomotion.

      c) Similarly, in Figure 3b, I wonder whether the large correlations in M2- pairs are driven by locomotion rather than functional connectivity. As suggested in b, a better test of noise correlations would be to account for locomotion, i.e., separate trials by stimulus identity and locomotion state. To prevent conditions with few trials from having greater weight in the overall noise correlations, I suggest the authors first z-score responses per condition, then determine noise correlations across all trials (as explained in Renart et al., 2010).

      d) Correlations in Figure 3a,b should be tested with an ANOVA and a control for multiple tests.

      (6) In plots like Figure 4j-l, it would be very informative to show individual measures (per ROI and mouse) in addition to mean +- SEM. As the counts are low (<10) it wouldn't obstruct the plot.

      (7) The caption of Figure 4l says that most retrogradely labelled cells are located in L2/3. However, the plot only shows data from L2/3 and a single section of L4, so one cannot compare it to other layers. Can the authors corroborate the claim with data from other layers?

      (8) Methods: The authors should provide more details on the visual stimuli: What was the background on which gratings were presented? How long was the inter-stimulus interval? What was presented during the inter-stimulus interval? How large were gratings used to map tuning to SF, TF, and orientation?
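
      Two of the analyses requested above, the stimulus-balanced LMI in point (1) and the per-condition z-scored noise correlation in point (5c), can be sketched as follows. This is a minimal illustration under stated assumptions: the LMI formula (run - still) / (run + still) is a common convention and is assumed here rather than taken from the manuscript, and the trial tuple layouts and function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def pooled_lmi(trials):
    """Naive LMI: pool all trials per locomotion state, ignoring stimulus."""
    run = fmean(r for _, running, r in trials if running)
    still = fmean(r for _, running, r in trials if not running)
    return (run - still) / (run + still)


def stimulus_balanced_lmi(trials):
    """LMI after averaging per stimulus first, then across stimuli.

    `trials` is a list of (stimulus_id, is_running, response) tuples.
    Only stimuli observed in both locomotion states are used.
    """
    by_cond = defaultdict(list)
    for stim, running, resp in trials:
        by_cond[(stim, bool(running))].append(resp)
    stims = sorted({s for s, _ in by_cond
                    if (s, True) in by_cond and (s, False) in by_cond})
    if not stims:
        raise ValueError("no stimulus was observed in both locomotion states")
    run = fmean(fmean(by_cond[(s, True)]) for s in stims)
    still = fmean(fmean(by_cond[(s, False)]) for s in stims)
    return (run - still) / (run + still)


def zscored_noise_correlation(pair_trials):
    """Noise correlation of a neuron pair after z-scoring per condition.

    `pair_trials` is a list of (stimulus_id, is_running, resp_a, resp_b).
    A condition is one stimulus in one locomotion state.
    """
    groups = defaultdict(list)
    for stim, running, a, b in pair_trials:
        groups[(stim, bool(running))].append((a, b))
    za, zb = [], []
    for pairs in groups.values():
        a_vals = [a for a, _ in pairs]
        b_vals = [b for _, b in pairs]
        sd_a, sd_b = pstdev(a_vals), pstdev(b_vals)
        if sd_a == 0 or sd_b == 0:
            continue  # condition carries no trial-to-trial variability
        mean_a, mean_b = fmean(a_vals), fmean(b_vals)
        za += [(a - mean_a) / sd_a for a in a_vals]
        zb += [(b - mean_b) / sd_b for b in b_vals]
    if not za:
        raise ValueError("no condition with variable responses")
    # Pooled z-scores have mean 0 and population variance 1, so the
    # Pearson correlation reduces to the mean of the products.
    return fmean(x * y for x, y in zip(za, zb))
```

      For a tuned neuron with no locomotion sensitivity, `pooled_lmi` can still be non-zero if the animal happens to run more during the preferred stimulus, whereas `stimulus_balanced_lmi` removes that confound. Z-scoring within each stimulus and locomotion condition likewise removes shared shifts in mean rate, so the remaining correlation reflects trial-to-trial co-variability.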

    5. Author response:

      In the review, the critique was focused mainly on the functional results, which show that interpatch neurons in mouse V1 are more strongly modulated by locomotion than patch neurons. The anatomical results that patch and interpatch modules are recurrently connected in three interareal subnetworks were considered solid.

      We acknowledge the limitations of our work. Specifically, the number of recorded neurons could be higher, the mapping of neurons onto patch and interpatch modules could be more direct, and the asymmetric distribution of locomotion-modulated responses in layer 2/3 may be confounded by selective masking of GCaMP signals by surface blood vessels. In experiments that are not included in the manuscript, we have found no systematic spatial relationship between the M2AChR pattern and the vascular marker CD31, ruling out that masking contributed to the imaging results. Unfortunately, we are unable to revise the manuscript to the extent recommended by the reviewers because the collaborators have left the lab, which closed in 2024.

    1. eLife Assessment

      The authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in non-small cell lung cancer, proposing that resistance arises from signaling rewiring rather than additional mutations. While the study addresses a valuable clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation, meaning the strength of evidence is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models (cell lines, PDXs, CDXs, and PDXOs), they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Several key conclusions are not entirely supported by the data. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance.

      Comments on revised version:

      The authors have addressed some but not all of my concerns and suggestions. The authors do acknowledge some of the limitations. It would be useful to include a limitations paragraph in the Discussion.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on the identification of the mechanisms involved in the acquired resistance to Sotorasib in non-small lung KRASG12C mutant cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines PDXs and organoids) and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT and MTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2 mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the reviewed manuscript is still very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Some changes suggested by reviewer 1 and this reviewer have been made to the text, including changes to text and figures, including quantification of some blots. But for most of it, this version is very similar to the first submission and many of the weaknesses and suggestions I made remain the same.

      Strengths:

      - One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs and the fact that the authors have different clones for each, made this collection especially relevant as they seem to show different mechanisms that the cells used to become resistant to Sotorasib. Although logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones used additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      - Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      - The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      - The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      - The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      - The revised manuscript includes the whole-exome sequencing information, making the findings clearer for the reader.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      - The abstract is mainly the same, and the authors only indicate that they will update it.

      - The tables with the proteomics data are still not included, and again, there is only a comment from the authors that it will be made available. Thus, the way the data is presented in Figure 3 still does not allow the reader to get an idea of many of the findings from this experiment.

      - In Figure 3, the authors indicate that the raw data will be included in the revised version, which should improve the understanding of the reader, but this is not included yet. As in the previous version, the MS-based Phosphoproteome is still not really presented in the current manuscript.

      - The authors still do not specify where the proteomics data will be deposited, and whether it will be made public to comply with FAIR principles. They indicate that they will comply with the journal requests, but it is still not clear what will be deposited.

      - The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and

      - The authors do not address the important point made in the previous review about the effect of copanlisib in parental cells. I might not have been clear, so the data in Figure 4D-F seem to support that PI3K treatment of parental cells is as effective as in the resistant cells. Therefore, it is not clear whether the effect shown in the resistant cells is related to the acquisition of resistance to sotorasib or if these cells are simply sensitive to the drug because the parental cells were already sensitive.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate mechanisms of acquired resistance (AR) to KRAS-G12C inhibitors (sotorasib) in NSCLC, proposing that resistance arises from signaling rewiring rather than additional mutations.

      Strengths:

      Using a panel of AR models - including cell lines, PDXs, CDXs, and PDXOs - they report activation of KRAS and PI3K/AKT/mTOR pathways, with elevated PI3K levels. Pharmacologic inhibition or CRISPR-Cas9 knockout of PI3K partially restores sotorasib sensitivity, and p-4EBP1 upregulation is implicated as an additional contributor, with dual mTORC1/2 inhibition more effective than mTORC1 inhibition alone.

      Weaknesses:

      While the study addresses an important clinical question, it is limited by several weaknesses in experimental rigor, data interpretation, and presentation. The mechanistic findings are not entirely novel, since the role of PI3K-AKT-mTOR signaling in therapeutic resistance is already well-established in the literature. Rather than uncovering new resistance mechanisms, the study largely confirms known pathways. Several key conclusions are not supported by the data, and critical alternative explanations - such as additional mutations or increased KRAS expression - are not thoroughly investigated or ruled out. Furthermore, while the authors use CRISPR-Cas9 to knock out PI3K and 4E-BP1 in H23-AR and H358-AR cells to restore sotorasib sensitivity, they do not perform reconstitution experiments to confirm that re-expressing PI3K or 4E-BP1 reverses the sensitization. This prevents full characterization of PI3K and p-4EBP1 upregulation as contributors to resistance. The manuscript also has several errors, poor figure quality, and a lack of proper quantification. Additional experimental validation, data improvement, and text revisions are required.

      Acquired resistance to KRAS<sup>G12C</sup> inhibitors such as sotorasib or adagrasib remains a significant clinical challenge. Therefore, the identification of mechanisms of acquired resistance, along with the development of alternative therapeutic strategies, including combination therapies with KRAS inhibitors, represents an urgent unmet clinical need. The emergence of secondary KRAS mutations or new mutations in other oncogenic drivers has been observed as a primary cause of acquired resistance in a fraction of patients. No identifiable mutations were detected in more than half of the tumors from patients who developed acquired resistance after treatment with sotorasib or adagrasib.

      Using a discovery-based approach that integrated global proteomic and phosphoproteomic analyses in the TC303AR and TC314AR PDX models, we identified distinct protein signatures associated with KRAS reactivation, upregulation of mTORC1 signaling, and activation of the PI3K/AKT/mTOR pathway. These findings prompted further investigation into these mechanisms of resistance and evaluation of novel therapeutic combinations to overcome resistance. Notably, the combination of sotorasib with copanlisib (a PI3K inhibitor), or the combination of sotorasib with AZD8055, or sapanisertib (mTORC1/2 dual inhibitors) demonstrated strong potential for future clinical use. These regimens effectively restored sotorasib sensitivity in both in vitro and in vivo models and produced robust, synergistic antitumor effects across various acquired resistance models.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      Whole exome sequencing was performed on resistant cells or PDX models to confirm retention of the KRAS<sup>G12C</sup> mutation and to assess for potential secondary KRAS mutations. While our study focused on KRAS secondary mutation and its specific signaling pathways, we acknowledge that additional resistance mechanisms may be involved. These will be the focus of future investigations.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors focus on the identification of the mechanisms involved in the acquired resistance to Sotorasib in non-small lung KRASG12C mutant cells. To perform this study, the authors generate different clones of cell lines, cell-derived xenografts, patient-derived xenograft organoids, and patient-derived xenografts. In all these models, the authors generate resistant forms (i.e., resistant cell lines PDXs and organoids) and the genetic and molecular changes were characterised using whole-exome sequencing, proteomics, and phospho-proteomics. This analysis led to the identification of an important role of the PI3K/AKT/mTORC1/2 signalling network in the acquisition of resistance in several of the models tested. Molecular characterisation identified changes in the expression of some of the proteins in this network as key changes for the acquisition of resistance, and in particular, the authors show that changes in 4E-BP1 are common to some of the cells downstream of PI3K. Using pharmacological testing, they show that different drugs targeting PI3K, AKT, and MTORC1/2 sensitise some of the resistant models to Sotorasib. The analyses showed that the PI3K inhibitor copanlisib has an effect in NSCLC cells that, in some cases, seems to be synergistic with Sotorasib. Based on the work performed, the authors conclude that the PI3K/mTORC1/2 mediated 4E-BP1 phosphorylation is one of the mechanisms associated with the acquisition of resistance to Sotorasib and that targeting this signalling module could result in effective treatments for NSCLC patients.

      The work as presented in the current manuscript is very interesting, provides cell models that benefit the community, and can be used to expand our knowledge of the mechanism of resistance to KRAS targeting therapies. Overall, the techniques and methodology seem to be performed in agreement with standard practice, and the results support most of the conclusions made by the authors. However, there are some points that, if addressed, would increase the value and relevance of the findings and further extend the impact of this work. Some of the recommendations for changes relate to the way things are explained and presented, which need some work. Other changes might require the performance of additional experiments or reanalysis of the existing data.

      Strengths:

      (1) One of the stronger contributions of this article is the different models used to study the acquisition of resistance to Sotorasib. The resistant cell lines, PDXs and PDXOs, and the fact that the authors have different clones for each, made this collection especially relevant, as they seem to show different mechanisms that the cells used to become resistant to Sotorasib. Although logically, the authors focus on one of these mechanisms, the differential responses of the different clones and models to the treatments used in this work show that some of the clones used additional mechanisms of resistance that can be explored in other studies. Importantly, as they use in vitro and in vivo models, the results also consider the tumour microenvironment and other factors in the response to the treatments.

      (2) Another strength is the molecular characterisation of the different Sotorasib-resistant tumour cells by WES, which shows that these cells do not seem to acquire secondary mutations.

      (3) The use of MS-based proteomics also identifies proteome signatures that are associated with the acquisition of resistance, including PI3K/mTORC1/2. The combination of proteomics and phospho-proteomics results should allow the identification of several mechanisms that are deregulated in Sotorasib-resistant cells.

      (4) The results show a strong response of the NSCLC cells and PDXs to copanlisib, a drug for which there is limited information in this cancer type.

      (5) The way they develop the PDX-resistant and the PDXO seems to be appropriate.

      Weaknesses:

      In general, the data is of good quality, but due to the sheer amount of data included and the way it is presented and discussed, several of the claims or conclusions are not clear.

      (1) The abstract is rather long and gives details that are not usually included in one. This makes it very complicated to identify the most relevant findings of the work. The use of acronyms PDX, PDXO, and CDX without defining them makes it complicated for the non-specialist to know what the models are. Rewriting and reorganisation of the abstract would benefit the manuscript.

      We revised the abstract to ensure that the key findings and overall message are clearly communicated and easily understood by readers.

      (2) Expression, presentation, and grammar should be reviewed in all sections of the manuscript.

      This has been done in the revised version.

      (3) In the different parts of the result section where the models shown in Figure 2 are described the authors indicate "Whole-exome sequencing (WES) confirmed that XXX model retained the KRASG12C mutation with no additional KRAS mutations detected" however, it is not indicated where this data is shown and in not all the cases there is explanation to other possible modifications that might relate to mechanisms of resistance. This information should be included in the manuscript, and the WES made publicly available.

      WES was performed to check for additional secondary mutations in KRAS and to verify retention of the KRAS<sup>G12C</sup> mutation in these AR models. The WES data have been provided as supplements.

      (4) The way the proteomics analysis of the TC303 and TC314 parental and resistant PDX is described in the text is confusing. The addition of an experimental layout figure would facilitate the understanding. As it is written, it is not obvious that the parental PDX were also analysed. For instance, the authors say, "The global and phosphoproteomic analyses identified over 8,000 and 4,000 gene protein products (GPPs), respectively". Is this comparing only resistant cells, or from the comparison of the parental and resistant pairs? And where are these numbers presented in the figures? Also, there is information that seems more adequate for the materials and methods sections, i.e., "Samples were analyzed using label-free nanoscale liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) on a Thermo Fusion Mass Spectrometer. The resulting data were processed and quantified using the Proteome Discoverer 2.5 interface with the Mascot search engine, referencing the NCBI RefSeq protein database (Saltzman, Ruprecht). Two-component analysis is better named principal component analysis."

      The text has been revised accordingly

      (5) While the presentation of the proteomics data could be done in different ways, the way the data is presented in Figure 3 does not allow the reader to get an idea of many of the findings from this experiment. Although it is indicated that a table with the data will be made available, this should be central to the way the data is presented and explained. A table (ie, Excel doc) where the raw data and all the analysis are presented should be included and referenced. Additionally, heat maps for the whole proteomes identified should be included. In the text, it is said, "Global proteomic heatmap analysis revealed unique protein profiles in TC303AR and TC314AR PDXs compared to their sensitive counterparts (Figure 3C)." However, this figure only shows the histogram of the differentially regulated cells. Inclusion of the histogram showing all the cells is necessary, and it might be informative to include the histogram comparing the two isogenic pairs, which could identify common mechanisms and differences between both sets. In Figure 3C, the protein names should be readable, or a reference to tables where the proteins are listed should be included.

      The raw data associated with the proteomics and global proteomics analyses have been added as supplements.

      (6) In Figure 3, the pathway enrichment tool and GO used should be mentioned in the text. The tables with all significant tables should also be provided. The proteomics data seems to convincingly identify mTOR as one of the pathways deregulated in resistant cells, but there is little explanation of what is considered a significant FDR value and if there are other pathways or networks that are also modified, which might not be common to both isogenic models. The MS-based phosphoproteome could help with the identification of differentially regulated pathways, but it is not really presented in the current manuscript. Most of the analysis of phospho-proteomics comes from the RPPA analysis, which is targeted proteomics. With the way the data is presented, the authors show evidence for a role of mTOR in the acquisition of resistance, but unfortunately, they do not discuss or allow the reader to explore if other pathways might also contribute to this change.

      The authors agree that other pathways may be involved, and this will be the subject of future study. The raw data has been added as supplements for the readers' interest.

      (7) Where is the proteomics data going to be deposited, and will it be made public to comply with FAIR principles?

      The data have been uploaded according to the journal guidelines.

      (8) The authors claim that the resistance shown for H23AR and H358AR cells is due to reactivation of KRAS signalling. This is done by looking at phosphorylation of ERK as a surrogate, as they claim, "KRAS inhibition is commonly assessed by evaluating the inhibition of ERK phosphorylation (p-ERK)". While this might be true in many cases, the data presented does not demonstrate that the increase in p-ERK is due to reactivation of KRAS. To make this claim, the authors should measure activation of KRAS (and possibly H- and NRAS) using GST-pull down or an image-based method.

      We agree that KRAS activation can be assessed through various methods. In this manuscript, which primarily focuses on mechanisms of resistance, pathway analysis revealed upregulation of KRAS signaling. This finding correlated with the incomplete inhibition of p-ERK by sotorasib in resistant cells. Notably, p-ERK status is widely recognized and routinely used as a surrogate marker for KRAS pathway activation.

      (9) The experiments in Figure 4 are very confusing, and some controls are missing. There is no blot where they show the effect of Sotorasib treatment in H23 and H358 parental cells. Is the increase shown in resistant cells shown in parental or is it exclusive for resistant cells only (and therefore acquired)? Experiment 4B should include this control. What is clear is that there is an increase in the expression of AKT and PI3K.

      H23 and H358 cells are highly sensitive to sotorasib, as demonstrated by the cell viability assays presented in Figure 2. As shown in Figure 3—figure supplement 3, sotorasib treatment led to complete inhibition of p-ERK in these parental cell lines. In contrast, p-ERK inhibition was incomplete in the resistant H23AR and H358AR cells, highlighting a distinct signaling behavior that prompted us to investigate the AR cells further. Moreover, these AR cells were continuously cultured under sotorasib pressure to maintain resistance.

      (10) The main point here is whether this is acquired resistance or the sensitivity to the drug is already there, and there was no need to do an omics experiment to find this. In some cases, it seems that the single treatment with PI3K inhibitors is as effective as Sotorasib treatment, promoting the death of the parental cells. This is in line with previous data in H23 and H358 that show sensitivity to PI3K inhibition (i.e., H358: 10.1016/j.jtcvs.2005.06.051; H23: 10.20892/j.issn.2095-3941.2018.0361). The data is clear, especially for copanlisib, but would it be the case that this treatment could be used for the treatment of NSCLC alone or directly in combination with Sotorasib and prevent resistance? The results shown in Figure 4C strongly support that a single treatment might be effective in cases that do not respond to Sotorasib. The data in figure 4D-F (please correct typo "inhibition" in labels) seem to support that PI3K treatment of parental cells is as effective as in the resistant cells.

      We agree. Based on our in vitro (Figure 4) and in vivo (Figure 7) data, copanlisib was able to overcome sotorasib resistance, demonstrating either synergistic or additive effects depending on the specific model. These findings support the potential of combining PI3K inhibition with KRAS<sup>G12C</sup> inhibition as a promising strategy to address acquired resistance.

      (11) The experiments presented in Figure 7 show synergy between Sotorasib and copanlisib treatment in some of the resistant cells. But in Figure 7G, the single treatment of H23AR is as effective as the combination. Did the authors check the effect of this drug on the parental cells? As they do not include this control, it is not possible to know if this is acquired sensitivity to PI3K inhibition or if the parental cells were already sensitive (as indicated by the Figure 4 results).

      Both H23 and H23AR cells demonstrated high sensitivity to copanlisib, as shown in Figure 4. Combination index analysis for the copanlisib + sotorasib treatment (Figure 7A) revealed synergistic effects on cell viability at specific concentrations. However, in the in vivo experiment (Figure 7G), we did not observe a clear synergistic effect of the combination treatment against H23AR xenografts. This may be attributed to the dose of copanlisib used, which was potentially sufficient on its own to produce a strong antitumor response, thereby masking any additional benefit from the combination.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      To strengthen the scientific rigor and overall presentation of the study, the authors should consider the following:

      (1) Perform additional functional validations, including reconstitution experiments after PI3K and 4E-BP1 knockouts, to more definitively demonstrate the role of these targets in mediating resistance.

      CRISPR-Cas9-mediated PI3K and 4E-BP1 knockout clones were generated in more than one resistant cell line that expressed a robust level of the knockout target, and multiple independent clones in each cell line were evaluated with and without gene disruption. The acquired-resistant H23AR and H358AR isogenic cells overexpressed PI3K and 4E-BP1 proteins, whereas expression of these proteins was normal in the parental cell lines (H23 and H358). These two pairs of cell lines (H23 vs H23AR and H358 vs H358AR), along with multiple knockout clones from each cell line, were used in every functional assay, so that cells or clones with normal expression, overexpression, and no expression of the target proteins were all represented (Figure 5B, D-F & Figure 6D-E). Given the thorough nature of this analysis, additional reconstitution experiments were deemed unnecessary, as they would not yield further insight.

      (2) Improve experimental quantifications, particularly for western blot analyses, and ensure all key findings are supported by statistically significant comparisons.

      The changes observed on the Western blots were not subtle and were obvious without quantification.

      (3) Clarify enrichment analysis by directly comparing resistant and sensitive models and use appropriate FDR thresholds (<0.05) when claiming significant pathway activation.

      The mass spectrometry data were analyzed by the Department of Biostatistics, and the methodology for the statistical analysis is explained in the Methods section. The enriched pathways were identified by pre-ranked GSEA using the gene list ranked by log-transformed P values, with signs set to positive or negative for a fold change of >1 or <1, respectively, from the global proteomics and phosphoproteomics data. All the enriched pathways were ranked based on their enrichment scores and considered significant with an FDR value <0.05. Each enrichment plot in Figure 2 was marked with its respective FDR q value as well as its nominal p-value (Figure 2D-E). The result section (page 14) has also been revised for clarification.
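
      The ranking step described in this response can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the log transform is -log10(p), and the function names and example values (GENE_A, GENE_B, GENE_C) are hypothetical.

```python
import math


def signed_rank_metric(p_value, fold_change):
    """Return -log10(p) signed by direction of change.

    Assumption: the "log-transformed P value" is -log10(p); the sign is
    positive for fold change > 1 and negative for fold change < 1.
    """
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    magnitude = -math.log10(p_value)
    if fold_change > 1:
        return magnitude
    if fold_change < 1:
        return -magnitude
    return 0.0  # no change: place the gene in the middle of the list


def rank_genes(stats):
    """Sort genes from most up-regulated to most down-regulated.

    stats maps a gene name to a (p_value, fold_change) tuple.
    """
    scored = {gene: signed_rank_metric(p, fc) for gene, (p, fc) in stats.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values, for illustration only.
example = {
    "GENE_A": (0.001, 2.5),  # strongly up in resistant samples
    "GENE_B": (0.04, 0.6),   # down in resistant samples
    "GENE_C": (0.5, 1.1),    # weak, non-significant increase
}
ranked = rank_genes(example)
```

      A list ordered this way is the usual input to a pre-ranked enrichment tool, which then tests whether members of a pathway cluster toward either end of the ranking.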

      (4) Address alternative mechanisms of resistance, such as secondary mutations or KRAS overexpression, through deeper genetic and proteomic profiling.

      The authors agree that other pathways may be involved, and this will be the subject of future research. Our WES analysis of H23AR and H358AR cells, shown in Figure 2 Supplement 1, did not find any additional mutations in KRAS, although there were some SNPs and indel mutations, which we considered outside the scope of our current study. KRAS signaling upregulation found in Gene Enrichment Analysis, shown in Figure 3D, was validated through its ERK-phosphorylation status in Figure 3-supplement 3.

      (5) Improve data presentation by enhancing figure quality, ensuring consistent labeling, and providing complete figure legends and descriptions.

      Revised

      (6) Revise and polish the manuscript text for clarity, accuracy, and consistency, paying special attention to avoiding contradictory statements and strengthening mechanistic interpretations.

      Revised

      Major Comments:

      (1) In Figure 1A, the authors state that "four PDX models were selected for evaluating sotorasib sensitivity based on their distinct co-mutation patterns," but it is unclear whether these patterns are common, clinically significant, or selected for another specific reason. Clarification is needed regarding the rationale for model selection.

      The models have co-mutations that are common in clinical specimens and are associated with drug resistance (Skoulidis, Ferdinandos, et al. "Co-occurring genomic alterations define major subsets of KRAS-mutant lung adenocarcinoma with distinct biology, immune profiles, and therapeutic vulnerabilities."Cancer discovery 5.8 (2015): 860-877). Out of 11 PDX models with KRAS<sup>G12C</sup> mutations, 4 models were selected for in vivo evaluation of sotorasib sensitivity based on their distinct co-mutation status. Co-mutations with either p53, STK11, or KEAP1 are the most commonly found co-mutations in NSCLC and become more challenging in therapeutic treatments in the clinic. All four PDXs selected for the in-vivo study harbor at least one of these co-mutations with the KRAS<sup>G12C</sup> mutation.

      (2) Whole-exome sequencing (WES) results for TC303 AR and TC314 AR are mentioned but not shown in the supplementary material. These results should be included.

      Included as a figure supplement in Figure 1-figure supplement 1

      (3) In Figure 2 - Figure Supplement 1, H23 AR and H358 AR acquired multiple SNPs and indels compared to their sensitive counterparts. The authors need to address whether these genetic alterations could contribute to resistance.

      The authors agree that other pathways may be involved, and this will be the subject of future research. Our WES analysis of H23AR and H358AR cells, shown in Figure 2 Supplement 1, did not find any additional mutations in KRAS, although there were some SNPs and indel mutations, which we considered outside the scope of our current study. KRAS signaling upregulation found in Gene Enrichment Analysis, shown in Figure 3D, was validated through its ERK-phosphorylation status in Figure 3-supplement 3.

      (4) In Figure 3D-E, in the enrichment analysis, the authors describe enrichment of mTORC1 signaling in resistant PDXs without sufficiently comparing with the sensitive counterparts. They need to clarify whether the enrichment is unique to resistant cells.

      The comparison is between sensitive and resistant cells (Figure 3C). All enrichment data presented in Figure 3D-E were derived from global and phosphoproteomic analyses: Figure 3D compares the sotorasib-acquired resistant TC314AR PDX with its sensitive counterpart, the TC314 PDX, and Figure 3E compares the resistant TC314AR + TC303AR PDXs (combined) with their sensitive counterparts, the TC314 + TC303 PDXs (combined). We revised the text to make this clear.

      (5) In Figure 3F, the FDR values of 0.5 and 1.0 are too high to support conclusions of significant pathway activation. Similar issues exist for Figure 3 - Figure Supplement 2 (FDR q-values of 1.0, 0.989, and 0.813).

      We agree that the FDR values are higher in the enrichment analysis of the phosphoproteomic data, but not in that of the proteomics data. However, these enrichment scores indicate pathway activation. The FDR was higher most likely because of the low number of phosphoproteins enriched in the designated pathways. Significant FDR values were found when the enrichment analysis was done on the global proteomics data.

      (6) In Figure 3H, PI3K upregulation is inferred from RPPA quantification. An independent validation, such as immunoblotting, should be provided.

      In addition to the sotorasib-acquired resistant PDX samples, PI3K upregulation was also shown by immunoblotting in the sotorasib-resistant isogenic cell lines (H23AR and H358AR cells) in Figure 4B.

      (7) In Figure 4B, increased PI3K (p85) levels alone do not support pathway activation, as p-AKT levels remain unchanged. Functional downstream markers (e.g., p-S6, p-4EBP1) should be assessed.

      Agree, the status of other downstream markers, such as p-S6 and p-4EBP1, was shown in Figure 4H and Figure 5E & 5F.

      (8) In Figure 4D, PI3K inhibition does not reduce colony formation in AR cells relative to parental cells. The data do not support the conclusion that PI3K inhibition sensitizes AR cells.

      These experiments show that the drugs are equally effective in the presence or absence of drug resistance to sotorasib. The specific role of PI3K is shown in the knockout experiments (Fig. 5) as explained in the result section on pages 18-19. H23AR and H358AR cells showed over 600- and 200-fold resistance to sotorasib compared with their sensitive counterparts (Figure 2A), with IC50 values of 20 µM and 6 µM, respectively. In contrast, copanlisib, a PI3K inhibitor, significantly sensitized the AR cells, with IC50 values of 0.39 µM and 0.06 µM in H23AR and H358AR cells, respectively, making them as sensitive as the parental cells. PI3K signaling was significantly upregulated in AR cells, and inhibition of PI3K-AKT-mTOR signaling through CRISPR-Cas9 PI3K knockout (Figure 5), or inhibition of PI3K or downstream molecules by copanlisib, everolimus, or AZD8055, sensitizes the AR cells either as single agents or synergistically with sotorasib (Figure 6H & Figure 7A).
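
      The arithmetic behind the figures quoted in this response can be checked with a short sketch. It assumes fold resistance is defined as the resistant IC50 divided by the parental IC50; because the response says "over" 600- and 200-fold, the implied parental values are upper bounds rather than reported measurements. The function and dictionary names are hypothetical.

```python
def implied_parental_ic50(resistant_ic50_um, fold_resistance):
    """Parental IC50 implied by fold resistance = resistant / parental."""
    if resistant_ic50_um <= 0 or fold_resistance <= 0:
        raise ValueError("inputs must be positive")
    return resistant_ic50_um / fold_resistance


def potency_ratio(ic50_a_um, ic50_b_um):
    """How many times more drug A than drug B is needed to reach IC50."""
    if ic50_a_um <= 0 or ic50_b_um <= 0:
        raise ValueError("inputs must be positive")
    return ic50_a_um / ic50_b_um


# Values quoted in the response above (IC50 in micromolar).
REPORTED = {
    "H23AR": {"sotorasib": 20.0, "fold": 600.0, "copanlisib": 0.39},
    "H358AR": {"sotorasib": 6.0, "fold": 200.0, "copanlisib": 0.06},
}

summary = {}
for line, values in REPORTED.items():
    summary[line] = {
        # Upper bound on the parental sotorasib IC50, since fold is "over" N.
        "implied_parental_ic50_um": implied_parental_ic50(
            values["sotorasib"], values["fold"]
        ),
        # Sotorasib IC50 relative to copanlisib IC50 in the resistant line.
        "sotorasib_vs_copanlisib": potency_ratio(
            values["sotorasib"], values["copanlisib"]
        ),
    }
```

      Under these assumptions the implied parental sotorasib IC50 is at most about 0.03 µM in both lines, and in the resistant cells the copanlisib IC50 is roughly 50- to 100-fold lower than the sotorasib IC50.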

      (9) In Figures 4D-F, single or combination inhibition of PI3K, AKT, and mTORC1 in H23/H23AR and H358/H358AR cells shows no significant difference in colony formation between resistant and parental lines. Therefore, the conclusion that PI3K inhibition sensitizes sotorasib-resistant cells is not supported by the data.

      See response to (8).

      (10) In Figure 4G, copanlisib does not significantly inhibit p-mTOR (S2448) in H23 AR cells, and total mTOR levels decrease slightly. Quantification should be added.

      Added as a supplement

      (11) In Figure 4G, western blot results for p-PDK and PDK are not quantified, and effects vary between H23^AR and H358^AR cells. Quantification needs to be added.

      Added as a supplement

      (12) In Figure 6H, cell viability curves for H23AR/PI3K KO 3-3 cells start from <60%, suggesting pre-existing poor cell health. This casts doubt on conclusions regarding dual drug effects.

      All cell viability remained at or close to 100% at the no-treatment control condition, and the cell viability at the starting point was lower than 100% only in the combination treatment group, where the cells were treated with at least one drug. Here, a fixed dose of AZD8055 (50nM or 100nM) was combined with different doses of sotorasib. The dual drug effects are assessed by the combination index, which takes viability factors into account. Combination effects were confirmed by in vivo experiments.

      (13) The manuscript claims that mTORC1 inhibition alone is insufficient to suppress resistance (page 23), yet earlier reports that the mTORC1 inhibitor everolimus significantly reduces colony formation (page 17). This inconsistency needs to be addressed.

      Revised. On p. 23, we are referring to 4E-BP1-mediated resistance.

      (14) In Figure 7G, since copanlisib alone appears as effective as combination therapy, the authors should revise the conclusion to emphasize the sufficiency of PI3K inhibition alone.

      Agree, the copanlisib treatment appeared to be very effective in the H23AR xenograft model, which is most likely due to the copanlisib dose used in this model, which showed a strong antitumor effect and superseded the combination effect. However, the synergistic antitumor activity of copanlisib with sotorasib was found in H358CDX and TC314AR PDX models (Figure 7D, & I).

      (15) In Figure 7I, statistical comparisons (P-value) comparing combination therapy to copanlisib monotherapy are missing. Without statistical significance, the conclusion regarding the combination efficacy cannot be justified.

      Revised

      Minor Comments:

      (1) Figure 1D is not described in the main text.

      Revised

      (2) On page 12, "FigG" and "FigH" should be corrected to "Figure 2G" and "Figure 2H," respectively.

      Revised

      (3) On page 17, the section title "copanlisib modulates PI3K-AKT-mTOR signaling..." should capitalize the first word.

      Revised

      (4) In Figure 7, "sotorasib" and "AMG510" are used interchangeably but refer to the same drug; consistent labeling should be used to avoid confusion.

      Revised

      (5) In Figure 7 - Figure Supplement 2A-B, the rationale for switching from AZD8055 to sapanisertib, another dual mTORC1/mTORC2 inhibitor, is unclear and should be explained.

      Revised

      Reviewer #2 (Recommendations for the authors):

      Please review all the figures and labels, as there are many mistakes. Also, check the way that the figures are presented and, if necessary, increase the definition.

      Revised

      (1) Figure 2 seems to be squashed.

      Revised

      (2) RPPA experiment "PI3K-AKT-mTOR signaling pathway compared to their sensitive counterparts. Specifically, the expression levels of MEK1, p-MEK1, p-MAPK, PDK1, p-PRAS40, p-GSK-3β, p-4E-BP1, p-PI3K, p-Akt, p-PRAS40, p-p38-MAPK, p-AMPK, and p-MAPK were markedly increased in resistant TC303AR and TC314AR PDXs." Several of these proteins are not really part of the PI3K-AKT-MTOR pathway, as such, but the MAPK pathway, and this is masked by not mentioning this. It is also necessary to explain which proteins are called MAPK and why there are 2 p-MAPK.

      Revised

      (3) Figure 3 - Figure Supplement 3. The images seem saturated for some of the blots. Is there still a decrease in ERK activity in the resistant cells? Lower exposure blots should be included, and if possible, some quantification performed.

      Quantification added

      (4) Figure 4I, review the title of the left graph, as this is not only sensitivity to everolimus.

      Revised

      (5) The figure legends need extensive review and rewriting. For instance, in Figure 6, the times for how long the treatments were performed in the different graphs have to be specified. The figure legends must allow interpretation of the data without reading the material and methods or text.

      Revised

      Materials and Methods

      This section needs special attention for typos and style, for instance:

      (1) Correct "KRASG12G inhibitors including sotorasib, adagrasib," to G12C.

      Revised

      (2) Use appropriate symbols i.e., "3 ul sgRNA (30 uM), 0.5 ul Cas9 (20 uM), and 3.5 ul Buffer R were mixed"

      Revised

    1. eLife assessment

      This study provides an important and biologically plausible account of how human perceptual judgments of heading direction are influenced by a specific pattern of motion in optic flow fields known as retinal curl. By combining psychophysical experiments and neural modeling, the authors demonstrate that what was previously considered an incidental "nuisance" signal actually serves as a functional control signal for estimating heading and steering toward a fixated target. While the evidence for the role of curl signals is convincing and advances our understanding of vision-based navigation, the work's impact would be strengthened by situating these findings among other cues that contribute to heading estimation, and by clarifying both the time course of these computations and their generalizability across different navigational contexts.

    2. Reviewer #1 (Public review):

      Summary:

      This carefully executed study uncovers the functional relevance of curl signals that impinge on the retina every time an observer's gaze direction and movement direction are not aligned.

      Strengths:

      This finding is important, highlighting the functional role of an abundant incidental signal (curl in retinal motion) that has thus far been believed to be a nuisance that needs to be filtered out of the retinal motion stream.

      The study's evidence is compelling: a combination of psychophysical experiments and critical manipulations, control theory and neural modeling, which together make an internally consistent and biologically plausible case for the role of curl signals in estimating heading direction.

      This study uncovers the functional relevance of curl signals that occur on the retina when an observer is moving, and gaze is not straight ahead. The experimental and modeling results clearly go beyond previous studies and significantly advance our understanding of vision-based navigation.

      Another clear strength is that the study uses tightly controlled experimental manipulation to provide strong test cases for the hypothesis that curl is used for visual navigation. These conditions are important to constrain the proposed model (and future models) of heading control.

      The modeling is very clearly described, and the modeling and analysis code is published and freely available. The authors go beyond a back-of-the-envelope control model and show how it might be implemented at the neural-circuit level. The model is biologically plausible.

      Weaknesses:

      The discussion would benefit from an extension of the implications of the study and predictions of their model.

    3. Reviewer #2 (Public review):

      This study examines how curl in the retinal flow field can be used as a control variable for estimating and controlling the heading of a moving observer. The basic idea (which is not entirely new, see Matthis et al. 2022) is that translation along a path with eccentric gaze (meaning that the subject is not heading toward the point they are looking at) produces a pattern of optic flow on the retina with a rotational component around the point of fixation (which can be captured by the mathematical "curl" operator). The sign and magnitude of retinal curl vary with heading relative to the point of fixation, such that curl can be used as a control variable to steer rightward or leftward to move toward the fixated target. The authors perform behavioral experiments and show that there are biases in perceived heading that seem to be largely governed by retinal curl. They also show that a simple controller model can use curl to steer toward a target, and they provide a neural network model that provides a biologically plausible implementation of the controller (although there are some questions about that).

      There is a core of interesting work here that I think can be important to the field. However, there is a lack of clarity on several important fronts, including design of the behavioral experiments, presentation of the behavioral data, conceptual framing of what curl can and cannot do, etc. Equally importantly, the manuscript is not written in a manner that will make it accessible to most vision scientists. I consider myself to be pretty knowledgeable about optic flow, and I had to read most of the manuscript 3 or 4 times to be able to understand the bulk of it. And my experience is that most vision scientists do not understand optic flow well, so I fear that most of the readers that the authors should want to reach would struggle to understand the work. As written, this is mainly going to make an impact on a handful of optic flow gurus. Thus, I consider that this manuscript will need a major overhaul to clarify important issues and make it more accessible.

      Major issues:

      (1) The manuscript contains inconsistent, if not misleading, messaging about what information retinal curl does, and does not, provide regarding heading estimation. In the Abstract, the authors state: "We propose an alternative: the visual system utilizes retinal curl directly to estimate heading, rendering the explicit recovery of the FOE unnecessary." Based on my understanding of the rest of the manuscript, I find this statement to be a misrepresentation for two main reasons:

      a) To "directly estimate heading" relative to what? When not qualified, most people interpret "heading" to mean an observer's heading relative to the world (or some allocentric reference frame). But retinal curl only gives information about an observer's heading relative to the point on which their eyes are fixated. Moreover, that point of fixation will change every few hundred milliseconds in natural viewing, so the retinal curl will change with each new fixation even as heading relative to the world remains unchanged. So I think most readers would grossly misinterpret the claim that retinal curl can be used "directly to estimate heading". Indeed, in the authors' controller model, the initial heading needs to be given, and then the controller can work. But from where does the visual system get the initial heading, since it does not come from curl? These issues are left hanging. Thus, while curl can provide a very useful input for steering toward a fixated target, other signals are needed to estimate heading relative to the world. This has to be made much clearer early on, and a conceptual schematic diagram might help. Also, the authors generally do not specify the reference frame of the variables they are talking about, leaving lots of room for misinterpretations. It should be clear each time they are talking about a variable, such as heading, whether it is relative to the fixation target, body, world, etc.

      b) It seems to me that retinal curl will depend on other variables, in addition to heading relative to the fixation target. For example, it seems to me that the magnitude of retinal curl will depend on self-motion speed, the depth structure of the scene, the angle of elevation of the fixated target, and perhaps others. This is not discussed at all, and many readers would get the misguided impression that there is a 1:1 mapping from curl to heading (relative to fixation). If I am right that this is not correct, it means that retinal curl can tell the observer whether to steer right or left to move toward the fixated target, but it cannot tell them how much to steer. Indeed, in the authors' controller model, there is a free parameter that calibrates curl to angle. It makes sense that this works to fit trajectory data that are given from a fixed environment, but it is unclear how the brain would use retinal curl to control steering when these other variables are uncertain or changing unpredictably. Moreover, how does the system change the mapping from curl to steering command as the location of fixation changes relative to the current heading? These are issues that need to be brought up in framing the problem and discussed at some length. If the authors can show mathematically that retinal curl is only dependent on heading (relative to fixation) and not any of these other variables, it would be very valuable to show the equations for this relationship.

      (2) The description of the behavioral experiment and presentation of behavioral data leaves a lot to be desired.

      a) First, it is stated (line 158) that "Participants continuously reported their perceived direction of self-motion while maintaining fixation on the yellow dot." Again, the reference frame is completely unspecified. Participants were reporting their perceived heading relative to what? The fixation target? The world? What exactly were the instructions given to the subjects to perform the task? Based on the description of how perceived paths are computed (line 166-), it seems to be presumed that subjects are reporting their heading relative to the world because those angles are then converted into x and z coordinates in what I presume is a world-centered reference frame. But how do we know that subjects are accurately reporting their heading relative to the world? What if they are biased in their reports by the location of the fixation target relative to the scene, or by some other reference signal? Is it possible for the authors to rule out the possibility that perceptual biases seen in the unaltered curl condition result from observers not fully adopting the assumed reference frame of the task? If this cannot be firmly excluded, it seems to create problems for the rest of the study.

      b) I also feel that there is a mismatch between what the behavioral task requires and what the controller model does. Subjects are apparently asked to report their heading relative to the world, but the controller model only controls their heading relative to the point that they are fixating. I understand how this is resolved in the model, but I think this type of distinction is buried and will not be apparent to most readers. Again, the reference frames of what is being measured and controlled need to be specified explicitly in all parts of the paper, and the authors need to explain how the system would combine curl-based control with some other measures of (at least initial) heading for world-centered heading to be computed. All of the assumptions need to be clearly specified.

      c) In addition, I found it frustrating that the authors never present raw perceptual data from the observers. Rather, in Figure 2, we see reconstructed trajectories that are perfectly smooth with no indications of noise whatsoever. Since these paths are computed from the perceptual reports, there must be some noise inherent in them. The figures should represent this uncertainty somehow, and it should be explained how these perfectly smooth trajectories are obtained.

      (3) "...the magnitude of retinal curl in the fovea can specify the body trajectory relative to gaze (Matthis et al., 2022)." The main idea put forward by the authors here seems to overlap heavily with this statement that they attribute to Matthis et al. 2022. While I think this paper still adds importantly to the topic, the authors do not discuss how their findings are different from those of Matthis et al. 2022, why they are an important extension, etc. Readers should not have to go read this other paper to have any idea how the present findings are placed in importance relative to the literature.

      (4) The analysis and treatment of eye movements is extremely weak. The authors discarded trials for which gaze deviated from the fixation point by more than 3 degrees (which is a LOT given that the eye speeds are generally in the neighborhood of 0.5 deg/sec), and they provide basic stats on the distribution of positions. But this largely misses the point: it is not small position errors that are likely to matter, but rather velocity errors. Even a small amount of retinal slip of the target while it is being pursued will cause image motion that is going to alter the optic flow field around the fixation target. So, for example, the retinal curl field may no longer be centered on the fixation target. How do we know that some of the perceptual biases are not influenced by image motion resulting from imperfect tracking of the fixation target? This needs to be analyzed and discussed.

      (5) I found the sections of text comparing the separate and joined fits (starting line 287) to be a bit too rosy. The authors show the separate fits in the main text, and it is not very surprising that these fits are good, given that the model has 30 parameters, and these data are pretty low-dimensional. The authors only show the joined fits in the supplement, and they say that they are almost as good as the separate fits (indeed, they are better in a model comparison sense, but this is 30 parameters vs. 2 parameters). However, when I look at the fits of the joined model in the supplement, I don't find them to be very impressive. In particular, the model grossly misses the data for the straight paths for several subjects (e.g., id5, id6, id8, id10). And fitting the straight paths would presumably be easiest. This implies that the joined model is really missing something and that fitting the curved paths interacts strongly with fitting the data for different fixation target locations on the straight path. I think that the authors should discuss the results a bit more soberly and tone down their conclusions here.

      (6) The section of the paper on neural simulations (starting line 387) has a few weaknesses. First, why are only straight paths simulated here? This does not seem to provide a very rigorous test of the model. Second, it is awkward that the simulation results are presented in units of pixels, rather than degrees. Third, the authors seem to downplay the fact that the neural estimates of heading seem to oscillate rather wildly (over a range of hundreds of pixels, whatever that means, see especially Figure S16). It was far from clear to me how an estimate of heading with these large oscillations is useful. It would seem to require that heading estimates are integrated over substantial lengths of time to be reliable. It was therefore unclear how the model produces such smooth paths from these oscillating estimates.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript uses a novel paradigm to demonstrate that rotational motion patterns in the retinal image, called curl, directly influence perception of heading direction. This means that it is not necessary to recover the focus of expansion, defined by the point of zero motion when moving along a straight trajectory toward a target, as is commonly thought.

      Strengths:

      It has long been accepted that the focus of expansion of the optic flow field generated by self-motion is used to guide heading direction. While there have been many challenges to the need to recover the focus of expansion when gaze is not in the direction of travel, it is still not well understood how retinal motion patterns contribute to heading perception. Recent work has demonstrated the complexity of the retinal motion patterns during natural walking, where body motion adds a rotational component. A rotational component also results from curved paths as well as gaze off the direction of travel. This rotational component is called curl. The primary contribution of this manuscript is to demonstrate convincingly that curl influences perception of heading, and that it is not necessary to recover the focus of expansion.

      A strength of the manuscript is that realistic retinal motion patterns are generated by recording the image sequences generated by a walker in a virtual environment, and then using those patterns as stimuli in the experiment. This allows the creation of the more complex flow patterns that are a consequence of the bob and sway of natural walking, which are often considered a minor factor. The elegant experimental design allows direct manipulation of the curl signal, and this in turn directly influences measured heading perception. Another strength is that the authors ground their findings in control theory and neural computations, using a model that produces human-like path trajectories.

      The study is timely, given the long history of this question, together with the growing understanding of the complexity of naturally generated retinal motion and the absence of direct evidence for the way that these motion patterns are used in heading perception. It adds an important piece of evidence for how retina-centered optic flow may be used by the visual system, which is critical for our understanding of motion processing in the brain.

      Weaknesses:

      The primary limitation of the paper is that it avoids discussion of some of the inevitable complexities of heading perception. The main issue is what exactly is meant by heading. Different behaviors evolve over different timescales. The geometry of retinal motion defines instantaneous heading, which varies widely through the gait cycle. Time-varying information like this is known to be important in the momentary control of balance. Heading can also be thought of as steering the body toward a distant goal, which evolves over longer timescales. The current manuscript appears to be concerned with heading information integrated over a few seconds and seems to provide evidence that heading is indeed integrated over the gait cycle. The issue of the time scale of the computation is touched on, but it is not related to how it might be used in normal walking or what situations it might apply to. Steering toward a distant goal during walking is not a very difficult problem and may not require evaluation of retinal motion, but control of balance is more challenging and may depend critically on curl. Consequently, the timescale of the computation needs to be considered in order to understand what is meant by heading.

    5. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate Reviewer #1’s very positive feedback. Incorporating the perspective of ‘incidental’ sensory signals is a valuable suggestion that aligns perfectly with our findings. We agree that this perspective significantly strengthens the impact of our paper.

      In the revised version, we will update the manuscript to bridge these perspectives (the functional role of ‘incidental’ sensory signals and the role of retinal flow in navigation). In addition, we will elaborate on the potential predictions of the model and on possible manipulations that might affect the integration between sensory evidence (the curl signal) and the straight-ahead prior.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s feedback regarding the formalization of our reference frames. We agree that certain definitions were implicitly assumed rather than explicitly stated. We will revise the manuscript to provide all necessary self-contained information, ensuring that the geometry of the task response and the definition of heading are unambiguous. Also, we will address the gap between the task response (in world coordinates) and the functional role of the controller, as well as the other points raised by the reviewer.

      Major issues:

      (1a), (2a) Clarification of Reference Frames

      The reviewer asks: “To ‘directly estimate heading’ relative to what?”

      In our study, participants were instructed to report their “perceived direction of self-motion” by aligning a rotational encoder (steering wheel) with the direction they felt they were moving within the 3D simulated scene. Consequently, participants reported their instantaneous heading in a world-centered reference frame, from which the 3D trajectories were reconstructed. Since the reviewer had to infer this information, it should be clarified to ensure it is immediately evident.

      Participants were informed that the initial heading (i.e. θ<sub>0</sub> in our controller nomenclature) was oriented “straight ahead” relative to their body which was aligned longitudinally with the experimental room. We will modify Figure 1B and revise the Methods section to explicitly clarify this initial alignment and the instructions provided to participants.

      In the revised manuscript, we will clarify that while the participant’s report is world-centered, the retinal curl provides a gaze-relative heading signal. Although this was already mentioned, we will emphasize this point. In natural navigation toward a fixated target, a world-centered vector is often unnecessary; an error signal indicating heading relative to fixation is sufficient (as the reviewer also notes). However, the initial alignment of the heading within the 3D scene allows the brain to “calibrate” this internal controller, mapping the retinal curl signal onto the 3D world coordinates required for the task.

      The reviewer also asks how we can be certain that participants were reporting in world coordinates rather than an alternative frame, such as “heading relative to the fixation target.” We believe our “Cancelled Curl” (and over-cancelled) conditions provide the most compelling evidence to rule out this alternative. In these conditions, the physical position of the fixation target in the scene remained identical to the unaltered flow condition. If participants were simply reporting heading relative to the fixation target’s spatial location, the observed biases should have persisted regardless of the flow manipulation. Instead, the bias vanished when the curl was removed. This causal evidence proves that the bias is driven by the retinal motion signal (curl) rather than the spatial orientation of the eyes or the target’s position in the scene. Furthermore, the temporal evolution of the response supports a world-centered integration. For simulated straight paths, the perceived heading remains straight for the first few seconds (consistent with the initial world-centered alignment), with biases only emerging after approximately 3 seconds of integration (a point we elaborate on in our response to Reviewer #3). Had participants been responding based on a simple gaze-relative reference frame from the onset, these biases would have manifested significantly earlier. We will incorporate these points into the revised Discussion to better frame our findings alongside other cues, such as the Focus of Expansion (FOE), that contribute to heading estimation.

      (1b) The reviewer notes that we must be clear about the relationship between curl and heading (relative to fixation) and the variables that affect curl.

      Beyond the discrepancy between heading (θ) and gaze (ψ), curl is geometrically determined by translational self-motion speed (υ), eye height (h), and pitch (α). More specifically, curl = (υ sin ψ cos α)/h. The derivation will be included in the Supplementary Information. Since h = d sin α, where d is the 3D distance to the fixation point, we could express cos α as a function of distance. Certainly, there is not a 1:1 mapping from the curl signal to heading relative to gaze (e.g. θ – ψ). Participants would need to know υ and eye height, plus extra-retinal information. Frenz et al. (2003, Vis Res.) showed that people can estimate self-motion directly from optic flow across different simulated eye heights and gaze angles; extra-retinal information can, in addition, provide knowledge of ψ and α. It is then plausible that the visual system can transform the curl signal from a qualitative directional cue (i.e. steering left or right of fixation) into a quantitative steering command. By combining curl with knowledge of gaze orientation and eye height, the visual system can resolve ambiguities in the flow field and utilize curl as a more precise error signal for locomotor control. These aspects will be included in the new version.
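
      As an illustration only, the relationship quoted above, curl = (υ sin ψ cos α)/h, can be evaluated numerically to show why the mapping from curl to gaze-relative heading is not 1:1. The function name `retinal_curl` and all numerical values (speeds, angles, eye height) are assumptions chosen for this sketch, not values taken from the study.

```python
import math


def retinal_curl(speed, psi_deg, alpha_deg, eye_height):
    """Evaluate curl = (v * sin(psi) * cos(alpha)) / h.

    speed       -- translational self-motion speed v (illustrative units: m/s)
    psi_deg     -- angle psi in degrees, as used in the quoted formula
    alpha_deg   -- pitch alpha in degrees
    eye_height  -- eye height h (illustrative units: m)
    """
    psi = math.radians(psi_deg)
    alpha = math.radians(alpha_deg)
    return speed * math.sin(psi) * math.cos(alpha) / eye_height


# Hypothetical values: same angles, two different speeds.
curl_slow = retinal_curl(speed=1.0, psi_deg=10.0, alpha_deg=20.0, eye_height=1.6)
curl_fast = retinal_curl(speed=2.0, psi_deg=10.0, alpha_deg=20.0, eye_height=1.6)

# A smaller angle at the faster speed that yields the same curl as curl_slow.
psi_alias_deg = math.degrees(math.asin(math.sin(math.radians(10.0)) / 2.0))
curl_alias = retinal_curl(speed=2.0, psi_deg=psi_alias_deg, alpha_deg=20.0, eye_height=1.6)

print(curl_slow, curl_fast, curl_alias)
```

      In this sketch, doubling the speed doubles the curl, and a different combination of speed and angle reproduces the original curl value. The sign of the curl still indicates the steering direction, but its magnitude only becomes interpretable once υ, h, and α are known.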

      (2b) Mismatch between task and controller

      We thank the reviewer for this point. We have addressed the alignment of the reference frames in our response to Issues 1a and 2a. Once the initial orientation (θ<sub>0</sub>) is established in the world frame, the controller model generates steering adjustments that directly translate into heading predictions within that same world reference frame. By treating the perceptual report as an output of the locomotor controller, we resolve the discrepancy between the steering task and the reported heading.

      (2c) No raw data provided

      We respectfully disagree with the reviewer’s interpretation regarding data smoothing. The thin lines in Figure 2 represent the mean 3D paths derived directly from the response variable (θ<sub>0</sub>) across trials of identical conditions for each participant (as detailed in the ‘Computation of Perceived Path’ section). No smoothing or filtering has been applied to these plotted trajectories other than computing the mean across trials. We also wish to remind the reviewer that the raw data and analysis code remain publicly accessible for further inspection. Regarding the visual representation: in earlier versions of the manuscript, we included shaded 95% Confidence Intervals (CIs) in Figure 2. However, this addition rendered the plot overly cluttered and obscured the individual trajectories. We therefore elected to present individual participant means (thin lines) alongside group averages (thick lines) to emphasize inter-subject variability. For clarity, the 95% CIs are explicitly displayed in Figure 3, where the data density is more conducive to shaded areas.
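
      To make the averaging step concrete, the following is a minimal sketch of how a sequence of world-centered heading reports could be converted into x and z coordinates and then averaged across trials. It is not the authors' analysis code: the function names, the constant speed, the fixed time step, and the example headings are all assumptions made for illustration.

```python
import math


def heading_to_path(headings_deg, speed=1.0, dt=0.1):
    """Integrate heading reports (degrees, 0 = straight ahead) into an (x, z) path.

    Assumes constant speed and a fixed sampling interval (both hypothetical).
    """
    x, z = 0.0, 0.0
    path = [(x, z)]
    for theta_deg in headings_deg:
        theta = math.radians(theta_deg)
        x += speed * dt * math.sin(theta)  # lateral displacement
        z += speed * dt * math.cos(theta)  # forward displacement
        path.append((x, z))
    return path


def mean_path(paths):
    """Pointwise mean of equally long paths; no other smoothing is applied."""
    n = len(paths)
    length = len(paths[0])
    return [
        (sum(p[i][0] for p in paths) / n, sum(p[i][1] for p in paths) / n)
        for i in range(length)
    ]


# Two hypothetical trials with opposite constant biases of 5 degrees.
trial_a = heading_to_path([5.0] * 10)
trial_b = heading_to_path([-5.0] * 10)
average = mean_path([trial_a, trial_b])
print(average[-1])
```

      Averaging across trials in this way reduces trial-to-trial noise without any explicit filter, which is one reason mean paths can look smooth even when individual reports are noisy.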

      (3) Difference with Matthis et al (2022)

      While Matthis et al. (2022) described the existence of retinal curl during walking and the information it can provide relative to gaze, our paper provides the causal link: we manipulate curl in real time (the ‘cancelled’ and ‘over-cancelled’ curl conditions), providing the critical evidence that perceived heading is affected by this signal.

      (4) Eye movements analysis

      We thank the reviewer for noting that retinal slip (velocity error) is a more critical metric than positional gaze error. We agree that tracking inaccuracies can introduce translational noise into the flow field. The 3° threshold was established based on the eye tracker’s specifications and the naturalistic setup (1-meter viewing distance without head stabilization). Across all participants, the mean positional error ranged from 1.016° to 1.5° (1 deg is 2.08 cm in our setup). We also calculated retinal slip values, which ranged from 0.12 to 0.27 deg/s (X dimension) and 0.12 to 0.23 deg/s (Y dimension). These values are comparable to natural oculomotor drift (Kowler et al., 1979) and are understandably small given the low velocity of the fixation target. Consequently, it is highly unlikely that retinal slip influenced the results. Furthermore, assuming that tracking error remained consistent across fixation conditions, any present retinal slip cannot explain why the bias followed the retinal curl manipulation as predicted by the controller. We therefore consider retinal slip to be an unlikely confounding factor.

      (5) the separate and joined fits

      We thank the reviewer for the opportunity to clarify the logic behind our modeling choices. We acknowledge that the “separate fits” are inherently less informative due to the high number of free parameters relative to the data. Our primary scientific goal was not to achieve perfect descriptive accuracy via 30 parameters, but to test a specific functional hypothesis through the “joint fit.”

      The Logic of the Joint Fit:

      We agree with the reviewer that the joint fit misses some paths in some conditions. The joint fit reflects a significant compromise. The “Gain” (the weighting of the curl signal) is likely not a static constant but is dynamically tuned based on task demands, confidence in the visual signal, simulated speed, and so on. By using a single Gain parameter, we intentionally ignore this contextual variability to see how much of the behavior can be explained by a “minimalist” controller. In this sense, the 2-parameter joint model is a deliberate attempt to test this limit. By forcing a single Gain parameter to account for all conditions across both straight and curved paths within one flow manipulation (e.g. unaltered flow), we are asking whether a single, fixed linear relationship between retinal curl and steering effort/gain can explain the results. We view the joint fit not as a “perfect” model, but as a stronger test of the curl-based control theory. The fact that a 2-parameter model can capture the direction and scale of biases across such a diverse set of conditions (straight/curved paths, five fixation eccentricities) suggests that retinal curl is a robust signal. Upon closer analysis, the discrepancies between the joint model and the data are most pronounced in the over-cancelled condition, in which sensory evidence becomes more ecologically inconsistent with the extra-retinal information (gaze direction). While the joint fit successfully demonstrates that a single parameter can capture the general functional role of curl, it fails to account for the complex sensory re-weighting that occurs in ecologically inconsistent conditions (like ‘over-cancelled’ flow). We will update the manuscript to discuss these limitations, framing the model as a parsimonious first-order approximation based on a minimal set of parameters rather than a complete description of human heading perception.

      (6) On the neural simulations

      We acknowledge that the presentation of the neural model requires more clarity regarding its objectives and its relationship to the behavioral data.

      We first wish to clarify the intended scope of the neural ring-attractor model. Our primary goal was not to provide a comprehensive account of behavioral performance across all conditions (which is the role of the controller model), but rather to demonstrate a biologically plausible mechanism that explains the emergence of the “Opposite-to-Gaze” bias. While the controller demonstrates that the bias follows a specific control law, the neural model shows how such a law can emerge from known primate neurophysiology, specifically, spiral-tuned MSTd neurons, gaze-contingent inhibition, and an egocentric “straight-ahead” prior.

      Why Straight Paths are Sufficient for this Objective. The reviewer asks why only straight paths were simulated. In our study, the straight-path condition with eccentric gaze is the purest test of the bias mechanism. Simulating the straight paths allowed us to isolate the interaction between foveal inhibition and the straight-ahead prior without the confounding variable of path-curvature flow. Given the complexity of the neural network’s parameter space, we focused on these conditions to provide a clear neuro-plausible explanation.

      Units: Pixels vs. Degrees. We acknowledge that the use of “pixels” in the plots of internal neural dynamics may appear awkward. Because the neural network operates on input stimuli that are defined by the pixel resolution of the videos used in the simulations, we used pixels as the native coordinate system to describe the movement of activity peaks within the network’s internal “map.”

      Behavioral Output (Meters): Importantly, the final heading estimates produced by the network are not left in pixels. We use a pinhole camera model to reconstruct the 3D trajectories from the neural activity. These results are expressed in meters, allowing for a direct comparison with the human behavioral data.

      Addressing Wild Oscillations and Smooth Paths. The oscillations observed in the instantaneous heading estimates reflect the stochastic nature of the population peak when tracking high-frequency sensory inputs. In our model, the synaptic time constant (τ) was kept relatively small to ensure a fast, low-latency response to changes in self-motion. While increasing τ would have produced smoother internal dynamics, it would also have introduced delays into the control loop. Instead, we chose to maintain this high sensory responsiveness and applied a temporal moving average later to the network’s decoding to reconstruct the 3D trajectories.

      In addition, the neural activity over time is shown in two ways. The first is the heatmap, which shows the neuron with preferred heading (one can see more oscillations, especially when the fixation point is closer to the centre (eccentricities -2 and 2), due to larger competition between the sensory evidence and the straight-ahead prior). The second is the decoded heading. In the ring-attractor model, the decoded heading is not determined by a single neuron but is calculated using a population vector average (equation 19). By summing across the entire population, the decoder effectively integrates sensory evidence from many neurons simultaneously. One can appreciate (see e.g. Fig. 5B) that averaged decoding leads to a smoother resulting estimate (the white dashed line, whose visibility will be improved in the revised version). Behavioral work by Burr and Santoro (2001) suggests that global motion signals (divergence and rotation in optic flow) are integrated over much longer timescales, roughly 1000ms to 3000ms, compared to local motion units (~200ms).
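
      To make the contrast between a single-neuron readout and a population readout concrete, the following is a minimal sketch of a population vector average, written as a rate-weighted circular mean. It is not the manuscript's equation 19, whose exact form is not reproduced here; the ring size, the von Mises-shaped activity bump, and all names (`decode_heading`, `bump`, `kappa`) are assumptions made for illustration.

```python
import math

def decode_heading(preferred_deg, rates):
    """Population vector average: rate-weighted circular mean of preferred headings."""
    x = sum(r * math.cos(math.radians(p)) for p, r in zip(preferred_deg, rates))
    y = sum(r * math.sin(math.radians(p)) for p, r in zip(preferred_deg, rates))
    return math.degrees(math.atan2(y, x))

# Hypothetical ring of 36 units with preferred headings every 10 degrees.
preferred = [-180 + 10 * i for i in range(36)]

def bump(center_deg, kappa=4.0):
    """Hypothetical von Mises-shaped activity profile centred on center_deg."""
    return [math.exp(kappa * (math.cos(math.radians(p - center_deg)) - 1.0))
            for p in preferred]

rates = bump(12.0)

# Single-neuron readout: preferred heading of the most active unit.
peak_unit_heading = preferred[rates.index(max(rates))]

# Population readout: uses every unit, so it can fall between grid points.
decoded = decode_heading(preferred, rates)
```

      In this toy case the most active unit can only report a heading on the 10-degree grid, whereas the population average recovers the bump centre between grid points, which is one reason a population readout tends to be smoother than the activity of any single unit.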

      See also our comment on temporal integration in the responses to reviewer #3.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for the comments regarding the definition of heading at different time scales, the role of the gait cycle, and the temporal integration of the curl signal. They will help us refine the manuscript’s core arguments.

      We agree that “heading” must be precisely defined within the context of the differing temporal demands of balance and steering. While instantaneous retinal motion provides the high-frequency feedback necessary for momentary postural adjustments and balance, our study is concerned with heading as a gaze-relative signal used for the continuous control of a locomotor trajectory. As such, we will revise the manuscript to specify that the perceived heading measured in our task reflects a signal integrated over the gait cycle to filter out the oscillatory noise induced by head bob and sway.

      The reviewer correctly notes that gait-induced head bob and sway produce high-frequency oscillations in the curl signal, yet our behavioral results show smooth, slowly evolving biases. The visual system does not react to “instantaneous” curl, which would lead to jittery, unstable heading estimates. Instead, it integrates flow over a timescale roughly commensurate with a full gait cycle (~500–1000ms). This implies a significant temporal integration process. This temporal integration is consistent with evidence (Burr and Santoro, 2001, Vis Res) indicating that optic flow signals (radial and rotational components) are integrated over windows of up to approximately 3 seconds to ensure perceptual stability. Neurally, this likely involves the projection from area MSTd to the Ventral Intraparietal area (VIP), a pathway where fast, eye-centered sensory inputs are transformed into stable, body-centered representations suitable for guiding long-term steering behavior (Chen et al. 2011, JNeurosci.). By grounding our definition of heading in these specific temporal and neural constraints, we aim to clarify how the visual system exploits retinal curl for goal-directed action in natural, dynamic environments and relate our findings to recent studies addressing the role of retinal motion on balance (Powell et al. 2026, bioRxiv).

      In our implementation, we explicitly address the high-frequency noise introduced by gait dynamics by smoothing the retinal curl signals computed from the stimulus videos before they are fed into the controller. This temporal filtering allows the fit of the controller’s prediction to the response data while remaining robust to the rapid fluctuations of head bob and sway. In contrast, the neural ring-attractor model would not require an external smoothing step; instead, the integration is an emergent property of the system’s architecture that can be controlled with different parameters. The dynamics of the synaptic weights and the characteristic “leak” in the population activity naturally implement a leaky integration of sensory evidence, ensuring that the decoded heading reflects a sustained estimate rather than an instantaneous response to visual noise.
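
      The leaky integration described above can be illustrated with a first-order low-pass filter. This is a minimal sketch under stated assumptions, not the filter used in the study: the time constant (`tau = 1.0` s), the sampling step, and the synthetic "curl" signal (a constant bias plus a 2 Hz gait-like oscillation) are all hypothetical.

```python
import math

def leaky_integrate(signal, dt, tau):
    """First-order leaky integrator, tau * dy/dt = -y + x, solved with forward Euler."""
    alpha = dt / tau
    y = 0.0
    out = []
    for x in signal:
        y += alpha * (x - y)  # relax toward the current input at rate 1/tau
        out.append(y)
    return out

dt = 0.01                                  # 100 Hz sampling (assumed)
t = [i * dt for i in range(1000)]          # 10 s of signal
# Hypothetical curl signal: slow bias of 0.5 plus a 2 Hz oscillation of amplitude 1.
raw = [0.5 + 1.0 * math.sin(2 * math.pi * 2.0 * ti) for ti in t]
smooth = leaky_integrate(raw, dt, tau=1.0)

tail = smooth[-200:]                       # last 2 s, after the initial transient
tail_mean = sum(tail) / len(tail)
tail_ripple = max(tail) - min(tail)
```

      With these values the oscillation is strongly attenuated while the slow bias is preserved. A shorter `tau` would track the input faster at the cost of more residual ripple, which is the latency versus smoothness trade-off discussed in the response.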

    1. Author response:

      Reviewer 1:

      Porte et al. investigate how observers form confidence judgments about the presence vs absence of near-threshold audiovisual stimuli. In two psychophysical detection experiments, human participants judged whether a stimulus (visual, auditory, or audiovisual) was present or absent, reported amodal confidence, and then gave modality-specific detection and confidence ratings using a bidimensional scale. The authors report that audiovisual (AV) stimuli are detected more accurately than unimodal stimuli, but that multisensory stimulation does not improve metacognitive efficiency. Participants are more confident in absence than in presence judgments. They extend a previously proposed model to an audiovisual setting, assuming evidence is available only for presence and that absence is inferred via counterfactual detectability. Detection is modeled with a disjunctive integration rule across modalities, while confidence is explained by a combination of conjunctive (for presence) and disjunctive/negation-of-disjunction (for absence) rules.

      We thank the reviewer for thoroughly evaluating our work.

      There are several points I wish to have clarified, outlined below:

      (1) Framing of bimodal vs unimodal detection

      On p.3, the introduction states that "Adults typically show higher detection rates and faster reaction times for bimodal than for unimodal stimuli." This is broadly consistent with the literature, but as written, it obscures the fact that these effects depend critically on experimenter-defined stimulus strengths. It is trivial to construct cases where a strong unimodal stimulus is more detectable than a bimodal stimulus made of two very weak unimodal stimuli. If "bimodal" is understood as the co-presentation of two unimodal components matched in detectability, then Bayes-rule-based arguments indeed predict better detection for the bimodal case; how much better is theoretically interesting, but not quantified in this paper. There is an entire literature on the combination of two unimodal stimuli, which is not touched on. For a pertinent reference, see Ernst & Banks 2002. I recommend clarifying that the statement assumes comparable unimodal intensities.

      We will clarify that when discussing bimodal stimuli, we mean the co-presentation of two unimodal stimuli of similar intensity. We will add references to the literature on discrimination tasks showing that multisensory cue combination follows Bayes-rule integration (e.g., Ernst & Banks, 2002; Battaglia et al., 2003; Alais & Burr, 2004), and we will clarify how our work differs from this rich body of work and what novel contributions it provides.

      (2) Relationship to signal detection theory and counterfactual perceptibility

      In the introduction, the authors write, "If sensory evidence is only available for presence," motivating counterfactual perceptibility as a necessary ingredient to infer absence. However, standard signal detection theory (SDT) already provides a widely accepted framework in which a continuous internal response is present on both signal and noise (absent) trials, with absence corresponding to the noise distribution and decisions implemented by a criterion. Thus, there is no logical need to invoke counterfactual perceptibility simply to define absence; rather, the Mazor-style framework adds an explicit belief model about detectability and an optimal stopping policy. It would strengthen the paper to more clearly state how the proposed model goes beyond SDT conceptually, acknowledge that SDT can account for presence/absence decisions without counterfactuals, and position the counterfactual account as a hypothesis about how observers actually compute absence/confidence, not as a necessity.

      One of the central claims of the paper is that detection in the case of absence requires counterfactual reasoning. The authors should demonstrate whether or not an SDT-based generative model can describe these amodal and uni- and bi-modal stimulus decisions. In such an SDT-based generative model, the noise distribution would be shared across conditions, and unimodal vs bimodal differences would be captured by changes in the mean or variance of the signal+noise distribution.

      We will clarify that our framework explains how absence judgments (and related confidence) are formed, and what it adds to SDT models, including the reproduction of reaction times and a normative explanation of criterion placement (results about RTs are available in the supplementary materials). We will also run additional model comparisons assessing how an SDT-based generative model performs compared to our Bayesian model based on counterfactual perceivability.
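
      As a point of reference for the SDT-based alternative discussed in this exchange, the sketch below implements the textbook equal-variance model with one noise distribution shared across conditions and a condition-specific signal mean. It is only an illustration of that standard model, not the comparison model the authors will fit; the d′ values and the criterion are hypothetical.

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal, used for both noise and (shifted) signal

def predicted_rates(d_prime, criterion):
    """Equal-variance SDT: noise ~ N(0, 1), signal ~ N(d', 1).

    The observer responds 'present' when the internal response exceeds
    `criterion`, measured from the mean of the noise distribution.
    """
    false_alarm = 1.0 - _std.cdf(criterion)
    hit = 1.0 - _std.cdf(criterion - d_prime)
    return hit, false_alarm

def estimate(hit, false_alarm):
    """Recover d' and the bias index c = -(z(H) + z(FA)) / 2 from the two rates."""
    z = _std.inv_cdf
    d_prime = z(hit) - z(false_alarm)
    c = -0.5 * (z(hit) + z(false_alarm))
    return d_prime, c

# Hypothetical sensitivities; the shared criterion yields one false-alarm rate.
conditions = {"auditory": 0.8, "visual": 1.2, "audiovisual": 1.9}
rates = {name: predicted_rates(d, criterion=1.0) for name, d in conditions.items()}
```

      Because the noise distribution and criterion are shared, every condition has the same predicted false-alarm rate and differs only in hit rate. An unequal-variance variant would additionally let the signal standard deviation differ from 1.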

      (3) Confidence vs performance: is AV confidence special?

      The paper's central claims about multisensory confidence and metacognition would be stronger if the authors showed that AV confidence deviates from what is expected given performance alone. From the reported results, AV accuracy is around 80%, with visual and auditory at about 60% and 40%, respectively. Given that confidence typically monotonically scales with accuracy, the first question is whether AV confidence is entirely explained by improved performance, or whether there is an additional multisensory contribution. A simple, informative analysis would be for each subject, plot mean confidence vs per cent correct for AV, V, A, and absent conditions, and to test whether AV confidence lies above the trend predicted by accuracy alone.

      This is an excellent suggestion, and we will conduct the proposed analysis.

      (4) Metacognitive measures: logistic regression slopes vs meta-d′/d′

      In the "Multisensory effects on metacognitive performance" section, the authors define "metacognitive sensitivity" as the slope of a Bayesian logistic regression predicting accuracy from confidence. There is substantial literature showing that logistic-slope measures of metacognitive sensitivity are criterion-dependent and can be affected by both task and confidence criteria (for one example, see Rausch & Zehetleitner, 2017). In contrast, meta-d′/d′ was specifically developed to provide a bias-invariant measure of metacognitive efficiency. Though this, too, is dated (see Boundy-Singer et al., 2023). Given that the authors already estimate HMeta-d-based M-ratios, it is unclear why they rely on logistic regression slopes as their primary "metacognitive sensitivity" metric in Figure 4A. I suggest either replacing the logistic-slope metric with SDT-based measures (meta-d′, meta-d′/d′) or providing a clear justification for using logistic slopes, along with a discussion of their known limitations.

      Additionally, Figure 3 reports M-ratios without showing the corresponding d′ or meta-d′ for judge-present vs judge-absent conditions. Presenting these would help contextualize the metacognitive efficiency results and clarify whether differences are driven mainly by changes in metacognitive sensitivity, changes in task performance, or both. The d' values per condition could be added to Figure 2A.

      All typical measures of metacognitive sensitivity are influenced by metacognitive bias and task performance to some extent, and none of them is a pure measure of type-2 sensitivity (e.g., see Rahnev, 2025). Here, we chose logistic regression because it enables modeling interactions with other predictors in a factorial design with a limited number of trials.

      We will clarify the limitations of metacognitive sensitivity measures and better explain why we then used Mratio to estimate metacognitive performance while controlling for underlying task performance.

      Thank you for this suggestion. We will add the d’ values per condition to Figure 2A.

      (5) Interpretation of confidence in absence vs presence

      The authors emphasise that it is surprising subjects are more confident in absence than in presence judgments, both at amodal and modality-specific levels. However, Figure 2B suggests that absent responses are very accurate: absent is reported as present only in about 10% of absent trials, implying a high correct rejection rate. If confidence tracks outcome probability, higher confidence for absence may be at least partly expected. Before attributing this asymmetry primarily to counterfactual reasoning, it would be important to explicitly relate confidence to accuracy for hits, misses, false alarms, and correct rejections and show whether absence confidence remains elevated relative to presence after controlling for accuracy differences across judgment types and conditions. Without this, the interpretation that higher absence confidence is inherently "unexpected" seems overstated.

      This higher confidence for absence judgments than for presence judgments was observed while controlling for response accuracy. We will clarify this in the main text.

      (6) Model: integration rules, confidence, and evidence strength

      The modeling section extends the Mazor et al. ideal observer to two modality-specific sensors, with disjunctive integration for detection and then disjunctive vs conjunctive integration rules for confidence. I have a few comments.

      First, the detection rule is disjunctive and is reported as a finding. However, the conclusion that detection relies on a disjunctive rule ("present if A or V") closely mirrors the task instructions-participants are explicitly told to respond "present" if they detect the stimulus in any modality. As such, this seems more like a sanity check than a novel empirical finding. Relatedly, the conjunctive detection is a weak null. The conjunctive rule ("present only if both A and V") is behaviorally implausible given the task instructions. A more informative baseline would be an SDT-style scalar-evidence model (see comment 2), rather than a conjunctive rule that participants would have to actively violate the instructions to follow.

      Second, confidence in the model is defined as the probability of being correct at the time of the detection decision. However, this implies a fixed amount of evidence at decision time unless additional mechanisms are invoked. This issue is well known in diffusion modeling (see Kiani et al. 2014) and deserves explicit discussion; otherwise, it is unclear how the model produces graded confidence from a bound-crossing rule alone.

      Third, the authors do not consider a straightforward evidence-strength account of confidence. When both modalities indicate presence, there is, on average, more total sensory evidence than in unimodal trials, making correct decisions more likely and, under most frameworks, confidence higher. Likewise, weak evidence in both modalities can be stronger evidence for absence than moderate in one and weak in the other. Many of the patterns that motivate the presence-conjunctive/absence-disjunctive mix could arise from a model where confidence simply reflects the amount of evidence for the chosen option, without positing distinct logical integration rules for presence vs absence. As the authors note, purely disjunctive or purely conjunctive confidence rules fail to capture the trends in confidence reports in Figure 7, leading them to adopt a combined presence-conjunctive/absence-disjunctive rule. A more parsimonious alternative-that confidence scales with evidence magnitude and cross-modal agreement-should be explicitly considered and, ideally, implemented as a competing model. Finally, if the model is intended as a good account of the data, it would be useful to report whether it also reproduces the metacognitive efficiency patterns (M-ratios) beyond the mean confidence patterns shown in Figures 7-8. At present, the model appears systematically over-confident, which should be acknowledged and quantified.

      Indeed, the disjunctive rule was expected, given our design; we will clarify this. As mentioned above, we will directly compare the results of our current model with those of a more traditional SDT-based generative model, as suggested by the reviewer.
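
      For readers less familiar with the two integration rules, the sketch below shows what each implies for the probability of a "present" response if the two modality-specific sensors are treated as independent. The independence assumption and the input probabilities are illustrative only and are not taken from the fitted model.

```python
def disjunctive(p_a, p_v):
    """'Present if A or V': at least one of two independent sensors signals presence."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_v)

def conjunctive(p_a, p_v):
    """'Present only if A and V': both independent sensors must signal presence."""
    return p_a * p_v

# Hypothetical unimodal probabilities of signalling presence.
p_auditory, p_visual = 0.4, 0.6
p_or = disjunctive(p_auditory, p_visual)
p_and = conjunctive(p_auditory, p_visual)
```

      Under independence, the disjunctive rule always predicts a bimodal detection rate at least as high as the better unimodal rate, while the conjunctive rule predicts a rate no higher than the worse one, which is why the two rules are easy to tell apart in detection data.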

      Contrary to a classical drift diffusion model, the model does not assume a fixed decision boundary, but derives an optimal stopping policy per time point and belief state. As a result, and depending on beliefs about perceptual evidence and the temporal discounting factor, optimal decision boundaries can be asymmetric and may collapse asymmetrically toward 0. Furthermore, given the asymmetry in the information value between sensor activations and inactivations, and differences in the information value of sensor activations of the two modalities, boundary crossing can lead to belief states that are far or close to the decision boundary, depending on the nature of the evidence. Together, even without an explicit modeling of post-decisional evidence, the model can account for variability in the total accumulated evidence at decision time.

      From our understanding, the proposed alternative is equivalent to our current model, in which confidence scales with evidence magnitude.

      The model was not fitted to confidence data, which could explain its overall overconfidence. To further test our model, we will assess its ability to reproduce patterns of metacognitive efficiency (M-ratios).

      (7) Confidence asymmetry index (CAI) and modality weighting

      The confidence asymmetry index (CAI) is defined as the difference between auditory and visual confidence on AV vs absent trials, and the authors report strong correlations between observed and simulated CAI across participants. They interpret this as evidence that subjects place different weights on auditory vs visual signals. Several questions arise. First, does CAI capture asymmetries beyond what is expected from accuracy differences between modalities and conditions? Second, because the simulated data are generated from model fits to the observed data, a correlation between observed and simulated CAI is expected: the model is built to reproduce the individual patterns it is then compared to. A stronger test would compare CAI from data simulated with modality-specific belief parameters, versus CAI from data simulated with constrained equal belief parameters (same θs). Relatedly, the paper would benefit from a plot showing the distribution of θs for A and V- present stimuli across subjects. These values could also be related to unimodal sensitivity measured in the calibration/training phases. A natural prediction is that higher unimodal sensitivity should correspond to higher belief parameters for presence.

      The model was not fitted to either the modality-specific responses or the confidence ratings, so the correlation between observed and simulated CAI was not expected and provides a good test of our model's ability to reproduce the observed patterns. We will test whether the same correlations hold when using the difference in accuracy instead of the confidence.

      We found that the best model is the one with the same belief across the visual and auditory sensors. Given this, we cannot investigate how modality-specific belief parameters are linked to unimodal sensitivity for each participant.

      Reviewer 2:

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1 . In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      This is an excellent suggestion. We will run this analysis and refit the d’ and criterion response using unequal-variance models to see whether we observe the same results.
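
      The zROC check suggested by the reviewer can be sketched as follows. Under a Gaussian SDT model, the z-transformed hit rates are a linear function of the z-transformed false-alarm rates, with slope equal to the ratio of noise to signal standard deviations, so a slope different from 1 indicates unequal variances. The distributions and criteria below are hypothetical and serve only to show the computation; real data would use cumulative rates obtained from the confidence bins.

```python
from statistics import NormalDist

def zroc_slope(hit_rates, fa_rates):
    """Least-squares slope of z(hit) against z(false alarm)."""
    z = NormalDist().inv_cdf
    zf = [z(f) for f in fa_rates]
    zh = [z(h) for h in hit_rates]
    mean_f = sum(zf) / len(zf)
    mean_h = sum(zh) / len(zh)
    sxx = sum((x - mean_f) ** 2 for x in zf)
    sxy = sum((x - mean_f) * (y - mean_h) for x, y in zip(zf, zh))
    return sxy / sxx

# Hypothetical unequal-variance observer: the signal distribution is wider than noise.
noise = NormalDist(mu=0.0, sigma=1.0)
signal = NormalDist(mu=1.0, sigma=1.25)
criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]          # one criterion per confidence boundary
fa_rates = [1.0 - noise.cdf(c) for c in criteria]
hit_rates = [1.0 - signal.cdf(c) for c in criteria]

slope = zroc_slope(hit_rates, fa_rates)        # expected: sigma_noise / sigma_signal
```

      Here the recovered slope is 1 / 1.25 = 0.8, below the value of 1 that the equal-variance model assumes.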

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      We will clarify this point in the manuscript. In Experiment 2, the stimulus was absent on 50% of the trials. As a result, the 50% of stimulus present trials were split into the three possible conditions, resulting in a sixth of the trials being auditory, a sixth visual, and a sixth audiovisual; we will make these proportions clearer in the text.

      We did not have any a priori hypotheses about the response criteria for Experiment 1. The reviewer is right, the proportion of absent versus present trials can indeed have an impact on response bias. In fact, one of the goals of Experiment 2 was to test whether the low frequency of absent trials compared to present ones could explain both response bias and higher confidence in absence observed in Experiment 1, which we found was not the case, as we did not observe a difference between the two experiments. We will clarify this in our revision.

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      We will add individual data to Figure 2.

      Indeed, staircases failed to achieve 50% detection on average; participants for whom psychometric curves did not converge were excluded, as were those at floor level in one of the two modalities.

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that this data wasn't produced by bots? There are recent high-profile publications in PNAS and Behavioral Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7 ). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      Data were collected on the Prolific platform, which has been shown to yield high-quality data (Kay, 2025). However, we agree that this is a potential concern and will add a note of caution in the revised manuscript, even if the risk that the data do not come from humans but from bots is low (Huskey et al., 2026; Chetverikov, 2026).

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      We will clarify this aspect of the analysis. Data were binned into four quartiles based on the overall distribution of confidence values across participants, following the binning used in the example in Fleming (2017). We will examine whether changing the number of bins changes the results (Dayan, 2023).
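
      A quantile-based binning of continuous confidence ratings could look like the sketch below. It is a generic illustration, not the authors' analysis code: the pooled ratings are made up, and the choice of the "inclusive" quantile method is an assumption.

```python
from bisect import bisect_right
from statistics import quantiles

def bin_confidence(values, n_bins=4):
    """Assign each rating to a bin 1..n_bins using quantile cut points of `values`."""
    edges = quantiles(values, n=n_bins, method="inclusive")  # n_bins - 1 cut points
    bins = [bisect_right(edges, v) + 1 for v in values]
    return bins, edges

# Hypothetical pooled confidence ratings on a 0-1 scale.
ratings = [i / 20 for i in range(20)]

quartile_bins, quartile_edges = bin_confidence(ratings, n_bins=4)
quintile_bins, quintile_edges = bin_confidence(ratings, n_bins=5)
```

      Changing `n_bins` is the only step needed to rerun a binned analysis with a different resolution, which is the robustness check raised by the reviewer.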

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      Yes, logistic regression has been used to quantify metacognitive sensitivity before. We will add the relevant papers as references (e.g., Sandberg et al., 2010; Norman et al., 2011; Siedlecka et al., 2016; Wierzchoń et al., 2012; Faivre et al., 2018; Pereira et al., 2023).

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])." Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      The number of participants was determined using a Bayesian optional stopping rule, as preregistered. The reviewer is right that the delta values are very similar in the two experiments. Given that a difference was found in only one experiment, we decided not to draw conclusions from it.

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      We will examine how the proportion of miscategorisation changed throughout the task.

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      Our main goal was to investigate how participants form integrated, supramodal confidence judgments on the basis of multisensory sources of information. Therefore, the amodal confidence judgments are required here.

      Moreover, the model was fitted to response times that corresponded to the amodal judgment. Because we had no meaningful response times for the modality-specific judgment, we could not use them to fit the model.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      Indeed, the model does not capture the full behavioral effects reflecting multisensory interference in the modality-specific responses. We suppose that the model does not reproduce these interferences, as it is only fitted to amodal detection accuracy, and as the two sensors are completely independent from one another. We will clarify this aspect in the text.

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      In addition to the d’ and criterion comparison between the observed and simulated data, we will compare modality-specific d’ and the correlations between observed and simulated confidence.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      To further support this claim, we will add a quantitative comparison of the different confidence fits.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      Thank you for this remark. Indeed, the difference in metacognitive efficiency may be driven by differences in d’: because efficiency is expressed relative to d’ (meta-d’/d’), a lower d’ for auditory stimuli can yield higher metacognitive efficiency for a similar metacognitive sensitivity.

      Reviewer 3:

      This study used a pre-registered novel behavioural paradigm and computational modelling to investigate multi-sensory influences on detection and confidence. Participants performed amodal detection of auditory and visual stimuli (indicating that a stimulus was there when either an auditory stimulus or a visual stimulus or both were present), followed by amodal and unimodal confidence ratings. Detection was higher when both stimuli were present, and the presence of one modality increased the confidence in the presence of the other modality. In contrast to previous detection studies, confidence was higher for absent than for present judgements, but metacognitive efficiency was higher for present judgements. Metacognitive sensitivity was higher for bimodal stimuli, but this was not the case for metacognitive efficiency, suggesting that the sensitivity might be driven by first-order performance. The computational model showed that both detection and confidence in absence followed a disjunctive evidence integration rule, while confidence in presence followed a conjunctive integration rule.

      We thank the reviewer for engaging with our work.

      Strengths:

      The paper has several major strengths. Firstly, it addresses a novel research question using an innovative and well-controlled paradigm. Furthermore, the paradigm and analyses were pre-registered, and all effects that were interpreted were replicated in two independent samples. Finally, the paper uses an advanced computational model to capture counterintuitive patterns in the data.

      Weaknesses:

      The major weakness of the paper is the narrative structure. It is not always clear how the different analyses relate to the main research question. Many different effects are reported in terms of detection accuracy, bias, confidence and metacognition, as well as cross-modal and unimodal versus bimodal effects. It would help readability if the paper were streamlined in terms of the research question that is being answered, which I believe is specifically about multimodal absence judgements. Relatedly, for a reader not intimately familiar with the metacognition literature, the difference between MRatio, metacognitive sensitivity and metacognitive efficiency is not obvious. It would be good to clarify this more in the manuscript.

      We will improve the narrative structure so that each result clearly relates to the research question.

      We will also add a clearer definition of the various metacognition metrics to improve readability.

      In general, the conclusions drawn by the authors seem to be supported by the results. However, I was missing quantitative model comparisons between the conjunctive and the disjunctive models and an explanation of why the models systematically overestimated the confidence ratings. Furthermore, the 'perceptual multisensory interference' section reports on very interesting effects, but these are not supported by statistical tests in the main text. It would help to assess the strength of the claims if the statistical evidence in favour of these claims were presented together in the main text.

      The model was not fitted to confidence data, which could explain its overall overconfidence. As stated in previous responses, we will perform additional analyses to evaluate the model’s ability to reproduce confidence ratings. As some of the results were not replicated across experiments, we decided to put all statistical results related to multisensory interference in the supplementary materials and to focus only on consistent results across experiments.

      One other concern is that in real-world multi-sensory perception, such as the mosquito example in the introduction, the auditory and visual signals have a strong natural association, which means that if you hear the auditory signal, you expect that you will see the visual signal soon and vice versa. As far as I understood, this association was not present in the current paradigm, which might influence the type of effects that one would expect to see.

      The relation here is indeed artificial; we tried to reinforce it as much as possible in the task instructions by telling participants that they had to “detect a mosquito” that could be present auditorily, visually, or both. But we acknowledge that the association between the visual and auditory stimuli is artificial, which may indeed influence our results.

      References

      Alais, D., & Burr, D. (2004). The Ventriloquist Effect Results from Near-Optimal Bimodal Integration. Current Biology, 14(3), 257–262. https://doi.org/10.1016/j.cub.2004.01.029

      Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. JOSA A, 20(7), 1391–1397. https://doi.org/10.1364/JOSAA.20.001391

      Chetverikov, A. (2026). Online behavioral studies are safe for now: Unusual RTs do not imply bots (A reply to Van der Stigchel et al., 2026) (Gjw5u_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/gjw5u_v1/

      Dayan, P. (2023). Metacognitive Information Theory. Open Mind: Discoveries in Cognitive Science, 7, 392–411. https://doi.org/10.1162/opmi_a_00091

      Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433. https://doi.org/10.1038/415429a

      Faivre, N., Filevich, E., Solovey, G., Kühn, S., & Blanke, O. (2018). Behavioral, Modeling, and Electrophysiological Evidence for Supramodality in Human Metacognition. Journal of Neuroscience, 38(2), 263–277. https://doi.org/10.1523/JNEUROSCI.0322-17.2017

      Fleming, S. M. (2017). HMeta-d: Hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neuroscience of Consciousness, 2017(1), nix007. https://doi.org/10.1093/nc/nix007

      Huskey, R., Zhao, Z., Parry, D. A., & Fisher, J. T. (2026). An AI agent can complete the Attention Network Test with human-like behavioral signatures: Implications for the bot-or-not debate (T2jru_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/t2jru_v1/

      Kay, C. S. (2025). Why you shouldn’t trust data collected on MTurk. Behavior Research Methods, 57, 340. https://doi.org/10.3758/s13428-025-02852-7

      Norman, E., Price, M. C., & Jones, E. (2011). Measuring strategic control in artificial grammar learning. Consciousness and Cognition, 20(4), 1920–1929. https://doi.org/10.1016/j.concog.2011.07.008

      Pereira, M., Skiba, R., Cojan, Y., Vuilleumier, P., & Bègue, I. (2023). Preserved Metacognition for Undetected Visuomotor Deviations. Journal of Neuroscience, 43(35), 6176–6184. https://doi.org/10.1523/JNEUROSCI.0133-23.2023

      Rahnev, D. (2025). A comprehensive assessment of current methods for measuring metacognition. Nature Communications, 16(1), 701. https://doi.org/10.1038/s41467-025-56117-0

      Sandberg, K., Timmermans, B., Overgaard, M., & Cleeremans, A. (2010). Measuring consciousness: Is one measure better than the other? Consciousness and Cognition, 19(4), 1069–1078. https://doi.org/10.1016/j.concog.2009.12.013

      Siedlecka, M., Paulewicz, B., & Wierzchoń, M. (2016). But I Was So Sure! Metacognitive Judgments Are Less Accurate Given Prospectively than Retrospectively. Frontiers in Psychology, 7, 218. https://doi.org/10.3389/fpsyg.2016.00218

      Wierzchoń, M., Asanowicz, D., Paulewicz, B., & Cleeremans, A. (2012). Subjective measures of consciousness in artificial grammar learning task. Consciousness and Cognition, 21(3), 1141–1153. https://doi.org/10.1016/j.concog.2012.05.012

    2. eLife Assessment

      This valuable study investigates how multisensory signals influence detection decisions and confidence judgments in presence and absence tasks using pre-registered psychophysical experiments and computational modeling. Across two online samples, the authors argue that audiovisual stimuli improve detection performance but do not enhance metacognitive efficiency, and that confidence is higher for absence than presence judgments. The evidence is broadly solid, although aspects of the computational interpretation and model comparisons would benefit from additional clarification and testing against simpler alternatives.

    3. Reviewer #1 (Public review):

      Porte et al. investigate how observers form confidence judgments about the presence vs absence of near-threshold audiovisual stimuli. In two psychophysical detection experiments, human participants judged whether a stimulus (visual, auditory, or audiovisual) was present or absent, reported amodal confidence, and then gave modality-specific detection and confidence ratings using a bidimensional scale. The authors report that audiovisual (AV) stimuli are detected more accurately than unimodal stimuli, but that multisensory stimulation does not improve metacognitive efficiency. Participants are more confident in absence than in presence judgments. They extend a previously proposed model to an audiovisual setting, assuming evidence is available only for presence and that absence is inferred via counterfactual detectability. Detection is modeled with a disjunctive integration rule across modalities, while confidence is explained by a combination of conjunctive (for presence) and disjunctive/negation-of-disjunction (for absence) rules.

      There are several points I wish to have clarified, outlined below:

      (1) Framing of bimodal vs unimodal detection

      On p.3, the introduction states that "Adults typically show higher detection rates and faster reaction times for bimodal than for unimodal stimuli." This is broadly consistent with the literature, but as written, it obscures the fact that these effects depend critically on experimenter-defined stimulus strengths. It is trivial to construct cases where a strong unimodal stimulus is more detectable than a bimodal stimulus made of two very weak unimodal stimuli. If "bimodal" is understood as the co-presentation of two unimodal components matched in detectability, then Bayes-rule-based arguments indeed predict better detection for the bimodal case; how much better is theoretically interesting, but not quantified in this paper. There is an entire literature on the combination of two unimodal stimuli, which is not touched on. For a pertinent reference, see Ernst & Banks 2002. I recommend clarifying that the statement assumes comparable unimodal intensities.

      (2) Relationship to signal detection theory and counterfactual perceptibility

      In the introduction, the authors write, "If sensory evidence is only available for presence," motivating counterfactual perceptibility as a necessary ingredient to infer absence. However, standard signal detection theory (SDT) already provides a widely accepted framework in which a continuous internal response is present on both signal and noise (absent) trials, with absence corresponding to the noise distribution and decisions implemented by a criterion.

      Thus, there is no logical need to invoke counterfactual perceptibility simply to define absence; rather, the Mazor-style framework adds an explicit belief model about detectability and an optimal stopping policy. It would strengthen the paper to more clearly state how the proposed model goes beyond SDT conceptually, acknowledge that SDT can account for presence/absence decisions without counterfactuals, and position the counterfactual account as a hypothesis about how observers actually compute absence/confidence, not as a necessity. One of the central claims of the paper is that detection in the case of absence requires counterfactual reasoning. The authors should demonstrate whether or not an SDT-based generative model can describe these amodal and uni- and bi-modal stimulus decisions. In such a model, the noise distribution would be shared across conditions, and unimodal vs bimodal differences would be captured by changes in the mean or variance of the signal+noise distribution.
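
      To make the suggested baseline concrete, here is a minimal sketch of an SDT-style account with a single noise distribution shared across conditions. It is illustrative only: the function names and parameter values are hypothetical, and the quadratic-sum rule for the audiovisual mean is an assumption (optimal combination of independent cues), not something taken from the manuscript.

```python
from math import sqrt
from statistics import NormalDist


def p_yes(signal_mean, signal_sd, criterion):
    """P(internal response > criterion) for a Gaussian internal response."""
    return 1.0 - NormalDist(signal_mean, signal_sd).cdf(criterion)


def predicted_yes_rates(d_a, d_v, criterion, sd_signal=1.0):
    """Predicted 'present' rates with one shared noise distribution N(0, 1).

    Assumption (illustrative): the audiovisual mean is the quadratic sum of
    the unimodal means, as for optimally combined independent cues.
    """
    d_av = sqrt(d_a ** 2 + d_v ** 2)
    return {
        "absent": p_yes(0.0, 1.0, criterion),  # false-alarm rate
        "A": p_yes(d_a, sd_signal, criterion),
        "V": p_yes(d_v, sd_signal, criterion),
        "AV": p_yes(d_av, sd_signal, criterion),
    }


# Hypothetical parameter values, chosen only for illustration.
rates = predicted_yes_rates(d_a=1.0, d_v=1.5, criterion=1.25)
```

      Setting `sd_signal` to a value other than 1 gives the unequal-variance variant, in which the signal+noise distribution differs from the noise distribution in variance as well as mean.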

      (3) Confidence vs performance: is AV confidence special?

      The paper's central claims about multisensory confidence and metacognition would be stronger if the authors showed that AV confidence deviates from what is expected given performance alone. From the reported results, AV accuracy is around 80%, with visual and auditory at about 60% and 40%, respectively. Given that confidence typically scales monotonically with accuracy, the first question is whether AV confidence is entirely explained by improved performance, or whether there is an additional multisensory contribution. A simple, informative analysis would be to plot, for each subject, mean confidence against per cent correct for the AV, V, A, and absent conditions, and to test whether AV confidence lies above the trend predicted by accuracy alone.

      (4) Metacognitive measures: logistic regression slopes vs meta-d′/d′

      In the "Multisensory effects on metacognitive performance" section, the authors define "metacognitive sensitivity" as the slope of a Bayesian logistic regression predicting accuracy from confidence. There is substantial literature showing that logistic-slope measures of metacognitive sensitivity are criterion-dependent and can be affected by both task and confidence criteria (for one example, see Rausch & Zehetleitner, 2017). In contrast, meta-d′/d′ was specifically developed to provide a bias-invariant measure of metacognitive efficiency, though this, too, is dated (see Boundy-Singer et al., 2023). Given that the authors already estimate HMeta-d-based M-ratios, it is unclear why they rely on logistic regression slopes as their primary "metacognitive sensitivity" metric in Figure 4A. I suggest either replacing the logistic-slope metric with SDT-based measures (meta-d′, meta-d′/d′) or providing a clear justification for using logistic slopes, along with a discussion of their known limitations.

      Additionally, Figure 3 reports M-ratios without showing the corresponding d′ or meta-d′ for judge-present vs judge-absent conditions. Presenting these would help contextualize the metacognitive efficiency results and clarify whether differences are driven mainly by changes in metacognitive sensitivity, changes in task performance, or both. The d' values per condition could be added to Figure 2A.
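
      As a rough illustration of the per-condition d′ values requested here, the sketch below applies the standard equal-variance formulas, treating the approximate accuracies quoted in point (3) as hit rates and the roughly 10% rate at which absent trials are reported as present (point 5) as the false-alarm rate. These inputs are approximations taken from this review, not the reported data, and the helper names are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def dprime_and_criterion(hit_rate, fa_rate):
    """Equal-variance SDT: d' = z(H) - z(FA), c = -(z(H) + z(FA)) / 2."""
    z_hit, z_fa = _z(hit_rate), _z(fa_rate)
    return z_hit - z_fa, -(z_hit + z_fa) / 2.0


def m_ratio(meta_d, d):
    """Metacognitive efficiency as the ratio meta-d'/d'."""
    return meta_d / d


# Approximate rates quoted in this review (illustrative, not the reported data).
approx_hit = {"AV": 0.80, "V": 0.60, "A": 0.40}
approx_fa = 0.10
dprimes = {cond: dprime_and_criterion(h, approx_fa)[0]
           for cond, h in approx_hit.items()}
```

      Because the M-ratio divides by d′, conditions with very different d′ values can show different efficiency for the same meta-d′, which is why reporting d′ and meta-d′ alongside the ratio helps interpretation.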

      (5) Interpretation of confidence in absence vs presence

      The authors emphasise that it is surprising subjects are more confident in absence than in presence judgments, both at amodal and modality-specific levels. However, Figure 2B suggests that absent responses are very accurate: absent is reported as present only in about 10% of absent trials, implying a high correct rejection rate. If confidence tracks outcome probability, higher confidence for absence may be at least partly expected. Before attributing this asymmetry primarily to counterfactual reasoning, it would be important to explicitly relate confidence to accuracy for hits, misses, false alarms, and correct rejections and show whether absence confidence remains elevated relative to presence after controlling for accuracy differences across judgment types and conditions. Without this, the interpretation that higher absence confidence is inherently "unexpected" seems overstated.

      (6) Model: integration rules, confidence, and evidence strength

      The modeling section extends the Mazor et al. ideal observer to two modality-specific sensors, with disjunctive integration for detection and then disjunctive vs conjunctive integration rules for confidence. I have a few comments.

      First, the detection rule is disjunctive and is reported as a finding. However, the conclusion that detection relies on a disjunctive rule ("present if A or V") closely mirrors the task instructions: participants are explicitly told to respond "present" if they detect the stimulus in any modality. As such, this seems more like a sanity check than a novel empirical finding.

      Relatedly, the conjunctive detection is a weak null. The conjunctive rule ("present only if both A and V") is behaviorally implausible given the task instructions. A more informative baseline would be an SDT-style scalar-evidence model (see comment 2), rather than a conjunctive rule that participants would have to actively violate the instructions to follow.

      Second, confidence in the model is defined as the probability of being correct at the time of the detection decision. However, this implies a fixed amount of evidence at decision time unless additional mechanisms are invoked. This issue is well known in diffusion modeling (see Kiani et al. 2014) and deserves explicit discussion; otherwise, it is unclear how the model produces graded confidence from a bound-crossing rule alone.

      Third, the authors do not consider a straightforward evidence-strength account of confidence. When both modalities indicate presence, there is, on average, more total sensory evidence than in unimodal trials, making correct decisions more likely and, under most frameworks, confidence higher. Likewise, weak evidence in both modalities can be stronger evidence for absence than moderate in one and weak in the other. Many of the patterns that motivate the presence-conjunctive/absence-disjunctive mix could arise from a model where confidence simply reflects the amount of evidence for the chosen option, without positing distinct logical integration rules for presence vs absence. As the authors note, purely disjunctive or purely conjunctive confidence rules fail to capture the trends in confidence reports in Figure 7, leading them to adopt a combined presence-conjunctive / absence-disjunctive rule. A more parsimonious alternative, namely that confidence scales with evidence magnitude and cross-modal agreement, should be explicitly considered and, ideally, implemented as a competing model.


      Finally, if the model is intended as a good account of the data, it would be useful to report whether it also reproduces the metacognitive efficiency patterns (M-ratios) beyond the mean confidence patterns shown in Figures 7-8. At present, the model appears systematically over-confident, which should be acknowledged and quantified.
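
      The second comment above (fixed evidence at decision time) can be illustrated with a toy case. The sketch assumes a simple symmetric drift-diffusion process with a known drift magnitude, equal priors, and a fixed bound; this is an assumption made for illustration and is not the authors' model. All names and parameter values are hypothetical.

```python
from math import exp
from statistics import NormalDist


def confidence_at_bound(mu, bound, sigma, t):
    """P(drift is +mu | accumulated evidence equals +bound at time t).

    Hypotheses are drift +mu versus -mu with equal priors; the evidence at
    time t is Gaussian with mean drift * t and standard deviation
    sigma * sqrt(t).
    """
    sd = sigma * t ** 0.5
    like_pos = NormalDist(mu * t, sd).pdf(bound)
    like_neg = NormalDist(-mu * t, sd).pdf(bound)
    return like_pos / (like_pos + like_neg)


def closed_form(mu, bound, sigma):
    """Same quantity in closed form: logistic(2 * mu * bound / sigma**2)."""
    return 1.0 / (1.0 + exp(-2.0 * mu * bound / sigma ** 2))


# Hypothetical parameters: an early and a late bound crossing.
early = confidence_at_bound(mu=1.0, bound=1.0, sigma=1.0, t=0.5)
late = confidence_at_bound(mu=1.0, bound=1.0, sigma=1.0, t=3.0)
```

      In this toy case the posterior at the bound does not depend on when the bound is reached, so graded confidence requires an additional ingredient such as uncertainty about evidence strength, time-dependent bounds, or post-decision evidence.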

      (7) Confidence asymmetry index (CAI) and modality weighting

      The confidence asymmetry index (CAI) is defined as the difference between auditory and visual confidence on AV vs absent trials, and the authors report strong correlations between observed and simulated CAI across participants. They interpret this as evidence that subjects place different weights on auditory vs visual signals. Several questions arise. First, does CAI capture asymmetries beyond what is expected from accuracy differences between modalities and conditions? Second, because the simulated data are generated from model fits to the observed data, a correlation between observed and simulated CAI is expected: the model is built to reproduce the individual patterns it is then compared to. A stronger test would compare CAI from data simulated with modality-specific belief parameters versus CAI from data simulated with belief parameters constrained to be equal (same θs). Relatedly, the paper would benefit from a plot showing the distribution of θs for A- and V-present stimuli across subjects. These values could also be related to unimodal sensitivity measured in the calibration/training phases. A natural prediction is that higher unimodal sensitivity should correspond to higher belief parameters for presence.

    4. Reviewer #2 (Public review):

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1. In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that this data wasn't produced by bots? There are recent high-profile publications in PNAS and Behavior Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])." Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      (14) Lastly, I think it would be interesting to look at how instructions about modality-specific attention could modulate these findings, in terms of how unimodal (unimodal visual, unimodal auditory) or bimodal attention might modulate these effects. This is an idea for future work.
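
      The concern in point (7) can be made concrete with a rough calculation. The sketch below is not the authors' analysis: it assumes the reported intervals are 95% intervals, that they can be approximated as symmetric and normal, and that the two experiments are independent samples. The function names are illustrative. Under those assumptions it estimates the difference between the two DeltaM values and an approximate interval around that difference.

```python
import math


def se_from_interval(lower, upper, z=1.96):
    """Approximate a standard error from a 95% interval (normality assumed)."""
    return (upper - lower) / (2 * z)


def difference_of_estimates(est1, ci1, est2, ci2, z=1.96):
    """Return (difference, lower, upper) for est2 - est1.

    Assumes the two estimates come from independent samples, so their
    variances add.
    """
    se_diff = math.hypot(se_from_interval(*ci1, z), se_from_interval(*ci2, z))
    diff = est2 - est1
    return diff, diff - z * se_diff, diff + z * se_diff


# DeltaM values quoted in point (7): Experiment 1, then Experiment 2.
diff, lower, upper = difference_of_estimates(
    -0.25, (-0.59, 0.10), -0.28, (-0.54, -0.02)
)
print(f"difference = {diff:.2f}, approx. 95% interval = [{lower:.2f}, {upper:.2f}]")
```

      With these inputs the interval around the between-experiment difference spans zero, which is the sense in which one credible and one non-credible effect need not differ credibly from each other. A proper comparison would use the posterior samples from both experiments rather than this normal approximation.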

    5. Reviewer #3 (Public review):

      Summary:

      This study used a pre-registered novel behavioural paradigm and computational modelling to investigate multi-sensory influences on detection and confidence. Participants performed amodal detection of auditory and visual stimuli (indicating that a stimulus was there when either an auditory stimulus or a visual stimulus or both were present), followed by amodal and unimodal confidence ratings. Detection was higher when both stimuli were present, and the presence of one modality increased the confidence in the presence of the other modality. In contrast to previous detection studies, confidence was higher for absent than for present judgements, but metacognitive efficiency was higher for present judgements. Metacognitive sensitivity was higher for bimodal stimuli, but this was not the case for metacognitive efficiency, suggesting that the sensitivity might be driven by first-order performance. The computational model showed that both detection and confidence in absence followed a disjunctive evidence integration rule, while confidence in presence followed a conjunctive integration rule.

      Strengths:

      The paper has several major strengths. Firstly, it addresses a novel research question using an innovative and well-controlled paradigm. Furthermore, the paradigm and analyses were pre-registered, and all effects that were interpreted were replicated in two independent samples. Finally, the paper uses an advanced computational model to capture counterintuitive patterns in the data.

      Weaknesses:

      The major weakness of the paper is the narrative structure. It is not always clear how the different analyses relate to the main research question. Many different effects are reported in terms of detection accuracy, bias, confidence and metacognition, as well as cross-modal and unimodal versus bimodal effects. It would help readability if the paper were streamlined in terms of the research question that is being answered, which I believe is specifically about multimodal absence judgements. Relatedly, for a reader not intimately familiar with the metacognition literature, the difference between MRatio, metacognitive sensitivity and metacognitive efficiency is not obvious. It would be good to clarify this more in the manuscript.
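
      As a reference point for the terminology above, the following is a minimal sketch using the standard signal detection definitions, not the authors' code. It assumes that metacognitive sensitivity is summarized by meta-d' and that metacognitive efficiency (MRatio) is meta-d' divided by d'. Estimating meta-d' requires fitting a model to confidence data, so here it is simply passed in as a number; all values are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """First-order sensitivity: z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def m_ratio(meta_d_prime, d_prime_value):
    """Metacognitive efficiency: meta-d' expressed relative to d'."""
    return meta_d_prime / d_prime_value


# Hypothetical example values, not taken from the manuscript.
d = d_prime(hit_rate=0.8, false_alarm_rate=0.2)
efficiency = m_ratio(meta_d_prime=1.2, d_prime_value=d)
print(f"d' = {d:.3f}, MRatio = {efficiency:.3f}")
```

      Because MRatio normalizes by d', two conditions can differ in metacognitive sensitivity purely because first-order performance differs, while having the same metacognitive efficiency.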

      In general, the conclusions drawn by the authors seem to be supported by the results. However, I was missing quantitative model comparisons between the conjunctive and the disjunctive models and an explanation of why the models systematically overestimated the confidence ratings. Furthermore, the 'perceptual multisensory interference' section reports on very interesting effects, but these are not supported by statistical tests in the main text. It would help to assess the strength of the claims if the statistical evidence in favour of these claims were presented together in the main text.

      One other concern is that in real-world multi-sensory perception, such as the mosquito example in the introduction, the auditory and visual signals have a strong natural association, which means that if you hear the auditory signal, you expect that you will see the visual signal soon and vice versa. As far as I understood, this association was not present in the current paradigm, which might influence the type of effects that one would expect to see.

    1. eLife Assessment

      This paper provides a valuable observation that imiquimod, a compound often used to induce a psoriasis-like skin inflammation in mice, has a TLR7-independent effect acting through the unfolded protein response and binding to Gelsolin. However, the mechanism connecting Gelsolin to skin inflammation presented in this paper is incomplete and requires further investigation. These findings are of interest to the field of skin immunology.

    2. Reviewer #1 (Public review):

      Summary:

      The study is technically extensive and employs a wide range of experimental approaches, including in vivo analyses, cell-based assays, and transcriptomic data integration. The authors provide a detailed characterization of inflammatory and stress-related pathways activated following IMQ exposure in mouse skin. These datasets may be informative for researchers specifically interested in IMQ-induced dermatitis or in stress responses triggered by chemical skin irritants.

      Strengths:

      The study is technically extensive and employs a wide range of experimental approaches, including in vivo analyses, cell-based assays, and transcriptomic data integration. The authors provide a detailed characterization of inflammatory and stress-related pathways activated following IMQ exposure in mouse skin. These datasets may be informative for researchers specifically interested in IMQ-induced dermatitis or in stress responses triggered by chemical skin irritants.

      Weaknesses:

      A major limitation of the manuscript is its exclusive reliance on the IMQ model, which does not adequately represent the immunological drivers, cellular interactions, or therapeutic responsiveness of human psoriasis, despite the manuscript's framing. IMQ-induced inflammation is dominated by innate immune activation and mouse-specific pathways, whereas human psoriasis is driven primarily by IL-23/IL-17-mediated interactions between keratinocytes and Th17/Tc17 cells. As a result, conclusions drawn entirely from IMQ-based experiments have limited relevance to human disease biology.

      Consistent with this issue, the manuscript places strong emphasis on pathways such as TLR signaling, inflammasome activation, and IL-1-associated responses, none of which are established as central drivers of plaque psoriasis in patients. Therapeutic strategies targeting these pathways have failed to achieve clinical efficacy comparable to IL-23 or IL-17 blockade, yet this translational gap is not adequately addressed.

      The in vitro keratinocyte experiments further limit interpretability. Stimulation of keratinocytes with IMQ is not an accepted model of psoriasis-relevant keratinocyte activation, and the study does not demonstrate induction of well-established psoriasis signature gene programs. Without this validation, it is difficult to assess the relevance of the observed cellular stress responses to human disease.

      The RNA-sequencing analyses raise additional concerns regarding rationale and interpretation. The basis for selecting specific mouse and human datasets is unclear, including the use of unpublished or non-psoriasis inflammatory datasets. Key methodological details related to data processing, normalization, cross-species comparison, and statistical analysis are insufficiently described. In addition, the limited number of differentially expressed genes identified does not align with the extensive psoriasis transcriptomic literature, raising concerns about analytical rigor.

      Finally, the manuscript emphasizes a small number of genes described as "psoriasis-associated" while failing to demonstrate regulation of widely accepted psoriasis signature genes known to correlate with disease activity and therapeutic response in patients.

    3. Reviewer #2 (Public review):

      Summary:

      This paper shows that imiquimod, a compound often used to induce a psoriasis-like skin inflammation in mice, has TLR7-independent effects that induce the unfolded protein response and amplify cytokine expression in dendritic cells. Although these effects of imiquimod have been described in the literature before, this study provides more detailed evidence and different contexts to this observation. These findings add to existing literature that imiquimod has a pleiotropic mechanism of action involving changes in mitochondrial functions and cellular stress responses. Specifically, the authors show that imiquimod can induce calcium signaling in immune cells and potentiate two branches of the unfolded protein response in a TLR7-independent and MyD88-independent manner. They also show that some of these effects might be partially mediated by direct binding of imiquimod to Gelsolin. These findings expand our understanding of imiquimod-mediated inflammation and are useful for the field of experimental skin immunology and mouse models of psoriasis. However, the molecular and cellular mechanisms connecting Gelsolin to the unfolded protein response and skin inflammation presented in this paper require further investigation in the context of TLR-mediated inflammation.

      Strengths:

      (1) TLR7-independent effects of imiquimod on the expression of genes and proteins involved in the unfolded protein response are well demonstrated.

      (2) Gelsolin is identified as a new imiquimod-binding protein in mouse cells.

      Weaknesses:

      (1) Effects of imiquimod on mitochondrial Ca signaling are not clear from the presented data.

      (2) The mechanism by which imiquimod binding to Gelsolin affects the unfolded protein response and cytokine production is not fully explained.

      (3) It remains unclear if Gelsolin contributes to regulating TLR7 (or other types of TLR-mediated) inflammation in vivo.

    4. Author response:

      We sincerely thank the Reviewing Editor (Dr. Florent Ginhoux), Senior Editor (Dr. Satyajit Rath), and both reviewers for their thoughtful and constructive evaluation of our manuscript. We appreciate the recognition that our study provides a valuable observation regarding the TLR7-independent effects of imiquimod (IMQ) via the unfolded protein response (UPR) and Gelsolin in psoriasis-like dermatitis. Importantly, we acknowledge that the current framing may overemphasize direct relevance to human psoriasis. In the revised manuscript, we will reposition the study to focus on IMQ-induced skin inflammation as a model of chemical- and stress-induced inflammatory responses, rather than a direct representation of human plaque psoriasis. We also acknowledge that the mechanistic link between Gelsolin and skin inflammation remains incomplete, and we are committed to addressing the key concerns raised.

      Below, we outline our planned revisions in response to the public reviews. We will submit a revised version after performing the additional experiments and textual improvements.

      Reviewer #1 (Public review):

      We fully agree that the exclusive use of the IMQ model has limitations in fully recapitulating human plaque psoriasis, which is primarily driven by the IL-23/IL-17 axis involving Th17/Tc17 cells. We will substantially temper our claims regarding direct translational relevance to human psoriasis and clearly discuss the IMQ model as a tool to study innate immune-driven and chemical stress-induced inflammation in the skin (new Discussion section). In addition, we will strengthen the rationale for focusing on Gelsolin by incorporating available human data suggesting altered Gelsolin expression in inflammatory conditions.

      (1) We will add a dedicated paragraph in the Introduction and Discussion acknowledging the differences between IMQ-induced dermatitis and human psoriasis (citing key references such as PMID: 28945199).

      (2) For keratinocyte experiments, we will revise the text to avoid implying that keratinocytes stimulated with IMQ represent a psoriasis model, and instead position this system more conservatively. Specifically, we will treat keratinocytes as a system to assess AMP and chemokine induction rather than as a direct model of psoriasis. We will therefore incorporate stimulation with IL-17A (100 ng/ml) ± TNF-α (10 ng/ml) to establish AMP/chemokine induction, and additionally examine the effect of UPR activation by co-treatment with DTT (or other UPR inducers). This will allow us to determine whether UPR activation enhances IL-17A/TNF-α-driven AMP and chemokine expression.

      (3) We will expand the Methods section with full details on RNA-seq dataset selection, normalization, cross-species mapping, and statistical analysis, and re-evaluate key analyses where necessary to ensure robustness and reproducibility. Canonical psoriasis signature genes (e.g., S100A8/A9, IL-17C, IL-36g) will be validated by qRT-PCR in the revised manuscript.

      (4) Vehicle controls (including Aldara-specific effects) will be clearly described and shown in all relevant figures.

      Reviewer #2 (Public review):

      We thank the reviewer for recognizing the strengths in demonstrating TLR7-independent UPR induction and Gelsolin as an IMQ-binding protein.

      (1) To strengthen the mitochondrial Ca²⁺ signaling data (Fig. 1B), we will add an orthogonal approach (e.g., pharmacological inhibition or alternative Ca²⁺ probe) in a new supplementary figure.

      (2) For Gelsolin-IMQ interaction specificity (Fig. 7E-G), we will perform additional experiments comparing IMQ versus RSQ (resiquimod) effects on the observed phenotypes, as recommended.

      We believe these revisions will substantially address the key concerns raised by the reviewers and strengthen the overall quality of the manuscript.

      We again thank the reviewers and editors for their time and valuable feedback, which will significantly improve the manuscript.

    1. eLife Assessment

      This important study advances a new computational approach to measure and visualize gene expression specificity across different tissues and cell types. The framework is potentially helpful for improving the way gene expression specificity is defined across biological datasets, especially among single-cell datasets. The evidence supporting the method is generally solid, although further evaluation of the method's robustness and comparison to other approaches would strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Bot et al. introduce GeneSLand, a computational framework to quantify and visualize gene expression specificity across diverse transcriptomic datasets. The method leverages expression level-breadth (L-B) relationships to construct multi-level specificity landscapes and derives metrics such as lbSpec and dRate to characterize gene specificity in a threshold-independent manner. The authors showed the applicability of the approach across bulk RNA-seq, single-cell datasets, and cross-species primate brain data, showing that specificity patterns captured by this approach reflect both tissue-specific expression and evolutionary distances. Overall, the framework represents an interesting and potentially useful contribution to the analysis of gene expression specificity.

      Strengths:

      (1) Introduces an original conceptual framework based on expression level-breadth relationships to characterize gene specificity.

      (2) Provides a threshold-independent approach that could overcome some limitations of classical specificity metrics.

      (3) Demonstrates the applicability of the framework across different biological datasets.

      Weaknesses:

      (1) The method relies on predefined binning thresholds for expression levels, and the sensitivity of the derived metrics to this parameter is not fully explored.

      (2) The advantages of lbSpec relative to established metrics could be more clearly shown with some biological examples.

      (3) The robustness of the framework with noisy datasets, small sample sizes, or lower sequencing depth is not well evaluated.

    3. Reviewer #2 (Public review):

      Summary:

      Bot & Davila-Velderrain present a new method to understand expression specificity, based on an analysis of the relation between expression level and breadth for each gene. They show that the method captures biological differences across organs, diverse cell types, and specific cell subtypes, for different biological processes and across species.

      Strengths:

      This manuscript addresses an important question in an original manner, and was a pleasure to read. The authors frame the question very clearly: gene expression is a complex trait, which can be summarized in an informative manner by its specificity. The method the authors propose (which I'll call "LB" in this review) has several attractive features, summarising different specificity profiles in a more nuanced manner than the widely used tau. They show convincingly that their method captures relevant biology at different scales. I especially appreciated the comparative analyses of specificity within broad cell types and within neuronal subtypes.

      Weaknesses:

      Surprisingly, while the method works well, the authors never compare it to the state-of-the-art. Thus, comments 1 and 2 are my only "major" comments.

      (1) In the Introduction, the authors should explain which shortcomings of existing methods motivate the development of a new one.

      (2) In the Results section, the authors should compare the results of LB with other methods, at least tau and Gini (which is conceptually quite similar to LB).

      (3) It would be good to show the sensitivity of LB to different numbers of bins.

      (4) The conservation of specificity across primates was already reported in Kryuchkova-Mostacci 2016 (https://doi.org/10.1371/journal.pcbi.1005274). But also see Dunn et al 2018 (https://doi.org/10.1073/pnas.1707515115) for criticism of this type of naive pairwise comparisons.
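
      For readers unfamiliar with the baselines named in point (2), the sketch below computes the tau specificity index and the Gini coefficient for a single gene's expression vector across tissues or cell types. It uses the commonly cited definitions of both metrics and is not part of GeneSLand; the function names and example vectors are illustrative, and in practice expression values are usually normalized (often log-transformed) first.

```python
def tau(expression):
    """Tau index: 0 for uniform expression, 1 for expression in one context."""
    n = len(expression)
    max_x = max(expression)
    if n < 2 or max_x == 0:
        return 0.0
    return sum(1 - x / max_x for x in expression) / (n - 1)


def gini(expression):
    """Gini coefficient: 0 for uniform expression, (n - 1) / n for one context."""
    xs = sorted(expression)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical expression profiles across four tissues.
broad = [5.0, 5.0, 5.0, 5.0]
specific = [0.0, 0.0, 0.0, 9.0]
for name, profile in (("broad", broad), ("specific", specific)):
    print(name, round(tau(profile), 3), round(gini(profile), 3))
```

      Both metrics collapse a profile to a single number, so a comparison with a level-breadth approach would most usefully focus on genes where the single-number summaries agree but the underlying profiles differ.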

    1. eLife Assessment

      This valuable study introduces an innovative experimental design to address a crucial and timely issue in microbial ecology: the potential bias in soil microbial community analyses caused by extracellular DNA degradation. While the evidence showing variable degradation rates of extracellular DNA is convincing, additional conceptual, methodological, and statistical clarifications could reinforce the claims and the study's contribution to the field. This research will appeal to microbial ecologists and researchers interested in using molecular techniques to evaluate microbial community structure.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the degradation dynamics of extracellular DNA in soils and its impact on estimates of microbial abundance and diversity. By combining a broad geographic sampling design with a primer-labeling strategy, qPCR quantification, amplicon sequencing, and PMA treatment, the authors aim to disentangle total versus intracellular DNA signals and explore sequence-specific degradation patterns. The topic is relevant, particularly given the increasing awareness of relic DNA as a confounding factor in microbial ecology. The experimental design is ambitious and potentially impactful. However, several conceptual inconsistencies, methodological ambiguities, and statistical limitations currently weaken the robustness of the conclusions. These issues need to be addressed.

      Strengths:

      The manuscript addresses a timely and important question in microbial ecology, particularly given the growing recognition that relic DNA can bias interpretations of community composition derived from amplicon sequencing. The study is ambitious in scope, incorporating a broad geographic sampling design across multiple soil types, which enhances the generalizability of the findings. The use of a controlled microcosm experiment combined with a primer-labeling strategy to track extracellular DNA dynamics is conceptually innovative and provides a structured framework to investigate degradation processes.

      In addition, the integration of multiple approaches, including qPCR for absolute quantification, high-throughput sequencing for community profiling, and PMA treatment to differentiate extracellular from intracellular DNA, represents a comprehensive attempt to disentangle complex sources of bias in soil microbiome analyses. The effort to link degradation dynamics with environmental variables and to explore sequence-level patterns further demonstrates the authors' intent to move beyond descriptive analyses toward a mechanistic understanding.

      Weaknesses:

      Several conceptual and methodological issues currently limit confidence in the study's conclusions. Key terms such as "sequence-specific degradation" are not clearly defined or supported by a mechanistic or structural hypothesis, making it difficult to interpret the biological meaning of the results. In addition, the bioinformatic workflow presents inconsistencies, particularly the use of ASVs followed by clustering at 97% similarity, which undermines the resolution required to support sequence-level inferences. Statistical analyses are also insufficiently described, including unclear definitions of "T values," a lack of detail on pairing structure, and no indication of multiple testing correction.

      Furthermore, important methodological details are missing or unclear, including primer design (e.g., GAPDH tag vs ACTF), Illumina library preparation (e.g., adapter and indexing strategy), and validation of PMA treatment efficiency. The interpretation of PMA-treated samples as representing "living communities" is likely overstated, given the known limitations of the method in soil systems. Finally, typographical errors, inconsistent terminology, and unclear phrasing throughout the manuscript reduce readability and further complicate interpretation.
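
      To illustrate what a multiple testing correction would involve, here is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment. The review does not say which correction would be appropriate, so the choice of this procedure is an assumption made for illustration, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

      Reporting adjusted values alongside the raw test statistics, together with the pairing structure of each test, would let readers judge which comparisons remain supported.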

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript describes the results of an interesting study examining the rate of degradation of extracellular DNA in soil ecosystems using a clever experimental approach. 16S ribosomal RNA genes were amplified from soil samples, and then purified PCR amplicons, containing a 5' linker sequence on the forward primer, were introduced to soils and monitored over time using real-time quantitative PCR and NGS amplicon sequencing. The study was able to measure rates of overall extracellular DNA degradation, but also sequence-specific degradation rates. I like the idea and execution of the study, and the results are interesting. The manuscript needs some help to improve the overall readability. Please see general and editorial comments below.

      Strengths:

      Innovative experimental design that is well deployed across a large number of soil types, revealing interesting variability in extracellular DNA degradation.

      Weaknesses:

      (1) The manuscript needs another review to improve the readability of the document.

      (2) The authors have used 16S genes to look at sequence-specific degradation. But 16S rRNA genes are actually pretty well conserved, and there isn't as much genetic variation across this gene among organisms as there is for other genes. It might be more relevant to look at metagenomic DNA degradation from high AT, high GC organisms, etc. This would be more generalizable than 16S genes.

      (3) Differential cell lysis during soil DNA extraction needs to be considered as well.

      (4) It is not clear why the authors didn't put GAPDH linkers on the reverse primer as well. This would have given an easier amplicon to amplify (no degeneracies at all).

    1. eLife Assessment

      In this study, the authors use microCT to image an intact hatchling octopus and segment major organ systems, including the vascular, respiratory, digestive, and nervous systems. The resulting dataset is of good quality, and its release through a public web interface is a valuable resource for the community to explore cephalopod mesoscale anatomy. However, the authors claim to have elucidated previously uncharacterized chemotactile pathways from the suckers to the brain, for which there is incomplete evidence, as microCT does not reveal structural connectivity. In addition, the language is often overly complex, obscuring the main points and making it difficult to assess the strength of individual claims. This article would benefit from more cautious framing of the anatomical findings and complementary neuronal tracing experiments to support the proposed pathways.

    2. Reviewer #1 (Public review):

      Summary:

      Sugarman, Vanselow et al. adapted a microCT instrument to permit imaging of an entire organism, a hatchling octopus. In the resulting 3D dataset, they segmented the major organ systems, including the vascular, respiratory, digestive, and nervous systems. The authors released the dataset through a public web interface, and present some observations about body-wide neuroanatomy.

      Strengths:

      - The dataset is of good quality and access to a whole-cephalopod anatomical resource will be useful for the scientific community.

      - The interactive web interface facilitates exploration of the dataset.

      Weaknesses:

      - The authors identify a series of bundles of nerve fibers between the suckers and the central brain and propose that these structures together constitute the chemotactile pathway, linking sensation to learning and memory. This is an over-interpretation of the available evidence. The data is not presented in a way that allows the reader to independently assess the proposed anatomical relationships: many images include near-opaque colored overlays on the fibers of interest, making it difficult to determine whether these bundles truly merge or interface. Additionally, the mesoscale resolution achieved here reveals the presence of large nerve bundles but cannot resolve the origin or synaptic relationships of the neurons in the bundles - including those from the chemotactile receptors of the suckers. There are likely multiple synapses between the periphery and the central brain, and the location and connectivity of individual neurons are not visible at this resolution. Consequently, this dataset does not demonstrate structural connectivity. Elucidating a neural circuit would require complementary approaches such as neuronal tracing or electron microscopy connectomics.

      - The language used in the manuscript is often overly complex and convoluted, making it difficult to follow the main arguments and to assess the strength of the claims. In addition, some vocabulary in the main text is overly technical (e.g. related to microCT or anatomy), making it difficult for a general biology or cephalopod audience to understand, while some neuroscience vocabulary is used imprecisely or in ways that overstate what can be concluded from anatomical data. A substantial rewrite using clearer, more cautious language is recommended. Additionally, a deeper discussion of the observed octopus arm anatomy, and how this may relate to its semi-autonomous function would make this article of greater interest to a broader audience.

    3. Reviewer #2 (Public review):

      Sugarman et al show a major advance in the volumetric imaging of the cephalopod body and nervous system, using wide field high resolution micro-CT imaging. The new detection optics are striking in their performance, and the conclusions made from the images seem well-founded. The technical advance and the conclusions both justify the reader's attention, but the authors should make the figures and the text teach the reader so that the findings are more accessible and convincing.

      The paper is now written in a style that will impress those ready to be impressed and fail to impress many of the readers, although it should.

      (1) The authors must improve the text so that it cleanly states what was known previously, and how the current results extend this. For example, putting a section in the middle of the results section (page 3) that states: "Long-range connections between sucker and brain were demonstrated by fine chemical and tactile sensing by suckers in behavioral experiments with live O. bimaculoides (Buresch et al., 2022, 2024; Sepela et al., 2025; van Giesen et al., 2020; Wells, 1978a; Wells & Young, 1969) and by loss of chemotactile learning and memory observed after ablation of the "inferior frontal system" (i.e., inferior frontal/subfrontal/buccal lobe complex) (Wells, 1978a)..." seems a bit confusing to me. Similarly, putting in a reference to optical imaging approaches for combining data sets (Preibisch et al., 2009) as only the citation does little to make the work accessible. Please expand the text so that it teaches what the authors are thinking.

      (2) The authors must improve the figures so the work is more digestible. The data is a pyramid, and the "google earth" range of magnifications and details is not clear in the figures. In short, the figure will impress those who know to be impressed and fail to impress the majority.

      (3) The videos are far more useful in this contribution than in almost any other paper. Use them more so the reader realizes how key they are. Revising them to demonstrate the amazing range of scales in the data would be wise.

      (4) The demonstration of the data visualization tool is excellent as far as it goes. Expanding the treatment of the multi-scale rendering would be wise.

      With proper expansion of the text and the figures, it will become far more obvious that this is landmark work.

    4. Reviewer #3 (Public review):

      Summary:

      Sugarman et al. present a microCT scan of a hatchling octopus from the species Octopus bimaculoides. The scan is publicly available and serves as a valuable tool for the field of cephalopod biology. Using this scan, the authors uncover two undescribed neural pathways: the intermediate longitudinal tracts (iLTs) in the axial nerve cord linking the suckers to the brain, and the arm-to-arm u-tracts (AAUTs) interconnecting neighboring arms. How the eight sucker-lined octopus arms are coordinated with one another and with the brain has been a long-standing question in the octopus motor control field, and the results presented here have promise for addressing it. However, major weaknesses addressed below limit the interpretability of the dataset.

      Strengths:

      The authors have publicly published a scan of an entire hatchling octopus, with major organs and subdivisions of the nervous system already segmented. Accessing the data is straightforward, and the authors provide adequate instructions on how to navigate the dataset.

      The authors provide validation of the AAUTs using lucifer yellow and wheat germ agglutinin. To overcome motion artifact in the hatchling dataset, the connections between the iLTs and the suckers are validated with a microCT scan of a distal section of adult arm.

      Weaknesses:

      Given the resolution of the dataset, neural connectivity is determined by texture differences alone, which can be misleading. As such, any claims of anatomical connectivity will need further validation, ideally with tracing techniques. While the authors investigated the AAUTs with other techniques, no such validation exists for the iLTs. Furthermore, the authors themselves state that as the iLTs converge with the brachial nerve, they become indistinguishable from other fibers. Any connections of the iLTs to the brain are only hypothesized, despite their claim of demonstrating a clear pathway from the suckers to the brain.

      The relevant prior research on octopus neurobiology is not well explained, making it challenging to understand the significance of the results in a broader context.

    1. eLife Assessment

      This valuable study investigates the interaction of two integral membrane proteins (Cdhr1a and Pcdh15b) and their roles in cone-rod dystrophy. Convincing evidence using loss-of-function mutants demonstrates clearly that both proteins are required for cone maintenance and survival. Although some evidence (Western blots and cell aggregation assays) demonstrates Cdhr1a and Pcdh15b can physically interact, there is insufficient evidence to support the subcellular localization and the proposed heterodimeric interaction of the two proteins from distinct subcellular compartments in cone photoreceptors.

    2. Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Comments on the revised version of the manuscript:

      The authors adequately addressed previous comments related to lack of details on quantitative and statistical analyses and methods. In this regard, I believe the revised manuscript presents a stronger analysis of the data. I also appreciated the revised discussion section, which better contextualizes their new data with previous observations in different animal models.

      The authors provided additional evidence in Fig 1C-H for the co-localization of pcdh15b and actin within CPs using immunolabeling with super resolution imaging. This data firmly supports their other observations. A similar approach tends to also show co-localization of actin and cdhr1a, although the authors suggest that the pattern of expression is less overlapping, which would be expected if cdhr1a is predominately expressed in the OS membranes whereas pcdh15b is predominantly expressed in the CP membranes. In my opinion the data presented to support this separation is still not that convincing. Moreover, the authors show that both cdhr1a and pcdh15b are expressed in CPs using immuno-TEM (Fig 1I). This is a difficult question to address experimentally, and it is, of course, still plausible that pcdh15b within the CP membrane and cdhr1a within the OS membrane are interacting in trans. However, I just don't think that the data unequivocally support mutually exclusive localization of these proteins as suggested by the authors and depicted in the model in Fig 1J.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that, like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs, closely apposed to PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Fig 4F, 6E) as well as other morphometric data (Fig 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether analysis was done in an automated and/or masked manner.

      Comments on revisions:

      Most of my concerns were addressed in this revised version.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty for this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data as presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      This is a large body of data.

      Weaknesses:

      (1) I have serious concerns about the quality of the imaging here. The premise that cdhr1a/pcdh15 juxtaposition is evidence for the two proteins mediating the connection between outer segments and calyceal processes requires very careful microscopy. The SIM images have two major issues - one being that the red and green channels are misaligned and the other being evidence of bleed through between the channels. This is obvious in Fig 2A but likely true across all the panels in Fig 2, and possibly applies to confocal images in Fig 1 as well. The co-labelling with actin shows very uneven, punctate staining for actin bundles.

      (2) The newly added TEM and transverse sections include colored regions that obscure the imaging.

      (3) The quantification should be done with averages from individual fish. Counting individual measurements as single data points artificially inflates the significance. Also, the cone subtypes are still lumped together for analysis despite their variable sizes.

      (4) I highlighted previously that the measurement of calyceal processes was incorrect. The redrawn labels in Fig 7 are now more accurate, although still difficult to interpret. However, the quantification in Fig 7O is exactly the same. How can that be if the measurement region is now different?

      (5) Lower magnification views would provide context for the TEM data.

      (6) The statement describing the separation between calyceal processes and the outer segment in the mutants is still not backed up by the data.

      (7) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs". This is now referenced, but incorrectly. Also, the issue of pigment interference was not addressed.

      (8) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immuno-gold-TEM protocol and showcase co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in Figure 1. All of the images showcasing CDHR1 localization across species concentrate on the PNA-positive R/G cones. Larger fields of view were not collected, as we prioritized the highest resolution possible and therefore collected small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the WB are similar to our previous results (Piedade 2020), and we believe they arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures. This was a monolayer culture. We did not observe any differences in total cell number between the differing transfections. As such, we do not feel proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.
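
      As an illustration of the design described in this response (3 replicate transfections, 5 fields of view each), the sketch below averages cluster counts within each replicate first, so that n is the number of replicates rather than the number of fields. All counts and condition names are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, stdev

# Hypothetical cluster counts: condition -> 3 replicates -> 5 fields of view each.
counts = {
    "cdhr1a + pcdh15b": [[4, 5, 6, 5, 5], [6, 6, 7, 5, 6], [5, 4, 5, 6, 5]],
    "control": [[1, 0, 1, 2, 1], [0, 1, 1, 1, 2], [2, 1, 1, 0, 1]],
}

def summarize(replicates):
    """Average fields within each replicate, then summarize across replicates."""
    per_replicate = [mean(fields) for fields in replicates]
    return {
        "n": len(per_replicate),  # independent replicates, not fields of view
        "per_replicate": per_replicate,
        "mean": mean(per_replicate),
        "sd": stdev(per_replicate),
    }

summary = {condition: summarize(reps) for condition, reps in counts.items()}
```

      Reporting the per-replicate means alongside n = 3 makes it explicit that the 15 fields per condition are not independent samples.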

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one-rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs/cell, we simply included all CP measurements in our analysis. All our CP measurements (actin or cdhr1a or pcdh15), were measured in the presence of a counter stain, WGA, prph2, gnb1 or PNA to ensure proper measurements (landmark) and association with proper cell type. Our new figure 7 now includes cone OS counter staining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin or cdhr1a or pcdh15), were done in the presence of a counter stain, WGA, prph2, gnb1 or PNA to ensure proper measurements and association with proper cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we had an n=5 (individual fish) analyzed for each genotype/time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using ANOVA with Tukey post hoc tests to control for multiple pairwise comparisons. This new analysis did not significantly alter the statistical significance outcomes of the study.
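
      The reviewer's concern about non-independent measurements can be sketched as follows: individual OS or CP lengths are first collapsed to one mean per fish, and the one-way ANOVA is then computed on those per-fish means, so the within-group degrees of freedom reflect the number of fish. This is a minimal illustration under stated assumptions, not the authors' analysis; the genotype labels, fish identifiers, and lengths are hypothetical, and a post hoc test such as Tukey's would still need to be run on the per-fish values.

```python
from collections import defaultdict
from statistics import mean

def per_fish_means(rows):
    """Collapse (genotype, fish_id, length) rows to one mean length per fish."""
    by_fish = defaultdict(list)
    for genotype, fish_id, length in rows:
        by_fish[(genotype, fish_id)].append(length)
    groups = defaultdict(list)
    for (genotype, _fish_id), lengths in sorted(by_fish.items()):
        groups[genotype].append(mean(lengths))
    return dict(groups)

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of values."""
    values = [x for vals in groups.values() for x in vals]
    grand_mean = mean(values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical measurements: two fish per genotype, two lengths per fish.
rows = [
    ("WT", "wt1", 10.0), ("WT", "wt1", 12.0),
    ("WT", "wt2", 11.0), ("WT", "wt2", 13.0),
    ("mut", "m1", 8.0), ("mut", "m1", 10.0),
    ("mut", "m2", 7.0), ("mut", "m2", 9.0),
]
groups = per_fish_means(rows)
f_stat, df_between, df_within = one_way_anova_f(groups)
```

      With these toy values the within-group degrees of freedom are 2 (four fish minus two genotypes) rather than 6 (eight measurements minus two genotypes), which is the inflation the reviewer describes.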

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree, this point has been addressed in our revised manuscript. Additionally, we have also included data from 1 and 2 year old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer, as such we have revised the discussion in our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that, like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs, closely apposed to PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used for quantitation of our data (please see comments from Reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in the cones, while in rods it may serve to regulate the release of newly formed disks (as previously proposed in mice). We do plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immuno-gold-TEM protocol and provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. It is also unclear whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al. 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants, so it is unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol, which differed from that of the Miles et al. paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern, which might be lost or hampered in the procedure used in Miles et al. 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an overlooked error in figure preparation; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.
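
      The first concern in point (5), running statistics on individual cell measurements rather than per-fish averages, can be illustrated with a minimal sketch. It assumes measurements arrive as (fish ID, value) pairs; the function name, the fish IDs, and the numbers are hypothetical and are not taken from the manuscript. Each fish contributes one value, so the animal rather than the cell is the unit of replication.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(measurements):
    """Collapse (fish_id, value) cell measurements into one mean per fish."""
    by_fish = defaultdict(list)
    for fish_id, value in measurements:
        by_fish[fish_id].append(value)
    # One summary value per animal; downstream tests then use n = number of fish.
    return {fish_id: mean(values) for fish_id, values in by_fish.items()}


# Hypothetical outer segment lengths (um) from two fish.
cells = [
    ("fish1", 10.0), ("fish1", 12.0),
    ("fish2", 8.0), ("fish2", 9.0), ("fish2", 10.0),
]
fish_means = per_fish_means(cells)
print(fish_means)
```

      The per-fish means would then feed an ANOVA followed by a multiple-comparison post-hoc test instead of repeated t-tests; that step is not sketched here.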

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript. We have also clarified that CP measurements were made based on a counterstain for the cone/rod OS so that the actin signal was only CP associated. We have included the counterstain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have completed both more SIM as well as immuno-gold TEM to support our conclusions, see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the small length of rod CPs is most likely to represent their interaction with newly forming discs rather than connect with mature discs which are enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references where rod CPs have been found to be shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counterstain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript has included these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in Supplemental Figure 1 is distinct and represents a different in situ experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added text to our discussion highlighting that it is not possible to assess rod-cone dystrophy in the pcdh15b model, as the mutation is lethal by 15 dpf, which is still before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin) and myo7aa all have either no expression in photoreceptors or very low levels, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      We have changed the pcdh15b color to blue.

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdhr1b in cdhr1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showcasing this data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bona fide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and extends along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on a co-stain for either rod or cone OSs.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      These data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) have been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies.

      (3) Missing references.

      We have added the missing references.

      (4) Indicate the mean or median for bar graphs.

      The Materials and Methods section now specifies that all of our graphs depict a mean value.

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separate from cones in the zebrafish retina and are therefore easily identified by location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. eLife Assessment

      This study provides important insights into how species-specific variation in oxytocin receptor regulatory architecture contributes to diversity in brain expression patterns and social behaviors. By generating multiple BAC transgenic mouse lines carrying the prairie vole oxytocin receptor locus and combining anatomical, molecular, behavioral, and chromatin-structure analyses, the authors present convincing evidence that distal regulatory elements constrain peripheral expression while permitting brain expression aligned with behavior. This study provides an experimental framework and a resource that are of value for dissecting how regulatory variation in neuromodulatory systems contributes to species differences in social behavior. This work will be of interest to those interested in social behavior, oxytocin, neuromodulation, and related conditions.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Tsukamoto et al. describes a compelling approach to understanding whether inter-species differences in social behavior might emerge from differential expression patterns of the oxytocin receptor (Oxtr) in the brain. To this end, they genetically engineer BAC transgenic mouse lines with insertions of a large construct incorporating the prairie vole Oxtr gene and surrounding regulatory elements. They name these lines Koi lines. They first evaluate whether prairie vole-like Oxtr expression is reproduced in the Koi mouse lines, and they find heterogeneous patterns across different lines that do not depend on the number of insertions. While they found that Koi mice can reproduce vole-like expression in PFC, NAc, and BLA, the reproduction was never complete: one Koi line had NAc and mPFC expression, another had BLA expression, etc. They confirmed major expression patterns across 3 methods: crossing with a LacZ reporter line, in situ hybridization, and ligand binding (autoradiography). To determine the expression pattern of the BAC insert but not endogenous Oxtr, the authors generated new mouse lines by crossing Koi lines with an Oxtr-/- line. Importantly, they found that the Oxtr expression pattern in the mammary gland was similar across all lines and wild-type mice.

      The authors used Koi:Oxtr-/- lines to test social behavior, specifically partner preference (a behavior specific to prairie voles) and maternal behavior. They find that different Koi lines showed different changes in these behaviors compared to wild-type mice. Moreover, while some lines showed changes in partner preference, others seemed to show changes in maternal behavior. For one of the lines (Koi4), the partner preference and the maternal behavior were incongruent.

      The manuscript then hypothesizes that the Oxtr gene is positioned in different 3D chromatin structures across species and across tissues, leading to more rigid expression in the mammary glands, but more flexible expression patterns in the brain.

      Strengths:

      This study has major implications in the field of oxytocin research, and more broadly in the field of neuromodulation. It is novel, bold, and rigorous.

      Weaknesses:

      (1) The expression in the brain and mammary gland (Figure 2) was not quantified, preventing a more objective conclusion that the brain has flexible expression and mammary gland expression is rigid.

      (2) In Figure 7, a similar heatmap for the mammary gland is missing.

      (3) Partner preference in males was not tested.

      (4) It is unclear if in the behavioral testing the stimulus animals were the same genotype as the focal female or were wild-types. This could have an impact on the behavioral outcome.

    3. Reviewer #2 (Public review):

      Summary:

      This is a bold and important study and addresses an important question in the field: how species-specific variation in brain oxytocin receptor expression relates to differences in social behavior.

      Tsukamoto et al. generated eight independent transgenic mouse lines (Koi lines) carrying a bacterial artificial chromosome (BAC) encompassing the prairie vole Oxtr locus along with flanking intergenic regions, with the goal of probing the behavioral consequences of species-specific variation in brain Oxtr expression. Across these "volized" lines, the authors claim conserved Oxtr expression in the mammary gland but strikingly divergent patterns of brain expression, none of which fully recapitulate endogenous prairie vole Oxtr distribution, and which instead diverge from both mouse and prairie vole brain Oxtr distribution. Nevertheless, some lines exhibit partial overlap with the vole Oxtr expression pattern reported in the literature within specific brain regions, and one line displays partner preference behavior reminiscent of prairie voles. The authors further report line-dependent differences in maternal pup retrieval and crouching behaviors, which they interpret as evidence that variation in brain Oxtr expression can drive variation in social behaviors. Together with analyses of topologically associating domain (TAD) architecture, the authors conclude that brain, but not peripheral, Oxtr expression is shaped by distal regulatory elements beyond the BAC insert, and propose that such regulatory flexibility underlies evolutionary diversification of social behavior.

      Strengths:

      A particular strength of the study is the generation of multiple independent transgenic lines, which provides a valuable resource for probing regulatory influences on Oxtr expression.

      Weaknesses:

      While the study addresses an important question, I have several methodological and conceptual concerns regarding the study in its current form. Some aspects of the study fall outside my primary area of expertise, and I am therefore not in a position to fully evaluate the technical difficulty or rigor of those components, or to judge whether my suggestions would be feasible to implement. I defer to reviewers with relevant expertise for a more detailed assessment of these aspects.

      (1) Each independent Koi line exhibits a distinct brain expression pattern that differs from both wild-type mouse and prairie vole Oxtr expression, complicating the interpretation of the results. The manuscript does not include a direct comparison of brain Oxtr expression patterns in these transgenic lines with those of prairie voles. Instead, expression similarity is inferred primarily from regional localization and compared indirectly with prior literature (Figures 2-5). For those lines that show partial resemblance to prairie vole Oxtr expression patterns, the authors do not assess whether Oxtr-expressing neurons share comparable anatomical projections or transcriptomic identity with prairie vole Oxtr-expressing neurons. Quantification of expression remains largely descriptive, illustrating expression patterns (Figure 2), OXTR protein distribution (Figure 3; images are difficult to evaluate due to low contrast), or Oxtr mRNA levels across selected brain regions in Koi lines, wild-type mice, and mOxtr-/- mice (Figures 4-5), without directly testing similarity to prairie vole expression. In addition, whole-brain expression data are lacking, with analyses restricted to selected sections. While such analyses may be beyond the scope of the present study, these limitations nonetheless complicate interpretation of the central question - namely, whether the observed behavioral phenotypes arise from vole-like Oxtr circuits rather than from distinct, line-specific expression configurations.

      (2) The authors state that Oxtr expression in the mammary gland is similar across all Koi lines and the mOxtr-IRES-Cre knock-in line. However, the images presented in Figure 2 appear to show differences in anatomical detail across lines, and no quantitative analysis is provided to support the claim of equivalence.

      (3) The conclusion that integration site rather than copy number determines the observed BAC transgene expression patterns (Lines 202-203) is not fully supported by the data. First, the authors did not compare multiple copy numbers at the same genomic insertion site, making it impossible to disentangle copy-number effects from position effects. Second, BAC copy number does not necessarily scale linearly with expression; higher copy numbers can have a repressive effect on gene expression (Garrick et al, Nat Genet, 1998).

      (4) While I am not an expert in TAD analysis, the observed differences in 3D architecture around Oxtr are consistent with a role for long-range regulatory interactions. However, these analyses appear largely descriptive and correlative, and establishing a causal contribution of 3D chromatin organization to Oxtr regulation by distal elements would likely require direct perturbation of TAD boundaries or looping interactions. I recognize that such experiments may be beyond the scope of the present study, but clarifying this limitation in the interpretation would be helpful.

    4. Author response:

      Thank you very much for your careful evaluation of our manuscript entitled “Cross-Species BAC Transgenesis Reveals Long-Range Regulation Drives Variation in Brain Oxytocin Receptor Expression and Social Behaviors.” We sincerely appreciate the insightful and constructive comments from both reviewers.

      We are particularly encouraged by the positive assessment that our study provides a useful experimental framework and resource for understanding how regulatory variation contributes to diversity in brain expression patterns and social behaviors. We have carefully considered all comments and outline below the key revisions we will implement in the revised manuscript.

      Conceptual clarification: We will clarify the conceptual framework of the study. While our initial aim was to test whether prairie vole regulatory elements could recapitulate vole-like Oxtr expression patterns in mice, the generation of multiple independent Koi lines revealed that such expression is not faithfully reproduced but instead varies across lines. This observation led us to refocus the study on how regulatory architecture gives rise to diverse expression patterns and their functional consequences. Accordingly, we will revise the manuscript to emphasize that the goal is not to reconstruct prairie vole circuits, but to test how variation in Oxtr expression distribution drives variation in social behaviors.

      Quantification of expression patterns: We will include quantitative analyses of Oxtr expression in both brain and mammary gland tissues. These additions will provide an objective basis for comparing tissue-specific expression and support the conclusion that brain expression is more variable, whereas mammary gland expression is broadly conserved. We will include qRT-PCR data to support mammary gland comparisons.

      Behavioral interpretation: We will clarify that the behavioral analyses are designed to assess how distinct Oxtr expression patterns influence social behaviors within a controlled mouse system, rather than to directly replicate prairie vole phenotypes. We will refine the manuscript to clearly distinguish between partial resemblance to prairie vole expression and the broader goal of linking regulatory variation to behavioral diversity.

      Technical clarification and limitations: We will revise the manuscript to more carefully interpret the roles of genomic integration site and transgene copy number, noting that while integration site likely plays a major role, contributions from copy number cannot be excluded. In addition, we will explicitly acknowledge that our analyses of 3D chromatin architecture are correlative in nature, and that establishing causality would require direct perturbation of chromatin structure, which is beyond the scope of the current study.

      Presentation improvements: We will improve figure clarity, include representative reference images from prairie vole brain to facilitate qualitative comparison, and refine descriptions in the Results and Methods sections to enhance clarity and readability.

      We thank the reviewers again for their insightful and constructive feedback, which we believe will significantly strengthen the manuscript. We look forward to submitting a revised version incorporating these improvements.

    1. eLife Assessment

      This important study provides a comprehensive multi-omics characterization of Leishmania donovani stage differentiation, offering insights into the molecular basis of parasite adaptation across host environments. The authors present convincing evidence that stage transitions are not driven by genomic variation but instead rely on coordinated post-transcriptional regulation, including mRNA turnover, translation, and protein degradation. Although experimental validation of these findings and conclusions remains to be completed, the integration of diverse, high-quality datasets establishes a robust resource that will be of broad utility to researchers investigating Leishmania biology and life-cycle progression.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe co-regulated gene modules underlying stage differentiation in Leishmania donovani through a system-level analysis of multiple molecular layers. Using amastigotes isolated from infected hamster spleens and corresponding culture-derived promastigotes, they analyzed genomic variation, transcript abundance, protein levels, phosphorylation states, and metabolite profiles. By combining these, the study identified potential regulatory mechanisms associated with parasite differentiation and generated hypotheses regarding how gene expression is coordinated across different levels.

      Strengths:

      A major strength of the study is the breadth of the dataset generated. The integration provides an unusually comprehensive view of molecular changes associated with Leishmania differentiation in vitro. Such multi-layer datasets involving bona fide vertebrate host stages remain relatively rare in parasitology and will likely become a valuable resource for the molecular parasitology community. In addition, the use of amastigotes isolated from infected hamsters rather than relying on axenic models provided a biologically relevant framework for the analyses.

      The revised manuscript improved several aspects of the original. The RNA-seq analysis is described with a clearer pipeline, and several claims regarding causal regulatory feedback associations have been appropriately toned down. Among the observations reported, the association between parasite differentiation and proteasome-mediated protein degradation is particularly remarkable. The combination of quantitative proteomics with pharmacological inhibition of the proteasome with lactacystin provides support for a role for protein turnover in developmental transitions and paves the way for future mechanistic studies.

      Weaknesses:

      Most regulatory interpretations remain largely inferential or indirect. The integration identifies correlations between different levels, but direct functional validation is limited or absent. Many of the descriptions should not be interpreted as validated. As highlighted by the authors in this revised version, the mechanistic studies will be part of future work and are beyond the scope of the current work. Of note, the attempt to confirm lactacystin-induced inhibition of proteasomal activity via anti-polyUb immunoblotting did not demonstrate the expected increase in overall poly-ubiquitylation.

      Comments on revised version:

      The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.

    3. Reviewer #2 (Public review):

      Pescher and colleagues present a revised manuscript detailing the multi-omic characterisation of Leishmania donovani amastigote to promastigote differentiation and the integration of these data. The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins, RNAs, etc. in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses about the intersections of regulatory proteins that are associated with life-cycle progression. The differentiation step studied is from amastigote to promastigote using hamster-derived amastigotes, which is a major strength. The use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. Therefore, this is a strength of the work, as it reduces the interference from the biological plasticity of Leishmania when it is cultured outside the host for prolonged periods. The multi-omics datasets presented are robust in their acquisition and analysis and will form an excellent resource for researchers studying the molecular events (particularly proteasomal protein degradation and phosphorylation) during life-stage progression.

      General comments on the revisions:

      My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).

      There are two areas where the authors had to make major changes or justifications and where further comment is merited. These were:

      RNA-seq.
      The most significant issue was the originally underpowered RNA-seq, which had only two replicates. This has now been repeated with four replicates, and it has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in their response to this was: "Given the robustness of the stage-specific transcriptome, and the legal constrains associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered, and that maximum robustness of the data is obtained from the minimum sample size, is an important part of experimental design for the ethical use of animal models. Essentially, the replication here could have been avoided if the original study had used 1 more animal. However, the new version of the RNA-seq brings appropriate confidence to the interpretation of the data.

      Phosphoproteomics.
      The authors provide a robust justification of their strategy for the phosphoproteomics and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". The way missing values were dealt with is explained: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills in some of the gaps I was missing from the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/system-based approach such as this one. The authors also edited the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they were presenting and switched to the terminology of "normalised phosphorylation level". I think this is an appropriate response.
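
      The two quoted rules can be written out as a minimal sketch, useful mainly for seeing how permissive "at least one replicate" is. The record layout (a dict with `fdr` and `loc_prob` per replicate, `None` for a missing observation) and the function names are assumptions made for illustration; they do not describe the authors' actual pipeline, and the MLE imputation itself is not implemented here.

```python
def passes_site_filter(replicates, max_fdr=0.01, min_loc_prob=0.75):
    """True if any replicate meets both quoted thresholds.

    replicates: list of dicts with 'fdr' and 'loc_prob', or None if undetected
    (hypothetical layout).
    """
    return any(
        rep is not None
        and rep["fdr"] < max_fdr
        and rep["loc_prob"] > min_loc_prob
        for rep in replicates
    )


def eligible_for_imputation(values):
    """True if a condition has at least one observed (non-None) value."""
    return any(v is not None for v in values)


# Hypothetical phosphosite seen confidently in one of three replicates.
site = [None, {"fdr": 0.005, "loc_prob": 0.9}, {"fdr": 0.02, "loc_prob": 0.6}]
print(passes_site_filter(site))                      # site is retained
print(eligible_for_imputation([None, 2.3, None]))    # missing values may be imputed
print(eligible_for_imputation([None, None, None]))   # nothing observed, no imputation
```

      Under these rules a site observed once at high confidence is kept, and a condition with a single observed value has its remaining values imputed, which is worth bearing in mind when reading the downstream statistics.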

      Overall, in the absence of follow-up experiments on specific individual examples, some of the claims in the original submission were toned down and now reflect a more neutral description of the data. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

    4. Reviewer #3 (Public review):

      Summary:

      The authors proposed to use a 5-layer systems-level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani.

      This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view of how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      The revised version, with an increased sample number for the RNA-seq, now makes the authors' assumptions consistent with the data obtained.

      The use of a proteasomal inhibitor adds interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

    5. Author response:

      General Statements

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous nonquantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      Point-by-point description of the revisions

      Reviewer #1:

      Significance

      At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons", "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered in the direct association between proteasome-dependent degradation and Leishmania differentiation.

      We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis will merit an independent, in-depth investigation. The current version has been considerably toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.

      Evidence, reproducibility and clarity

      The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      We agree with the reviewer and toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      We thank the reviewer for this observation. The SNPs exclusive to either one or the other stage are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. less than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either ‘lost’ (filtered out) or ‘gained’ (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely because they went through a transient monosomy at some point in their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.
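
      The effect of a frequency cutoff on apparent stage-exclusive SNPs can be illustrated with a minimal sketch. The function name, SNP labels and frequencies below are hypothetical and are not taken from the study; only the 10% threshold comes from the response above.

```python
# Hypothetical illustration: names and frequencies are invented, not study data.
CUTOFF = 0.10  # minimum allele frequency for a SNP to be reported


def classify_snp(freq_ama: float, freq_pro: float, cutoff: float = CUTOFF) -> str:
    """Label a SNP by whether it passes the frequency cutoff in each stage."""
    in_ama = freq_ama >= cutoff
    in_pro = freq_pro >= cutoff
    if in_ama and in_pro:
        return "shared"
    if in_ama:
        return "ama-only"   # looks 'lost' in promastigotes
    if in_pro:
        return "pro-only"   # looks 'gained' in promastigotes
    return "filtered"       # below the cutoff in both stages


# (frequency in amastigotes, frequency in promastigotes)
examples = {
    "snp_a": (0.48, 0.52),  # heterozygous-like SNP, reported in both stages
    "snp_b": (0.12, 0.08),  # small drift across the cutoff
    "snp_c": (0.07, 0.11),  # small drift across the cutoff
}
labels = {name: classify_snp(a, p) for name, (a, p) in examples.items()}
print(labels)
```

      In this toy case a shift of only four percentage points is enough to make a SNP appear exclusive to one stage, which is the behaviour the response attributes to the cutoff.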

      Chr26 revealed a striking contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      This analysis is based on normalized median read depth, taking somy variations into account. This is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than ±1.5-fold) as significant. The reversal of the signal for chr 26 in sample H-1 eludes us (but again, these fluctuations are minor and not observed at the mRNA level).

      The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/ gene module/cluster would be more appropriate terms.

      We thank the reviewer for this important comment. We replaced ‘regulon’ throughout the manuscript by ‘co-regulated, functional gene clusters’ (or similar).

      It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences of pseudouridine (Ψ) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'O-modifications (Rajan et al., Nature Com, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to consider the identification of stage-regulated snoRNAs in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms ‘ribosome biogenesis’, ‘rRNA processing’ and ‘RNA methylation’ shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      This control is now included in Fig S7 and we have added the corresponding description to the text.

      I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      (1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Rectified, thanks for pointing this out.

      (2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Rectified, thanks for pointing this out.

      (3) Fig.1D: "SNP frequency" is the preferred term in English.

      Corrected.

      (4) Fig.2A: not sure what "counts}1" mean.

      This figure has been replaced.

      (5) Ln 685: "Transcripts with FC < 2 and adjusted p-value > 0.01 are represented by black dots" > This sentence is inaccurate. The intended wording might be: "Transcripts with FC < 2 OR adjusted p-value > 0.01 are represented by black dots"

      We thank the reviewer and corrected accordingly.  
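
      The logic behind this correction can be sketched as follows. A transcript is highlighted only if it passes both thresholds, so the complement (the black dots) is every transcript that fails at least one of them. The thresholds are those quoted in the comment; the function names, the example transcripts and the symmetric treatment of down-regulation are assumptions made for illustration.

```python
import math

FC_MIN = 2.0     # minimum fold change, in either direction (assumed symmetric)
PADJ_MAX = 0.01  # maximum adjusted p-value


def is_highlighted(fold_change: float, padj: float) -> bool:
    """True when a transcript passes BOTH thresholds (coloured dot)."""
    return abs(math.log2(fold_change)) >= math.log2(FC_MIN) and padj <= PADJ_MAX


def is_black_dot(fold_change: float, padj: float) -> bool:
    """True when a transcript fails AT LEAST ONE threshold (black dot)."""
    return abs(math.log2(fold_change)) < math.log2(FC_MIN) or padj > PADJ_MAX


# Hypothetical transcripts: (fold change, adjusted p-value)
transcripts = {
    "t1": (4.0, 0.001),  # passes both thresholds
    "t2": (4.0, 0.20),   # large change, not significant
    "t3": (1.2, 0.001),  # significant, small change
    "t4": (1.1, 0.50),   # fails both thresholds
}
colours = {
    name: ("black" if is_black_dot(fc, p) else "coloured")
    for name, (fc, p) in transcripts.items()
}
print(colours)
```

      With "AND" in place of "OR", only t4 would be described as a black dot, leaving t2 and t3 unaccounted for in the legend.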

      (6) Ln 698: Same as ln 685 mentioned above.

      We thank the reviewer and corrected accordingly.

      (7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10⁻⁷ in the other scale?

      This was corrected in the figure and the legends were updated accordingly.

      (8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      (9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      This figure was removed and the legend modified.

      (10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Thank you, this is now rectified. To ensure readability, we reduced the number of protein kinase examples shown.

      Reviewer #2 Evidence, reproducibility and clarity:

      In the absence of riboprofiling, the authors return to the RNA-seq to assess the levels of pre-snoRNA (the role of these could be more explicitly stated).

      We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      The authors provide a clear and comprehensive description of the data at each stage of the results and this is woven together in the discussion, allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented, the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated above (see responses to reviewer 1), mechanistic studies will be conducted in the near future to validate some of the regulatory interactions. These will be the subject of separate publications. As noted above (response to reviewer 1), we toned down the general discussion, suggest follow-up mechanistic analyses and emphasize those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      (1) As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. Minimum replicates required for inferential analysis is 3 bio-replicates and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      We agree with the reviewer and conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome, and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the necessary. We thank the reviewer for this important comment: the new data not only confirm the previous data (providing a high level of robustness to our results) but also allowed us to increase the number of identified stage-regulated snoRNAs, thus further supporting a possible role of ribosome modification in Leishmania stage development.

      (2) There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analyses investigating key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      (3) I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin that has been previously applied to study the Leishmania proteasome (PMID: 15234661) rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information in the Results section.

      (4) If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      See response to comment 1 above.

      (5) As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Add treatment showing that it can differentiate cytostatic vs cytotoxic compound.

      This control has now been added to Fig S7.

      (6) It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      (7) All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      (8) Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      See response to comment 1 above.

      (9) Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      We agree and have repeated the RNAseq analysis using four independent biological replicates - see response to comment 1.

      (10) It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      (11) Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that a detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike other studies that use cultured parasites and thus have access to unlimited amounts, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, the use of animals is tightly controlled and only the minimal number of animals to obtain statistically significant results is tolerated (and necessary to obtain permission to conduct animal experiments).

      Regarding the number of biological replicates, we would like to emphasize that the use of four biological replicates is fully acceptable and used in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC–MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR<1%), (ii) robust phosphosite localisation probabilities (localisation probability >0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (between 73.2% and 83.4% of all the identified phosphopeptides among replicates of the same condition).

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and we have revised the Methods to avoid misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Notably, they were replaced by values in the neighborhood of the observed intensities, rather than by globally low, noise-like values.

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.
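
      The trade-off between the two inclusion rules discussed here can be sketched with a toy phosphosite. The helper names, intensities and the stricter threshold of three observed values are hypothetical; only the "at least one observed value per condition" rule is taken from the response, and the imputation step itself (MLE) is not reproduced.

```python
from typing import Optional, Sequence

# None marks a missing value in one replicate.
Intensities = Sequence[Optional[float]]


def n_observed(values: Intensities) -> int:
    """Count replicates in which the phosphosite was quantified."""
    return sum(v is not None for v in values)


def eligible_for_imputation(values: Intensities, min_observed: int = 1) -> bool:
    """A condition qualifies for imputation if enough replicates were observed."""
    return n_observed(values) >= min_observed


# Hypothetical log2 intensities for one phosphosite, four replicates per condition.
site = {
    "ama": [None, 21.3, None, None],  # observed once
    "pro": [20.1, 20.4, None, 19.9],  # observed three times
}

# Permissive rule from the response: at least one observed value per condition.
permissive = {c: eligible_for_imputation(v, min_observed=1) for c, v in site.items()}
# A stricter, purely illustrative rule: at least three observed values.
stricter = {c: eligible_for_imputation(v, min_observed=3) for c, v in site.items()}
print(permissive, stricter)
```

      Under the permissive rule the site is retained in both conditions, with three of four amastigote values imputed; under the stricter rule it would be dropped for the amastigote condition, which is the sensitivity versus stringency trade-off described above.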

      (12) For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      We thank the reviewer for noticing this point. It is an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.

      Minor comments:

      Are prior studies referenced appropriately?

      Yes

      Are the text and figures clear and accurate?

      The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      We thank the reviewer for these positive comments. We now included the information on replicates in the overview figure (Figure S1).

      As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0–100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach; see doi: 10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero. This normalization approach is comparable to those previously published in the "Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome" section of the following paper (DOI: 10.1016/j.mcpro.2022.100428).

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term “phosphostoichiometry” is imprecise and not correct in this context.

      In response, we (i) replaced the term “phosphostoichiometry” throughout the manuscript with a more accurate description, such as “normalized phosphorylation level”, or “relative phosphorylation change normalized to protein abundance”, and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording will improve conceptual accuracy without altering the validity or interpretation of the results.
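
      As a minimal numerical sketch of the ratio-of-ratios idea, the normalized phosphorylation change can be written on a log2 scale as the phosphopeptide fold change minus the parent protein fold change. The function names and abundances are hypothetical, the log2 scale is an assumption, and the statistical test used by the authors is not reproduced here.

```python
import math


def log2_fold_change(abundance_a: float, abundance_b: float) -> float:
    """log2 fold change from condition A to condition B."""
    return math.log2(abundance_b / abundance_a)


def normalized_phospho_change(phos_a: float, phos_b: float,
                              prot_a: float, prot_b: float) -> float:
    """Phosphopeptide log2 fold change corrected for the parent protein change.

    Zero means the phosphopeptide simply follows its protein; the value is
    unbounded, so it is not an occupancy (which would lie between 0 and 1).
    """
    return log2_fold_change(phos_a, phos_b) - log2_fold_change(prot_a, prot_b)


# Hypothetical abundances (arbitrary units).
# Phosphopeptide rises 4-fold while the protein rises 2-fold.
regulated = normalized_phospho_change(100.0, 400.0, 50.0, 100.0)
# Phosphopeptide and protein both rise 2-fold.
passive = normalized_phospho_change(100.0, 200.0, 100.0, 200.0)
print(regulated, passive)
```

      The first case gives a normalized change of one log2 unit, while the second gives zero even though the phosphopeptide doubled, which is why the quantity is better described as a normalized phosphorylation level than as a stoichiometry.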

      From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting pvalues thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      The references have been formatted.

      Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour around the circle of each node to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      We thank the reviewer for this comment but for readability we opted to keep the figure as is. We now refer to Tables 8, 9, and 12 that allow the reader to link gene IDs to protein name and annotation (if available).

      It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein–protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, coexpression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. The information on stringency has been added to the Methods section and the Figure legends (adding the information on confidence score cutoff).

      We decided not to submit the Cytoscape files as they were generated with previous versions of Cytoscape and the STRING plugin. Based on the differential abundance data shown in the tables, it will be straightforward to recreate these networks with the new versions for any follow-up study.

      The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      We apologize and have translated the text into English.

      I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      We have added the following statement to the legend: ‘Confidence values were derived as described in Supplementary Methods’.

      Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      We have added the following justification to the results section (see also response above to comment 3 for reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408 as the latter’s potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow up is done to identify if those proteins are necessary and/or sufficient from driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      We very much appreciate these very interesting suggestions, which will be considered for ongoing follow-up studies.

      In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      The links have now been rectified and validated. For those data that are still under quarantine, the access links are as follows:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d 0

      RNAseq data: https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f4c06-b4bd-bc10f2dc0b14

      Proteomic data: http://www.ebi.ac.uk/pride

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      Significance

      Strengths:

      (1) The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      We thank the reviewer for this positive assessment of our work.

      (2) The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. This is a strength of the work as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      We thank the reviewer for the acknowledgment of our relevant hamster system, for which we face many challenges (financial, ethical, administrative as protocols need to be approved by the French government).

      Limitations:

      Potential lack of appropriate replication (see above).

      See response to comment 1.

      Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      See response to comment 2 above.

      The study applies well established techniques without any particular technical stepchange. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      We thank the reviewer for these positive comments.

      This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:

      (1) Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships with more neutral expressions such as ‘coexpression’.

      (2) Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      We agree with the reviewer and have replaced ‘regulon’ with ‘co-regulated gene clusters’ (or similar).

      (3) LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value < 0.01), 1534 of which were exclusively detected in either ama or pro." If a protein is exclusively detected in one stage, then by definition it should not be detected in that number of replicates at both stages. This apparent contradiction should be clarified.

      We fully agree with the reviewer: an LFQ intensity of 0 may result from various causes. We realize that our wording may have been ambiguous. For clarity, we have modified the original text to: ‘Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value < 0.01), and 1534 that were exclusively detected in either ama or pro (Figure 3A left panel, Table 6).’ We also modified the legend of Figure 3B. Concerning missing values that could be either missing not at random (MNAR) or missing completely at random (MCAR), rather than introducing potentially misleading imputed values, we chose to treat these missing values as genuine stage-specific differences (presence/absence): quantitative statistics are restricted to proteins with measurable LFQ in both stages, while proteins with consistent presence in one stage and non-detection in the other are reported as stage-restricted detections. We believe this strategy is transparent and minimizes modeling assumptions, while still highlighting robust stage-specific signals. Our approach is supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions.
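
The presence/absence strategy described in this response can be illustrated with a minimal sketch. It is not the authors' code: the function names, the category labels, and the detection threshold (at least 3 of 4 replicates, taken from the wording quoted in the reviewer's comment) are assumptions made for illustration only.

```python
from typing import Optional, Sequence

# Assumption: "consistent presence" means detection in >= 3 of 4 replicates.
MIN_DETECTED = 3


def n_detected(lfq: Sequence[Optional[float]]) -> int:
    """Count replicates with a measurable LFQ intensity (None or 0 = not detected)."""
    return sum(1 for v in lfq if v is not None and v > 0)


def classify(ama: Sequence[Optional[float]], pro: Sequence[Optional[float]]) -> str:
    """Assign one protein to a category without imputing missing values."""
    a, p = n_detected(ama), n_detected(pro)
    if a >= MIN_DETECTED and p >= MIN_DETECTED:
        return "quantified_in_both"  # eligible for quantitative statistics
    if a >= MIN_DETECTED and p == 0:
        return "ama_restricted"      # reported as a stage-restricted detection
    if p >= MIN_DETECTED and a == 0:
        return "pro_restricted"
    return "ambiguous"               # sporadic detection; left uninterpreted
```

Under these assumptions, a protein detected sporadically in one stage (for example in a single replicate) falls into neither the quantitative nor the stage-restricted set, which is where the reviewer's concern about stochastic dropout applies most directly.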

      (4) L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical reasons - such as proteins being just below the detection threshold or due to stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.

      We thank the reviewer for this comment. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’ We also deleted the ‘infinity’ symbol from the Figure.

      Minor Comments:

      Methods

      L132: Typo: "A according" should be "according."

      The ‘A’ refers to RNase A. We added a comma for clarification (…RNase A, according to…)

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      We thank the reviewer for this comment. Aside from the already quite detailed description in Methods and the reference there to the paper describing the pipeline, we have now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). There the following explanation can be found: “The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the meancoverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics.” The description then goes into further detail.
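
The normalization quoted above can be sketched as follows. This is an illustrative reading of the quoted description, not the giptools implementation: the function name, the input layout, and in particular the conversion of a normalized bin score to a somy score by multiplying by 2 (treating the genome-wide median as disomic) are assumptions.

```python
from collections import defaultdict
from statistics import median

# Assumption: a bin at the genome-wide median coverage is treated as disomic.
BASELINE_PLOIDY = 2


def somy_scores(bins):
    """Estimate a somy score per chromosome.

    bins: list of (chromosome, mean_coverage) tuples, one per genomic bin.
    Returns {chromosome: median somy score of its bins}.
    """
    # Normalize every bin by the median coverage of all bins in the sample.
    genome_median = median(cov for _, cov in bins)
    per_chr = defaultdict(list)
    for chrom, cov in bins:
        per_chr[chrom].append(BASELINE_PLOIDY * cov / genome_median)
    # Summarize each chromosome by the median of its bin-level scores.
    return {chrom: median(scores) for chrom, scores in per_chr.items()}
```

With this reading, a chromosome whose bins sit at 1.5 times the genome-wide median coverage would receive a score of 3, that is, it would be called trisomic.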

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      We thank the reviewer for this comment. Unlike the paper cited above (using long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Corrected to ‘validated’

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant {Ritchie, 2015; Smyth, 2005}.
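
The two-part criterion described here (a significance test plus a minimal fold change of 2) can be written as a simple decision rule. The sketch below does not reimplement limma; it assumes that an adjusted p-value has already been computed upstream, and it uses the 0.01 cutoff quoted elsewhere in this response. The function and constant names are hypothetical.

```python
import math

MIN_FOLD_CHANGE = 2.0  # minimal fold change stated in the response
ALPHA = 0.01           # adjusted p-value cutoff quoted elsewhere in the response


def is_differentially_abundant(mean_log2_a, mean_log2_b, adj_p):
    """Apply the significance and fold-change criteria to one protein.

    mean_log2_a, mean_log2_b: mean log2 intensities in the two conditions.
    adj_p: adjusted p-value from an upstream test (not computed here).
    """
    log2_fc = mean_log2_a - mean_log2_b
    return adj_p < ALPHA and abs(log2_fc) >= math.log2(MIN_FOLD_CHANGE)
```

Whether the fold-change requirement was applied as a post hoc filter, as sketched here, or built into the test itself is not specified in the response.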

      Results

      L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      We thank the reviewers for these suggestions and have reformulated into: ‘In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.’

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      These lines have been added to Figure 1C (not 3A).
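
For readers who wish to reproduce such a trend line, a centered moving average is one common choice. The sketch below is an assumption about the smoothing method, which the response does not specify; the window size is likewise hypothetical.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the edges of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Clip the window to the bounds of the series.
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

Applied to per-gene read-depth ratios ordered by genomic position, a line of this kind makes any periodic rise and fall easier to judge by eye than the raw points alone.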

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Thank you for the comment – we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      This has been corrected to ‘The discrepancies we observed in a sub-set of genes between….’.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      We added this information to the text (‘some of the most significantly enriched terms included …’).

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      This statement was too speculative and has been removed. Instead, we added ‘Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner’.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      We deleted the term as suggested and reformulated to ‘….our results confirm the important role of protein degradation….’.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      Discussion

      L555: As noted in L494, reconsider using the word "unexpected."

      Removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      We thank the reviewer for this comment and have followed the advice, reformulating the sentence as suggested.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Corrected to ‘some of the most significantly enriched GO terms’.

      Referee cross-commenting

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      (1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      (2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      (3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      (4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      (5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      We thank the reviewer for this cross-comparison, which served as a guide for revising our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance):

      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.  

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to incite mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al. aim to address the central question of how these parasites establish and maintain distinct life cycle stages in the near absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From those parasites, they performed genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analyses to reveal the changes the parasites underwent between the two life stages.

      The main conclusions stated by the authors are:

      - The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.

      - RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and posttranscriptional regulation itself.

      - Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      We thank the reviewer for the time and effort dedicated to our manuscript.

      Further details are organised in order of appearance in the text:

      Material and Methods: while the authors are indicating some key parameters, providing the codes and scripts they used throughout the manuscript would improve reproducibility.

      We thank the reviewer for this comment and added the URL for the codes to the data availability section.

      Why only 2 biological replicates for RNA while the other layers have 3 or 4?

      We agree with the other reviewers and have repeated this analysis to have statistically more robust results.

      Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable on longer culture derived promastigotes and sandfly derived promastigotes ?

      No, as published in Barja et al Nature EcolEvol 2017 (PMID: 29109466) and Bussotti et al PNAS 2023 (PMID: 36848551), these minor fluctuations do not predict subsequent aneuploidies in long-term culture or in sand fly-derived promastigotes. This information has been added to the text.

      Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      We corrected this sentence to ‘Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2’.

      L349-355: have the mRNA presenting change in abundance between stages been normalised by their relative DNA abundance ? Said otherwise, can the wave patterns observed at the genome level explain the respective mRNA level ? Can the authors plot in a similar way the enrichment scores in regards to the position on the genome and can the authors indicate if there is a positional enrichment in addition to the functional one they observe ? This may affect the conclusion in L356-358.

      As noted above, we did not see any significant read depth changes at the DNA level when comparing amastigotes and promastigotes. Thus, there is no need to normalize the RNA-seq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we have observed at the DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some overrepresented tandem gene arrays where all gene copies share the same location and GO term).

      L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not suggest causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      We thank the reviewer for this comment and have corrected the statement accordingly.

      Figure 3B, could the authors clarify what are the "unique genes" that are on the infinite quadrants? It seems these proteins are identified in one stage and not the other. This implies that the corresponding missing values are missing not at random (MNAR). Rather than removing those proteins containing MNAR from the differential expression analysis, the authors should probably impute those missing values. Methods of imputation of MNAR and MAR can be found in the literature. Indeed, the level of expression in one stage of those proteins is now missing, while it could strongly affect the conclusions the authors are drawing in figure 4E regarding the proteins targeted for degradation and rescued in presence of the proteasome inhibitor.

      We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not-at-random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification. More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other. In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’
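
      The dependence of an imputed fold change on the assumed low-intensity distribution can be illustrated with a minimal Python sketch. It does not implement QRILC; it uses a simplified down-shifted normal heuristic, and all intensity values, function names and parameter defaults below are hypothetical choices made for illustration only.

```python
import random
import statistics

# Hypothetical log2 intensities of proteins quantified in stage B (illustrative only).
BACKGROUND_LOG2 = [20.1, 22.4, 24.8, 26.0, 27.3, 23.5, 25.1, 21.7]
# Hypothetical protein detected in three replicates of stage A and never in stage B.
DETECTED_IN_A = [26.2, 26.0, 26.5]


def impute_low_intensity(observed_log2, n_missing, shift_sd=1.8, width=0.3, seed=0):
    """Draw stand-in values from a narrow normal distribution placed below
    the observed log2 intensities (a simplified down-shifted normal heuristic)."""
    rng = random.Random(seed)
    mu = statistics.mean(observed_log2)
    sd = statistics.stdev(observed_log2)
    return [rng.gauss(mu - shift_sd * sd, width * sd) for _ in range(n_missing)]


def log2_fc_for_shift(shift_sd):
    """Log2 fold change (stage A over stage B) after imputing the stage B values."""
    imputed_b = impute_low_intensity(BACKGROUND_LOG2, n_missing=3, shift_sd=shift_sd)
    return statistics.mean(DETECTED_IN_A) - statistics.mean(imputed_b)


if __name__ == "__main__":
    # The measurements are identical in every iteration; only the assumption changes.
    for shift in (1.0, 1.8, 3.0):
        print(f"assumed down-shift = {shift} sd: log2 fold change = {log2_fc_for_shift(shift):.2f}")
```

      In this toy setting, the reported fold change is finite and is set entirely by the assumed down-shift rather than by any measurement in stage B, which is the concern raised above about imputing values for proteins that were never detected.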

      L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and context. It is too speculative and with little support by the data presented at this stage of the manuscript. It should be removed as Figure 3E or could be placed in the discussion and supplementary information.

      We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.  

      The authors present an elegant way to show stage specific degradation through the comparison of stage specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial" the term unexpected is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation / autophagy during differentiation. Furthermore the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants as we discussed above, and should be revised accordingly.

      We rephrased the conclusion to ‘In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.’ Please see the response to comment 9 regarding the unique proteins.

      L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      We agree with the reviewer and have toned this statement down by adding the statement ‘… or simply be a consequence of culture adaptation’.

      The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct link and feedback without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatase and DUBs, need to be validated for their respective change of protein activity in the direction that fits the model of the authors. Those discussions should be toned down.

      We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      A couple of typos:

      In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      L225 "...peptide match was disable." should be "...peptide match was disabled."

      Both corrected

      Reviewer #4 (Significance):

      While there is not too much novelty around the emphasis of gene expression at post-translational level in kinetoplastid organisms, the scale of the work presented here, looking at 5 layers of potential regulations, is. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro systems promastigote differentiation potentially limiting the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall out of the scope of this manuscript.

      We agree with the reviewer and have modified our manuscript throughout to remove any causal relationships. Indeed, this work sets the stage for future investigations dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. eLife Assessment

      In this important study, the authors demonstrate that generative AI techniques (restricted Boltzmann machine) can be used effectively to design and characterize mutational pathways of WW domains with different binding specificities. The computational studies are complemented by experimental validations, and the results provide solid evidence supporting the idea that sequence landscape holds significance in understanding protein evolution from a transition path perspective. The minor weakness of the study in the current form concerns limited success in designing variants with smoothly varying binding specificities. Nevertheless, the work will likely have a major impact on research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely non-functional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

    3. Reviewer #2 (Public review):

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      (4) On page 12, it is stated that the temperature was chosen to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

    4. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely nonfunctional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

      This is a key question, which was one of the original motivations of our work. Both hypotheses are possible: ‘abrupt switches’ (punctuated equilibria, corresponding to distinct specificities) and more gradual changes (smooth transitions, through intermediates that exhibit mixed or intermediate specificity).

      Many natural specificity-switching events have probably resulted from the need to adapt to environmental change and selection for a different specificity, which can be compatible with an abrupt change in specificity. Others may reflect the gradual evolution of promiscuous ancestral sequences to more specialized ones, losing cross-reactivity. A molecular mechanism that could allow abrupt switching is gene duplication, a frequent mechanism for WW domain diversification, beyond standard mutation-driven evolutionary processes.

      As for the specificity-switching paths for WW domains found in this work, the presence of weakly responsive cross-reactive intermediates along the designed paths for I<->IV, and their absence in the I<->II path, suggests that designing promiscuous domains is hard (see also related response to point 3 of Reviewer 2) and generally not selected by natural evolution (as seen from the clear clustering of extant proteins in different specificity classes). 

      For a small domain such as WW, mutations that favor some specificity classes are known to have detrimental effects on fundamental properties, such as folding kinetics and stability, see Ref [72]. It is possible that larger, less constrained protein domains could allow for more cross-reactive variants and smoother specificity switching. However, experiments on fluorescent proteins looking for interpolation between two wavelengths have shown that the switch was abrupt [Poelwijk et al. Nature Communications (2019)].

      Our aim was to achieve a functional switch (imposed by the two extant end-points) through a path of designed, functional intermediates and to correctly predict, with our RBM model, the location of the specificity transition and of the cross-reactivity region (which we expected only along the I-IV path). This aim was achieved, as demonstrated by experiments.

      Reviewer #2:

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      We agree with Reviewer 2 that WW sequences are short and simple to handle from a computational point of view; the domain was chosen for this reason to test the design of full mutational paths (after having benchmarked the approach on lattice-protein models, see Refs. [30] and [44]). Our work gives additional support to the effectiveness of generative models learned from sequence data. That said, from a biological point of view, WW is a highly constrained domain, see the comment by Reviewer 1 above and our answer.

      In longer and more complex proteins, we expect it will be more difficult to disentangle specificity-switching latent units, see Fernandez-de-Cossio-Diaz et al., Physical Review X 2023 for a discussion and a possible computational approach to this issue. Notice that, while relating the latent units to specificity classes was convenient, it was not used to generate the paths themselves. Therefore, we believe that our method is quite robust and easily generalizable to more complex and longer proteins. As an illustration, we have recently used it to sample viral trajectories (more precisely, variants of the Receptor Binding Domain of the SARS-CoV-2 spike protein) capable of escaping antibody recognition, see Huot et al., PNAS 2026. In this recent work, we projected the paths onto the principal antigenic space, defined by the top two Principal Components of the viral variant binding affinities to 32 antibodies. In this representation, sampled paths displayed trends similar to natural paths, drawn from the sequences sampled during the pandemic. This finding supports the applicability and interpretation of our method for more complex proteins.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      We think that this finding, true for paths connecting classes I and IV, is not general. In a previous paper, we benchmarked our path-designing approach on simple in silico lattice-protein models and showed that indirect paths led to gains in the overall fitness (computed according to the ground-truth model) [Mauri, Cocco, Monasson, Physical Review E 2023, fig. 9-12].

      In general, we would expect that indirect paths could explore alternative mutations, important to compensate for transitory destabilizing mutations that could occur along the path. We speculate that these stabilizing mutations happen for non-direct paths at their extremity near the class-I wild type. A slight decrease in binding response to peptide C1 for the direct path is nevertheless observed (see Suppl Table 4), but our experimental detection, focused on binding response, is not tailored to directly detect a difference in stability. When approaching the class-IV anchoring point, we observe that paths interpolating between classes I and IV are very constrained and show limited diversity, going through a funnel in sequence space corresponding to the direct path. We agree with Reviewer 2 that a more exhaustive comparison with direct paths would be interesting, and will add a sentence in the conclusion.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      Class I to Class IV paths and Class I to Class II paths fundamentally differ because the binding pocket in Class I WW domains is different from the one of Class IV WWs, while Classes I and II/III share the same binding region. This important difference may explain why class I specificity can switch to class IV specificity (steps 20-21) without completely losing affinity to the peptide of class I. To investigate whether the two binding regions are really independent, we have tested some additional specific mutations along the I-IV mutational paths. In our attempts to engineer cross-reactivity, we have observed that it is important to substantially lower affinity to the class I peptide to acquire class IV specificity, in agreement with previous studies [72]. Moreover, the I to IV path seems to go through a funnel-like part in the region with no natural sequences, with the same transition intermediates obtained in several designed paths. This indicates that the Class I to Class IV functional switch is more constrained than the Class I to II switch. Let us also emphasize that our assessment of class specificity is based on one peptide for each class. It would be interesting to test multiple WW-binding peptides with similar biochemical properties to acquire a more complete view of the specificities.

      (4) On page 12, it is stated that the temperature was chosen to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      Section 3.5 explains that RBM samples can be biased by lowering the sampling temperature to 1/3 to obtain high-scoring sequences, which are more likely to be functional, as shown in [Russ et al., Science 2020]. We acknowledge (as also noted by Reviewer 1) that this section comes at the end of the manuscript, while differences in scores along the path are shown before, so the discussion of this important point is somewhat delayed. We will add a sentence earlier in Results to explain this point.
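
      As a purely illustrative sketch of this temperature effect (not the RBM or sampler used in the manuscript), the Python snippet below assumes that a sequence with score s is sampled with probability proportional to exp(s/T). The four scores and the function name are hypothetical.

```python
import math


def boltzmann_probs(scores, temperature=1.0):
    """Return probabilities p_i proportional to exp(score_i / temperature)."""
    scaled = [s / temperature for s in scores]
    offset = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - offset) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def mean_score(scores, probs):
    """Expected score under the given sampling probabilities."""
    return sum(s * p for s, p in zip(scores, probs))


# Hypothetical scores (log-probabilities up to a constant) of four candidate sequences.
toy_scores = [-5.0, -3.0, -1.0, 0.0]

p_unit = boltzmann_probs(toy_scores, temperature=1.0)
p_low = boltzmann_probs(toy_scores, temperature=1.0 / 3.0)

if __name__ == "__main__":
    print("T = 1  :", [round(p, 3) for p in p_unit], "mean score", round(mean_score(toy_scores, p_unit), 3))
    print("T = 1/3:", [round(p, 3) for p in p_low], "mean score", round(mean_score(toy_scores, p_low), 3))
```

      Lowering T concentrates probability on the highest-scoring candidates, so the expected score of a sample increases while its diversity decreases.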

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

      We agree with Reviewer 2 that the consensus sequence is an atypical sequence for an independent model with a large RBM score. We will update Figure 5 of the manuscript to show that this is also happening in our case. 

      We use Maximum Likelihood in ASR, but our ASR path corresponds to all internal nodes of the reconstructed tree joining the two extant sequences, not only to the most ancestral node. Overall, the ancestral sequences along the ASR paths are different from the consensus sequence (mean identity of 76% and 60% respectively). The most ancestral nodes in the paths are also different from the consensus, having 81% (paths between type I and IV domains) or 54% (paths between type I and II/III domains) similarity, and an RBM score of -21 or -58, respectively. We agree that some ASR internal-node sequences have a higher score than the natural wild types (extant sequences). This is shown in Fig. 6: several points have a larger RBM score than the two anchoring points at the extremities of the path, possibly due to the fact that natural sequences are not always the most stable ones. As discussed in the conclusion, ASR nodes moreover generally have better scores than the sequences obtained by sampling an independent model. Phylogenetic reconstruction implicitly takes into account some degree of co-variation between sites in natural sequences, as shown by the success of using the phylogenetic distance of a mutated sequence to the wild type to predict the fitness effect of these mutations [Laine, Mol. Biol. Evol. 2019].

      To better show this effect, we will update Figure 6, also reporting the scores of the "scrambled" sequences, which do not respect the potential epistasis extracted by the RBM. It appears that ASR sequences generally have better scores than the scrambled sequences, and lower scores than RBM sequences (sampled at T=1/3). RBM models take into account multi-residue correlations, which could contribute to reaching better scores than ASR and BM models. Ongoing studies on larger proteins show that the score of sequences sampled from ASR reconstruction, including the Maximum Likelihood one, can still be improved according to the RBM score by a few mutations consistent with the ASR posterior probabilities (unpublished).

      Mistakes in the reference list will be amended in the updated version.

    1. eLife Assessment

      This important study highlights the role of MIRO1 in regulating mitochondrial oxidative phosphorylation in smooth muscle cells, a process that appears necessary to sustain their proliferation. Overall, the work provides convincing evidence that mitochondrial positioning and function influence vascular disease, although several bioenergetic and mechanistic aspects would benefit from deeper investigation.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity. Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Comments on revisions:

      The authors have adequately addressed all the concerns raised by the reviewers, and the manuscript has been substantially improved.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, some components of the metabolic analyses are suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and, importantly, in human cells. The study provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the availability of a small-molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Comments on revisions:

      The authors have addressed the concerns I previously raised.

    4. Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      - This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.
      - This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.
      - The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      - Some key bioenergetic aspects may require further investigation.

      Comments on revisions:

      The authors have adequately addressed most of the concerns I raised. I would suggest adding some of the justifications provided to the reviewers to the manuscript to further clarify and aid interpretation of the data, especially for the bioenergetic part (e.g., the proposed interaction with CI components, which might otherwise appear implausible to readers).

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro1-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting supercomplex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as is the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; however, characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex.

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been evaluated to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, some components of the metabolic analyses are suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and, importantly, in human cells. The study provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the availability of a small-molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays are marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

      We thank the reviewers for their thoughtful and constructive feedback. We appreciate their recognition of our work’s value and the improvements made in this revised version.

      We are particularly grateful to Reviewer 3 for their detailed and insightful comments, which identified errors we (and other reviewers) had unfortunately overlooked. To address these concerns and ensure the manuscript meets the high standards of clarity and rigor we aim for, we have made additional corrections and refinements.

      As part of this process, we conducted a thorough review of the original source files. This was especially important given that the project spanned from 2018 to 2025, and many co-authors have since left their previous positions.

      We appreciate the opportunity to resubmit this manuscript and are confident that these updates fully address the concerns raised by the reviewer and the editorial team.

      Reviewer #3 (Recommendations for the authors):

      (1) I still do not see the data in WB 2G reflecting the quantification in 2H and 2I. Moreover, the authors state they performed 1 additional experiment, but it appears not to have been included in the analysis of 2H and 2I since the graphs remained the same from the last version of the manuscript.

      We apologize for this oversight. The additional experiment has now been incorporated into the analysis for Figures 2H and 2I, and the graphs have been updated accordingly. While we had uploaded the new blot, we inadvertently forgot to update the analysis graphs. Thank you for bringing this to our attention.

      (2) The authors talk several times about "supercomplexes 1 and 2" without testing their precise composition (there is a ton of literature about SC species in several mouse cell types, and separate BN-PAGE immunoblotting of individual MRC complexes would precisely define them in this context).

      We agree with the reviewer that this is an important point. However, structural differences between supercomplexes were outside the scope of this paper, and we did not perform such analyses. That said, examining the precise composition of supercomplexes could be a valuable direction for future work.

      (3) Steady-state levels of MRC subunits do not match the observations from BN-PAGE results. That might potentially be explained by the accumulation of intermediates, but this is not explored.

      We appreciate the reviewer’s observation. There is indeed a strong possibility that differences in the expression of structural components of mitochondrial complexes exist between WT and Miro1 -/- cells. However, in this study, we chose to focus on assessing potential differences in the enzymatic activities of the complexes rather than examining their structural composition. Exploring the accumulation of intermediates and structural differences could be an interesting avenue for future investigations.

      (4) Citrate synthase normalization of kinetic enzyme activities is claimed, yet it is not shown in any graph and no description of the method is provided.

      We sincerely thank the reviewer for pointing out this discrepancy. Upon careful review, we realized that our statement regarding citrate synthase normalization of kinetic enzyme activities in the last revised version was made in error. This was a miscommunication between co-authors, and we did not perform citrate synthase normalization. Instead, the normalization was performed against protein concentration, determined by the BCA assay as described in the manuscript. We regret this oversight and appreciate the opportunity to clarify this.

      (5) Complex I activity is still wrongly described as NADPH oxidation in the methods.

      We corrected this error.

      (6) The authors state 'Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV'. I do not understand this; I find this justification insufficient and not substantiated by any experimental evidence. What buffer has been used for isolation? There are hundreds of protocols for isolation of intact mitochondria and MRC complexes. Also, DDM and digitonin are the gold-standard detergents for isolation of MRC complexes and their separation via BN-PAGE.

      We thank the reviewer for raising this important point. We have revised the response to clarify the exact experimental conditions and to provide supporting data.

      For BN-PAGE, mitochondrial fractions purified from cultured VSMCs or aortic tissue were prepared using a standard protocol (now explicitly detailed in the Methods). Briefly, mitochondria were resuspended in 6-aminocaproic acid (ACA) buffer containing 750 mM ACA, 50 mM Bis-Tris (pH 7.0), and protease inhibitors. Forty micrograms of mitochondrial protein were solubilized with 1.5% digitonin, using a final detergent-to-protein ratio of 8:1, and incubated on ice for 20 minutes prior to clarification by centrifugation at 16,000 g for 30 minutes at 4°C. Thus, consistent with established standards, digitonin—one of the gold-standard detergents for MRC complex solubilization and BN-PAGE—was used throughout.

      Despite using these widely accepted conditions, we found that detection of fully assembled Complex IV by BN-PAGE was inconsistent, a limitation that has been reported by others and is known to be sensitive to mitochondrial source, tissue type, and solubilization efficiency. To address this directly and avoid over-interpretation, we assessed Complex IV integrity by examining core subunits. As shown in Figure 6—figure supplement 1 (panels B and C), expression levels of MTCO1 and MTCO2, both essential core components of Complex IV, do not differ significantly between WT and Miro1-/- cells, supporting the conclusion that Complex IV abundance is not altered.

      We have revised the manuscript to clarify these methodological details and to explicitly state that conclusions regarding Complex IV are based on subunit analysis rather than BN-PAGE visualization alone.
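
      As a worked illustration of the solubilization conditions quoted above (40 µg of mitochondrial protein, 1.5% digitonin, detergent-to-protein ratio of 8:1), the following minimal sketch computes the implied digitonin mass and volume. It assumes that the 8:1 ratio is mass per mass and that 1.5% is weight per volume; neither is stated explicitly in the response, and the function name is hypothetical.

```python
def digitonin_for_solubilization(protein_ug, ratio_w_w=8.0, digitonin_percent_w_v=1.5):
    """Return (digitonin mass in ug, volume of digitonin solution in uL).

    Assumptions (not confirmed by the source): the detergent-to-protein
    ratio is mass/mass, and the digitonin percentage is weight/volume.
    """
    digitonin_ug = protein_ug * ratio_w_w
    # 1% w/v = 1 g per 100 mL = 10 mg/mL = 10 ug/uL
    ug_per_ul = digitonin_percent_w_v * 10.0
    volume_ul = digitonin_ug / ug_per_ul
    return digitonin_ug, volume_ul


mass_ug, volume_ul = digitonin_for_solubilization(40.0)
print(f"{mass_ug:.0f} ug digitonin in {volume_ul:.1f} uL of 1.5% solution")
```

      Under these assumptions, 40 µg of protein corresponds to 320 µg of digitonin, or roughly 21 µL of a 1.5% (w/v) solution.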

      (7) Complex V IGA also does not seem to reflect its quantification.

      Thank you for highlighting this concern. To address it, we will include the numerical data alongside the figures to ensure clarity and alignment with our findings. We hope this will provide a more comprehensive understanding and resolve any ambiguity.

      (8) Figure 6 supplement 1, the authors state 'we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants'. I do not understand: what background is being used? What mutants are being expressed? All the figures refer to Miro1 -/-, which is, according to standard genetic nomenclature, a loss-of-function allele (KO).

      Thank you for your comment. To clarify, we first infected MIRO1fl/fl VSMCs with an adenovirus expressing the DNA recombinase Cre or a control adenovirus. Cells infected with the adenovirus expressing Cre are labeled as MIRO1-/- cells. In these MIRO1-/- cells, we then introduced MIRO1 wild type (WT) and MIRO1 mutants via adenoviral expression.

      The mutants include one lacking the transmembrane domain (MIRO1-ΔTM), and another in which the two EF hands of MIRO1 were point-mutated (MIRO1-KK). MIRO1-WT is denoted as Ad WT, the mutant MIRO1-KK as Ad KK, and MIRO1-ΔTM as Ad ΔTM in the figures. We hope this explanation clarifies the experimental background and nomenclature used.

      (9) Figure 6 supplement 1B, no normalization is provided (e.g. VDAC, TOM20 etc.). Interestingly, VDAC is then used to normalize the data in C-D-E-F-G. Also, why is MIRO1 detected in lane 4? Is the mutant stable or not? There is zero signal in A.

      Thank you very much for pointing out that the immunoblot for VDAC1 was missing in Figure 6—Supplement 1B. This figure has been reviewed several times, and unfortunately, this error was not detected. We sincerely apologize for this oversight. We have now revised the figure to include the immunoblot for VDAC1 to address this issue.

      Regarding the detection of MIRO1 in lane 4, we confirm that the "mutant" is not stable. To generate MIRO1 knockout cells, aortic smooth muscle cells from MIRO1fl/fl mice were isolated and cultured, followed by infection with an adenovirus expressing Cre. As these are primary cells and the deletion was induced by Cre expression, the recombination efficiency can vary, which is reflected in the variability observed in lanes 2 and 4 of the immunoblot.

      (10) Why are COX4 levels so low in the 2nd replicate in 7A? The authors state: 'We also performed anti-VDAC immunoblots on the same membranes as alternative loading control (see image below)'. I could not find the image.

      Thank you for your comment. The second pair of samples in Figure 7A is from a different preparation of mitochondria. In our experimental design, a control sample and a MIRO1 knockdown sample were processed side by side and run next to each other on the immunoblot.

      Regarding the anti-VDAC immunoblot, the image was included in our response to reviewers during the previous revision, as we did not believe it altered the message conveyed by the COX4 blot. However, to ensure clarity and address your concern, we have now included the anti-VDAC immunoblot directly in the figure. We hope this addition resolves any ambiguity and provides further confidence in the data presented.

      (11) The proposed interaction between MIRO1 and NDUFA9 is very difficult to reconcile, as the two proteins reside in distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane (OMM), with its functional domains facing the cytosol, whereas NDUFA9 is a matrix-facing accessory subunit of mitochondrial Complex I, positioned at the interface between the N- and Q-modules.

      We appreciate the reviewer’s comment and agree that MIRO1 and NDUFA9 occupy distinct mitochondrial compartments. MIRO1 is anchored to the outer mitochondrial membrane with cytosol-facing domains, whereas NDUFA9 is a matrix-facing accessory subunit of Complex I at the N/Q-module interface.

      Our data do not suggest a stable, constitutive interaction within intact mitochondria. Rather, the observed association likely reflects an indirect, transient, or context-dependent interaction, potentially occurring during mitochondrial stress, remodeling, or turnover. Such associations may be mediated by multi-protein complexes spanning mitochondrial membranes, dynamic contact sites, or post-lysis interactions detected under experimental conditions. Increasing evidence supports functional coupling between outer mitochondrial membrane proteins and inner membrane or matrix pathways without direct physical binding.

      Additional comments:

      (12) All the raw data should be provided to the readers (uncropped and annotated WB, IHC images, numerical data with statistics applied).

      We agree with the reviewer and appreciate the emphasis on transparency. In accordance with eLife submission requirements, we have provided all raw data. The Source Data files associated with each figure now include uncropped and annotated immunoblots, as well as the numerical source data for all quantified analyses.

      During the compilation of these materials, we were unable to locate the original source files for Figure 2A. The control experiment depicted in the previous version, which demonstrates in vitro recombination, was performed in 2018. However, this experiment was repeated several times throughout the project. Therefore, to ensure the manuscript remains complete, we have replaced this panel with a representative immunoblot from a similar experiment. Additionally, during our review, we discovered a labeling error in Figure 3D and G. We have corrected these figures to ensure accuracy.

      All source files have been provided and carefully labeled to facilitate independent evaluation.

    1. eLife Assessment

      This study provides valuable insights into how HIV-1 Env modulates the nanoscale organization and dynamics of the CXCR4 co-receptor on T cells. Using quantitative imaging and functional approaches, the authors present convincing evidence that gp120 engagement promotes CD4-dependent clustering and altered mobility of CXCR4, distinct from the effects of the natural ligand CXCL12. Some concerns were raised regarding the interpretation of the single-particle tracking analyses, and additional clarification or analysis may help strengthen the conclusions. The physiological relevance of the findings could be further enhanced by validation with infectious virus and by more clearly integrating the CXCR4R334X mutant observations into the central mechanistic narrative. The work will be of interest to researchers studying HIV entry and membrane receptor organization.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      This article provides new insights into the organisational changes of the X4-tropic HIV-1 co-receptor CXCR4 upon binding of the viral receptor-binding protein X4-gp120, either in its soluble form or when displayed as Env on virus-like particles (VLPs). The study employs single-particle tracking total internal reflection fluorescence (SPT-TIRF) microscopy to quantify the dynamics and clustering of CXCR4 on CD4+ T cells. The data show that CXCR4 clusters in the presence of X4-gp120 and VLPs, a phenomenon that is also observed for the primary HIV-1 receptor CD4. The authors also show that a WHIM mutant of CXCR4 (CXCR4-R334X) that does not cluster in the presence of its natural ligand, CXCL12, clusters in the presence of X4-gp120 and VLPs.

      Major strengths:

      The data are well presented, discussed, and supported by solid evidence. Literature is cited appropriately.

      Major weaknesses:

      The authors have addressed my concerns in the revised manuscript.

      Significance:

      In summary, the work is presented in a clear fashion, and the main findings are properly highlighted. The paper will be of interest to the broader virology community as well as to researchers studying cell receptor clustering. The findings are not entirely surprising because it has been shown previously that the binding of Env to CD4 mediates CD4 clustering, which would also suggest clustering of the co-receptor. Nonetheless, the paper provides strong evidence that CXCR4 clusters and changes its dynamics in the presence of CD4 and X4-gp120. Moreover, the evidence that X4-gp120 clusters CXCR4-R334X is of high interest as it suggests a different binding mechanism for X4-gp120 from that of the natural ligand CXCL12, raising questions for further research.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate how the HIV-1 Env glycoprotein modulates the nanoscale organisation and dynamics of the CXCR4 co-receptor on CD4⁺ T cells. They demonstrate that HIV-1 Env induces CXCR4 clustering distinct from that triggered by its natural ligand (CXCL12), implicating spatial receptor organization as a determinant of infection. This study investigates how HIV-1 Env (specifically X4-tropic gp120) alters the membrane organization and dynamics of the chemokine receptor CXCR4 and its WHIM-associated mutant, CXCR4R334X, in a CD4-dependent manner. Using single-particle tracking total internal reflection fluorescence microscopy (SPT-TIRF-M), the authors demonstrate that both soluble gp120 and virus-like particles (VLPs) displaying gp120 induce CXCR4 nanoclustering, reduce receptor diffusivity, and promote immobile nanoclusters of CXCR4 at the membrane of Jurkat T cells and primary CD4⁺ T cell blasts. The work offers new insights into the spatial organisation of receptors during HIV-1 entry and infection. The manuscript is well-written, and the findings are significant.

      Significance:

      Nature and significance of the advance: This work marks a conceptual and mechanistic breakthrough in understanding HIV-1 entry. It goes beyond the static view of Env-co-receptor interaction to show that nanoscale reorganization of CXCR4, distinct from chemokine-induced clustering, occurs during HIV-1 Env engagement and may be essential for infection.

      Context within existing literature: Previous studies established Env-induced CD4 clustering (Yin et al., 2020) and chemokine-induced CXCR4 nanocluster formation (Martínez-Muñoz et al., 2018), but the exact nanoscale rearrangement of CXCR4 in the context of HIV-1 Env and physiological Env densities remains unquantified. This study addresses this gap using SPT-TIRF, STED microscopy, and functional assays.

      Audience and influence: The findings will be of interest to researchers in HIV virology, membrane receptor biology, viral entry mechanisms, and therapeutic target development. The receptor-clustering aspect could also influence broader fields of study, such as GPCR organization and immune receptor signalling.

      Reviewer expertise: I can evaluate HIV-1 entry mechanisms, viral glycoprotein-host receptor interactions, single-molecule fluorescence microscopy, and membrane protein dynamics. I am less equipped to evaluate the deep structural modelling aspects, though the in silico AlphaFold results are straightforward to interpret in context.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigate how HIV-1 Env engagement affects the nanoscale organization and dynamics of the CXCR4 coreceptor on target cells. Using single-particle tracking TIRF microscopy, they analyze CXCR4 distribution following exposure to gp120 or HIV virus-like particles, including both wild-type CXCR4 and the WHIM-associated CXCR4.R334X variant. The study further examines the role of CD4-CXCR4 heterodimerization and contrasts Env-induced receptor organization with that elicited by the natural ligand CXCL12.

      Evaluation:

      A major strength of this work is the integration of high-resolution imaging with functional and comparative analyses that distinguish Env-induced CXCR4 clustering from chemokine-driven effects. The experiments are clearly described, include appropriate controls, and are supported by quantitative analyses that are consistent across experiments. The revised manuscript appears to have addressed many of the technical and interpretive issues raised during initial review, improving clarity around data analysis and strengthening confidence in the conclusions.

      I am not an expert in TIRF microscopy or single-molecule tracking and defer to other reviewers regarding limits of imaging and tracking methods. However, I did not identify major inconsistencies between the biological data presented and the conclusions drawn.

      The authors' data support the conclusion that HIV-1 Env, delivered as gp120 or virus-like particles, promotes CD4-dependent nanoscale clustering of CXCR4, including the CXCR4.R334X variant associated with WHIM syndrome, in a manner distinct from CXCL12-induced receptor organization. The authors are generally careful to frame their conclusions in proportion to the evidence and avoid overinterpretation.

      Overall, this study builds on prior work on CXCR4 distribution and HIV entry by providing higher-resolution insight into receptor nanoclustering and its modulation by Env. The findings provide a mechanistic refinement rather than a conceptual paradigm shift, but they constitute a valuable dataset useful to researchers studying HIV entry, coreceptor biology, and membrane receptor organization.

      Reviewer expertise: HIV-1 Envelope glycoproteins and entry assays, HIV broadly neutralizing antibodies, HIV vaccine design

      Comments on revised version:

      This reviewer has no further recommendations and thanks the authors for clarifying that the Env content in gp120-VLPs was lower than the NL4-3deltaIN particles but that the percentage of mature particles in the gp120-VLPs was higher.

    5. Reviewer #4 (Public review):

      Summary:

      The authors investigate the impact of surface-bound HIV gp120 and VLPs on CXCR4 dynamics in Jurkat T cells expressing WT or WHIM syndrome-mutated CXCR4, which has a defective response to CXCL12. Jurkat cells were transfected with CXCR4-AcGFP. Images were acquired and a single-particle tracking routine was applied to generate information about nanoclustering and diffusion, and FRET was used to investigate CD4-CXCR4 proximity. They compare the effects of soluble gp120 with those of immature and mature VLPs, which include varying degrees of gp120 clustering. They find that solid-phase gp120 or VLPs can increase CXCR4 cluster size and decrease diffusion in Jurkat cells. Surprisingly, VLPs lacking gp120 could increase CXCR4 clustering and speed, which is paradoxical as there were no known ligands on the VLPs, but they likely carry many cellular proteins with potential interactions. The impact of CXCL12 and gp120 binding to CXCR4 was different in terms of clustering and receptor down-regulation.

      Significance:

      The strengths are that it's an important question and the reagents are well prepared and characterised. They are detecting quantitative effects that will likely be reproducible. The information generated is potentially useful for those studying HIV infection processes and strategies to prevent infection.

      The major weakness is that the conditions for the SPT experiments are not ideal: the density of particles is too high for SPT, and the single-molecule basis for assessing nanoclusters is not clear. This means that the data are capturing complex molecular phenomena and are less likely to be generating pure single-molecule measurements.

      Comments on revised version:

      The authors should make the tracking data available and this will aid others in following up on it.

    6. Author response:

      Point-by-point description of the revisions

      Reviewer #1:

      Thank you very much for considering that our manuscript evaluates an important question and that the reagents used are well prepared and characterized. We also much appreciate that you consider the information generated as potentially useful for those studying HIV infection processes and strategies to prevent infection.

      (1) While a single particle tracking routine was applied to the data, it's not clear how the signal from a single GFP was defined and if movement during the 100 ms acquisition time impacts this. My concern would be that the routine is tracking fluctuations and, while these are related to single particle dynamics, it appears from the movies that the density of the GFP-tagged receptors in the cells is too high to allow clear tracking of single molecules. SPT with GFP is very difficult due to bleaching and relatively low quantum yield. Current efforts in this direction that are more successful include using SNAP tags with very photostable organic fluorophores. The data likely does mean something is happening with the receptor, but they need to be more conservative about the interpretation.

      Some of the paradoxical effects might be better understood through deeper analysis of the SPT data, particularly investigation of active transport and more detailed analysis of "immobile" objects. Comments on early figures illustrate how this could be approached. This would require selecting acquisitions where the GFP density is low enough for SPT and performing a more detailed analysis, but this may be difficult to do with GFP.

      When the authors discuss clusters of <2 or >3, how do they calibrate the value of GFP and the impact of diffusion on the measurement? One way to approach this might be single-molecule measurements of dilute samples on glass vs in a supported lipid bilayer to map the streams of true immobility to diffusion at >1 µm<sup>2</sup>/s.

      We fully understand the reviewer’s apprehensions regarding the application of these high-end biophysical techniques, in particular the associated complexity of the data analysis. We provide below extensive explanations on our methodology, which we hope will satisfactorily address all of the reviewer’s concerns.

      We would first like to emphasize that the experimental conditions and the quantitative analysis used in our current experiments are similar to the established protocols and methodologies applied by our group previously (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022; Gardeta et al. Frontiers in Immunol., 2022; García-Cuesta et al. eLife, 2024; Gardeta et al. Cell. Commun. Signal., 2025) and by others (Calebiro et al. PNAS, 2013; Jaqaman et al. Cell, 2011; Mattila et al. Immunity, 2013; Torreno-Pina et al. PNAS, 2014; Torreno-Pina et al. PNAS, 2016).

      As SPT (single-particle tracking) experiments require low-expressing conditions in order to follow individual trajectories (Manzo & García-Parajo Rep. Prog. Phys., 2015), we transiently transfected Jurkat CD4<sup>+</sup> cells with CXCR4-AcGFP or CXCR4<sup>R334X</sup>-AcGFP. At 24 h post-transfection, cells expressing low CXCR4-AcGFP levels were selected by a MoFlo Astrios Cell Sorter (BeckmanCoulter) to ensure optimal conditions for SPT. Using Dako Qifikit (DakoCytomation), we quantified the number of CXCR4 receptors and found ~8,500 – 22,000 CXCR4-AcGFP receptors/cell, which correspond to a particle density ~2 – 4.5 particles/µm<sup>2</sup> (Author response image 1) and are similar to the expression levels found in primary human lymphocytes.

      Author response image 1.

      Purified AcGFP monomeric protein was immobilized on glass at various concentrations. Dependency of the distribution of particle components on particle density was calculated; >95% were monomeric single particles at 2.0-4.5 particles/µm<sup>2</sup>. This range of particle density was used to analyze the dynamics of CXCR4-AcGFP, or CXCR4<sup>R334X</sup>-AcGFP single particles on JKCD4 cells.

      These cells were resuspended in RPMI supplemented with 2% FBS, NaPyr and L-glutamine and plated on 96-well plates for at least 2 h. Cells were centrifuged and resuspended in a buffer with HBSS, 25 mM HEPES, 2% FBS (pH 7.3) and plated on glass-bottomed microwell dishes (MatTek Corp.) coated with fibronectin (FN) (Sigma-Aldrich, 20 µg/ml, 1 h, 37°C). To observe the effect of the ligand, we coated dishes with FN + CXCL12; FN + X4-gp120 or FN + VLPs, as described in material and methods; cells were incubated (20 min, 37°C, 5% CO<sub>2</sub>) before image acquisition.

      For SPT measurements, we use a total internal reflection fluorescence (TIRF) microscope (Leica AM TIRF inverted) equipped with an EM-CCD camera (Andor DU 885-CS0-#10-VP), a 100x oil-immersion objective (HCX PL APO 100x/1.46 NA) and a 488-nm diode laser. The microscope was equipped with incubator and temperature control units; experiments were performed at 37°C with 5% CO<sub>2</sub>. To minimize photobleaching effects before image acquisition, cells were located and focused using the bright field, and a fine focus adjustment in TIRF mode was made at 5% laser power, an intensity insufficient for single-particle detection that ensures negligible photobleaching. Image sequences of individual particles (500 frames) were acquired at 49% laser power with a frame rate of 10 Hz (100 ms/frame). The penetration depth of the evanescent field used was 90 nm.

      We performed automatic tracking of individual particles using a very well established and common algorithm first described by Jaqaman (Jaqaman et al. Nat. Methods, 2008). Nevertheless, we would stress that we implemented this algorithm in a supervised fashion, i.e., we visually inspect each individual trajectory reconstruction in a separate window. Indeed, this algorithm is not able to quantify merging or splitting events.

      We follow each individual fluorescence spot frame-by-frame using a three-by-three matrix around the centroid position of the spot, as it diffuses on the cell membrane. To minimize the effect of photon fluctuations, we averaged the intensity over 20 frames. Nevertheless, to assure the reviewer that most of the single molecule traces last for at least 50 frames (i.e., 5 seconds), we provide the following data and arguments. We currently measure the photobleaching times from individual CD86-AcGFP spots exclusively having one single photobleaching step to guarantee that we are looking at individual CD86-AcGFP molecules. The distribution of the photobleaching times is shown below (Author response image 2). Fitting of the distribution to a single exponential decay renders a t0 value of ~5 s. Thus, with 20 frames averaging, we are essentially measuring the whole population of monomers in our experiments. As the survival time of a molecule before photobleaching will strongly depend on the excitation conditions, we used low excitation conditions (2 mW laser power, which corresponds to an excitation power density of ~0.015 kW/cm<sup>2</sup> considering the illumination region) and longer integration times (100 ms/frame) to increase the signal-to-background for single GFP detection while minimizing photobleaching.

      Author response image 2.

      Single molecule photobleaching times measured directly from single molecule trajectories of CD86-AcGFP, considering only traces that exhibit single molecule photobleaching steps. The experimental data are shown in gray bars (n=273 trajectories over 3 independent experiments). The red line corresponds to a single exponential decay fitting of the experimental data, from which t<sub>0</sub> has been extracted.
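
The estimate of the decay constant described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: they fitted the histogram of photobleaching times, whereas this sketch assumes every time is fully observed and uses the maximum-likelihood estimate for an exponential distribution, which is simply the sample mean. The simulated data and the function names are hypothetical.

```python
import random

FRAME_TIME_S = 0.1  # 100 ms/frame, as stated in the response


def estimate_t0(bleach_times_s):
    """Maximum-likelihood decay constant of an exponential distribution (the sample mean)."""
    return sum(bleach_times_s) / len(bleach_times_s)


def seconds_to_frames(t_s, frame_time_s=FRAME_TIME_S):
    """Convert a duration in seconds to a number of camera frames."""
    return t_s / frame_time_s


# Hypothetical data: 273 simulated photobleaching times with a true t0 of 5 s.
rng = random.Random(0)
times_s = [rng.expovariate(1 / 5.0) for _ in range(273)]

t0_s = estimate_t0(times_s)
print(f"estimated t0 = {t0_s:.2f} s ({seconds_to_frames(t0_s):.0f} frames)")
```

With real data, `times_s` would hold the measured time to the single photobleaching step of each trace.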

      To infer the stoichiometry of receptor complexes, we also perform single-step photobleaching analysis of the TIRF trajectories to establish the existence of different populations of monomers, dimers, trimers and nanoclusters and extract their percentage. Some representative trajectories of CXCR4-AcGFP with the number of steps detected are shown in new Supplementary Figure 1.  

      The emitted fluorescence (arbitrary units, a.u.) of each spot in the cells is quantified and normalized to the intensity emitted by monomeric CD86-AcGFP spots that strictly showed a single photobleaching step (Dorsch et al. Nat. Methods, 2009). We have preferred to use CD86-AcGFP in cells rather than AcGFP on glass to exclude any potential effect on the different photodynamics exhibited by AcGFP when bound directly to glass. We have also previously shown pharmacological controls to exclude CXCL12-mediated receptor clustering due to internalization processes (Martinez-Muñoz et al. Mol. Cell, 2018) that, together with the evaluation of single photobleaching steps and intensity histograms, allow us to exclude the presence of vesicles in our data. Thus, the dimers, trimers and nanoclusters found in our data do correspond to CXCR4 molecules on the cell surface. Finally, distribution of monomeric particle intensities, obtained from the photobleaching analysis, was analyzed by Gaussian fitting, rendering a mean value of 980 ± 86 a.u. This value was then used as the monomer reference to estimate the number of receptors per particle in both cases, CXCR4-AcGFP and CXCR4<sup>R334X</sup>-AcGFP (new Supplementary Figure 1).
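
The normalization step described above can be illustrated with a short sketch. It assumes that the receptor number is obtained by dividing the spot intensity by the monomer reference (980 a.u.) and rounding to the nearest integer; the rounding rule, the class labels and the function names are assumptions made for illustration and are not taken from the manuscript.

```python
MONOMER_INTENSITY_AU = 980.0  # mean monomer intensity reported in the response


def receptors_per_particle(spot_intensity_au, monomer_au=MONOMER_INTENSITY_AU):
    """Spot intensity expressed in monomer units, rounded, with a floor of one receptor."""
    return max(1, round(spot_intensity_au / monomer_au))


def size_class(n_receptors):
    """Label a particle by its estimated receptor number (labels are illustrative)."""
    labels = {1: "monomer", 2: "dimer", 3: "trimer"}
    return labels.get(n_receptors, "nanocluster (>3)")


# Hypothetical spot intensities in arbitrary units.
for intensity in (950.0, 2010.0, 2900.0, 5200.0):
    n = receptors_per_particle(intensity)
    print(f"{intensity:7.1f} a.u. -> {n} receptor(s): {size_class(n)}")
```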

      (2) I understand that the CXCL12 or gp120 are attached to the substrate with fibronectin for adhesion. I'm less clear how the VLPs are integrated. Were these added to cells already attached to FN?

      For TIRF-M experiments, cells were adhered to glass-bottomed microwell dishes coated with fibronectin, fibronectin + CXCL12, fibronectin + X4-gp120, or fibronectin + VLPs. As for CXCL12 and X4-gp120, the VLPs were attached to fibronectin taking advantage of electrostatic interactions. To clarify the integration of the VLPs in these assays, we have stained the microwell dishes coated with fibronectin and those coated with fibronectin + VLPs with wheat germ agglutinin (WGA) coupled to Alexa647 (Author response image 3) and evaluated the staining by confocal microscopy. These results indicate the presence of carbohydrates on the VLPs and are, therefore, indicative of the presence of VLPs on the fibronectin layer.

      Author response image 3.

      Representative confocal images of microwell dishes coated with fibronectin (left panel) or fibronectin + VLPs (right panel) and stained with wheat germ agglutinin (WGA) coupled to Alexa647. Scale bar, 1 µm.

      Moreover, it is important to remark that the effect of the VLPs on CXCR4 behavior at the cell surface observed by TIRF-M confirmed that the VLPs remained attached to the substrate during the experiment.

      (3) Fig 1A - The classification of particle tracks into mobile and immobile is an overly simplistic description that goes back to bulk FRAP measurements and is not really applicable to single molecule tracking data, where it's rare to see anything that is immobile and alive. An alternative classification strategy uses sub-diffusion, normal diffusion and active diffusion (or active transport) as descriptions, and particles can transition between these classes over the tracking period. Fig 1B - this data might be better displayed as histograms showing distributions within the different movement classes.

      In agreement with the reviewer’s commentary, the majority of the particles detected in our TIRF-M experiments were indeed mobile. However, we also detected a variable, and biologically appreciable, percentage of immobile particles depending on the experimental condition analyzed (Figure 1A in the main manuscript). To establish a stringent threshold for identifying these immobile particles under our specific experimental conditions, we used purified monomeric AcGFP proteins immobilized on glass coverslips. Our analysis demonstrated that 95% of these immobilized proteins showed a diffusion coefficient ≤0.0015 µm<sup>2</sup>/s; consequently, this value was established as the cutoff to distinguish immobile from mobile trajectories. While the observation of truly immobile entities in a dynamic, living system is rare, the presence of these particles under our conditions is biologically significant. For instance, the detection of large, immobile receptor nanoclusters at the plasma membrane is entirely consistent with facilitating key cellular processes, such as enabling the robust signaling cascade triggered by ligand binding or promoting the crucial events required for efficient viral entry into the cells.
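
The cutoff logic described above can be sketched as a percentile calculation on the apparent diffusion coefficients of immobilized calibration particles. The calibration values below are hypothetical, and the linear-interpolation percentile is one common convention chosen for illustration; the response does not state how the 95% bound was computed.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def classify(d_um2_s, cutoff_um2_s):
    """Trajectories at or below the cutoff are treated as immobile."""
    return "immobile" if d_um2_s <= cutoff_um2_s else "mobile"


# Hypothetical apparent diffusion coefficients (µm^2/s) of AcGFP immobilized on glass.
glass_d = [0.0002, 0.0004, 0.0005, 0.0007, 0.0008, 0.0009, 0.0011, 0.0012, 0.0014, 0.0016]
cutoff = percentile(glass_d, 95)
print(f"95th-percentile cutoff = {cutoff:.4f} µm^2/s")

# Applying the cutoff reported in the response (0.0015 µm^2/s) to two example values.
for d in (0.0010, 0.0200):
    print(d, classify(d, 0.0015))
```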

      Regarding the mobile receptors (defined as those with D<sub>1-4</sub> values exceeding 0.0015 µm<sup>2</sup>/s), we observed distinct diffusion profiles derived from mean square displacement (MSD) plots (Figure V) (Manzo & García-Parajo Rep. Prog. Phys., 2015), which were further classified based on motion, using the moment scaling spectrum (MSS) (Ewers et al. PNAS, 2005). Under all experimental conditions, the majority of mobile particles, ~85%, showed confined diffusion: for example under basal conditions, without ligand addition, ~90% of mobile particles showed confined diffusion, ~8.5% showed Brownian-free diffusion and ~1.5% exhibited directed motion (new Supplementary Figure 5A in the main manuscript). These data have been also included in the revised manuscript to show, in detail, the dynamic parameters of CXCR4.

      Due to the space constraints, it is very difficult to include all the figures generated. However, to ensure comprehensive assessment and transparency (for the purpose of this review), we have included below representative plots of the MSD values as a function of time from individual trajectories, showing different types of motion obtained in our experiments (Author response image 4).

      Author response image 4.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP detected by SPT-TIRF in resting JKCD4 cells showing different types of motion: A) confined, B) Brownian/free, C) directed transport.
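
For readers unfamiliar with the quantities discussed above, the following sketch computes a time-averaged MSD from a single trajectory and a short-lag diffusion coefficient. It assumes that D<sub>1-4</sub> denotes the coefficient obtained from a linear fit of the first four MSD points with MSD = 4Dt for two-dimensional diffusion; that definition, the simulated trajectory and all function names are assumptions made for illustration. The moment scaling spectrum classification used by the authors is not reproduced here.

```python
import math
import random


def msd(track, lag):
    """Time-averaged mean square displacement (µm^2) at a given frame lag."""
    sq = [(x2 - x1) ** 2 + (y2 - y1) ** 2
          for (x1, y1), (x2, y2) in zip(track, track[lag:])]
    return sum(sq) / len(sq)


def d_1_4(track, frame_time_s):
    """Least-squares slope of MSD vs. time over lags 1-4, divided by 4 (2D diffusion)."""
    lags = (1, 2, 3, 4)
    t = [n * frame_time_s for n in lags]
    m = [msd(track, n) for n in lags]
    t_mean, m_mean = sum(t) / len(t), sum(m) / len(m)
    slope = (sum((ti - t_mean) * (mi - m_mean) for ti, mi in zip(t, m))
             / sum((ti - t_mean) ** 2 for ti in t))
    return slope / 4


def simulate_brownian(d_true, frame_time_s, n_steps, seed=0):
    """Hypothetical free 2D random walk with diffusion coefficient d_true (µm^2/s)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2 * d_true * frame_time_s)  # per-axis step standard deviation
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0, sigma)
        y += rng.gauss(0, sigma)
        track.append((x, y))
    return track


track = simulate_brownian(d_true=0.05, frame_time_s=0.1, n_steps=5000)
print(f"D1-4 = {d_1_4(track, 0.1):.3f} µm^2/s")
```

A confined trajectory would show an MSD curve that plateaus at longer lags, whereas directed transport would show upward curvature; the short-lag fit above does not distinguish these cases on its own.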

      (4) Fig 1C,D - It would be helpful to see a plot of D vs MSI at a single particle level. In comparing C and D I'm surprised there is not a larger difference between CXCL12 and X4-gp120. It would also be very important to see the behaviour of X4-gp120 on the CXCR4 deficient Jurkat that would provide a picture of CD4 diffusion. The CXCR4 nanoclustering related to the X4-gp120 could be dominated by CD4 behaviour.

      As previously described, all analyses were performed under SPT conditions (see previous response to point 1). Figure 1C details the percentage of oligomers (>3 receptors/particle) calibrated using Jurkat CD4<sup>+</sup> cells electroporated with monomeric CD86-AcGFP (Dorsch et al. Nat. Methods, 2009). The monomer value was determined by analyzing photobleaching steps as described in our previous response to point 1.

      In our experiments, we observed a trend towards a higher number of oligomers upon activation with CXCL12 compared with X4-gp120. This trend was further supported by measurements of Mean Spot Intensity. However, the values are also influenced by the number of larger spots, which represents a minor fraction of the total spots detected.

      The differences between the effect triggered by CXCL12 or X4-gp120 might also be attributed to a combination of factors related to differences in ligand concentration, their structure, and even to the technical requirements of TIRF-M. Both ligands are in contact with the substrate (fibronectin) and the specific nature of this interaction may differ between both ligands and influence their accessibility to CXCR4. Moreover, the requirement of the prior binding of gp120 to CD4 before CXCR4 engagement, in contrast to the direct binding of CXCL12 to CXCR4, might also contribute to the differences observed.

      We previously reported that CXCL12-mediated CXCR4 dynamics are modulated by CD4 coexpression (Martinez-Muñoz et al. Mol. Cell, 2018). We have now detected the formation of CD4 heterodimers with both CXCR4 and CXCR4<sup>R334X</sup>, and found that these conformations are influenced by gp120-VLPs. In the present manuscript, we did not focus on CD4 clustering as it has been extensively characterized previously (Barrero-Villar et al. J. Cell Sci., 2009; Jiménez-Baranda et al. Nat. Cell. Biol., 2007; Yuan et al. Viruses, 2021). Regarding the investigation of the effects of X4-gp120 on CXCR4-deficient Jurkat cells, which would provide a picture of CD4 diffusion, we would note that a previous report has already addressed this issue using single molecule super-resolution imaging, and revealed that CD4 molecules on the cell membrane are predominantly found as individual molecules or small clusters of up to 4 molecules, and that the size and number of these clusters increases upon virus binding or gp120 activation (Yuan et al. Viruses, 2021).

      (5) Fig S1D - This data is really interesting. However, if both the CD4 and the gp120 have His tags they need to be careful, as poly-His tags can bind weakly to cells and increasing valency could generate some background. So, they should make sure the control is fair here. Ideally, non-His-tagged versions of sCD4 and gp120 would be used, or they need a His-tagged Fab binding to gp120 that doesn't induce CXCR4 binding.

      New Supplementary Figure 2D shows that X4-gp120 does not bind Daudi cells (these cells do not express CD4) in the absence of soluble CD4. While the reviewer is correct to state that both proteins contain a Histidine Tag, cell binding is only detected if X4-gp120 binds sCD4. Nonetheless, we have included in the revised Supplementary Figure 2D a control showing the negative binding of sCD4 to Daudi cells in the absence of X4-gp120. Altogether, these results confirm that only sCD4/X4-gp120 complexes bind these cells.

      (6) Fig S4- Panel D needs a scale bar. I can't figure out what I'm being shown without this.

      Apologies. A scale bar has been included in this panel (new Supplementary Figure 6D).

      Reviewer #2:

      (1) This study is well described in both the main text and figures. Introduction provides adequate background and cites the literature appropriately. Materials and Methods are detailed. Authors are careful in their interpretations, statistical comparisons, and include necessary controls in each experiment. The Discussion presents a reasonable interpretation of the results. Overall, there are no major weaknesses with this manuscript.

      We very much appreciate the positive comments of the reviewer regarding the broad interest and strength of our work.

      (2) NL4-3deltaIN and immature HIV virions are found to have less associated gp120 relative to wild-type particles. It is not obvious why this is the case for the deltaIN particles or genetically immature particles. Can the authors provide possible explanations? (A prior paper was cited, Chojnacki et al Science, 2012 but can the current authors provide their own interpretation.)

      Our conclusion from the data is actually exactly the opposite. As shown in Figure 2D, the gp120 staining intensity was higher for NL4-3ΔIN particles (1,786 a.u.) than for gp120-VLPs (1,223 a.u.), indicating lower expression of Env proteins in the latter. Furthermore, analysis of gp120 intensity per particle (Figure 2E) confirmed that gp120-VLPs contained fewer gp120 molecules per particle than NL4-3ΔIN virions. These levels were comparable with, or even lower than, those observed in primary HIV-1 viruses (Zhu et al. Nature, 2006). This reduction was a direct consequence of the method used to generate the VLPs, as our goal was to produce viral particles with minimal gp120 content to prevent artifacts in receptor clustering that might occur using high levels of Env proteins in the VLPs to activate the receptors.

      This misunderstanding may arise from the fact that we also compared Gag condensation and Env distribution on the surface of gp120-VLPs with those observed in genetically immature particles and integrase-defective NL4-3ΔIN virions, which served as controls. STED microscopy data revealed differences in Env distribution between gp120-VLPs and NL4-3ΔIN virions, supporting the classification of gp120-VLPs as mature particles (Figure 2 A,B).

      Reviewer #3:

      We thank the reviewer for considering that our work offers new insights into the spatial organization of receptors during HIV-1 entry and infection and that the manuscript is well written, and the findings significant.

      (1) For mechanistic basis of gp120-CXCR4 versus CXCL12-CXCR4 differences. Provide additional structural or biochemical evidence to support the claim that gp120 stabilises a distinct CXCR4 conformation compared to CXCL12. If feasible, include molecular modelling, mutagenesis, or crosslinking experiments to corroborate the proposed conformational differences.

      We appreciate the opportunity to clarify this point. The specific claim that gp120 stabilizes a conformation of CXCR4 that is distinct from the CXCL12-bound state was not explicitly stated in our manuscript, although we agree that our data strongly support this possibility. It is important to consider that CXCL12 binds directly to CXCR4, whereas gp120 requires prior sequential binding to CD4, and its subsequent interaction is with a CXCR4 molecule that is already forming part of the CD4/CXCR4 complex, as demonstrated by our FRET experiments and supported by previous studies (Zaitseva et al. J. Leuk. Biol., 2005; Busillo & Benovic Biochim. Biophys. Acta, 2007; Martínez-Muñoz et al. PNAS, 2014). This difference makes it inherently complex to compare the conformational changes induced by gp120 and CXCL12 on CXCR4.

      However, our findings show that both stimuli induce oligomerization of CXCR4, a phenomenon not observed when mutant CXCR4<sup>R334X</sup> was exposed to the chemokine CXCL12 (García-Cuesta et al. PNAS, 2022).

      (1) CXCL12 induced oligomerization of CXCR4 but did not affect the dynamics of CXCR4<sup>R334X</sup> (Martinez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022). By contrast, X4-gp120 and the corresponding VLPs—which require initial binding to CD4 to engage the chemokine receptor—stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>.

      (2) FRET analysis revealed distinct FRET<sub>50</sub> values for CD4/CXCR4 (2.713) and CD4/CXCR4<sup>R334X</sup> (0.399) complexes, suggesting different conformations for each complex.

      (3) Consistent with previous reports (Balabanian et al. Blood, 2005; Zmajkovicova et al. Front. Immunol., 2024; García-Cuesta et al. PNAS, 2022), the molecular mechanisms activated by CXCL12 are distinct when comparing CXCR4 with CXCR4<sup>R334X</sup>. For instance, CXCL12 induces internalization of CXCR4, but not of mutant CXCR4<sup>R334X</sup>. Conversely, X4-gp120 triggers approximately 25% internalization of both receptors. Similarly, CXCL12 does not promote CD4 internalization in cells co-expressing CXCR4 or CXCR4<sup>R334X</sup>, whereas X4-gp120 does, although CD4 internalization was significantly higher in cells co-expressing CXCR4.

      These findings suggest that CD4 influences the conformation and the oligomerization state of both co-receptors. To further support this hypothesis, we have conducted new in silico molecular modeling of CD4 in complex with either CXCR4 or its mutant CXCR4<sup>R334X</sup> using AlphaFold 3.0 (Abramson et al. Nature, 2024). The server was provided with both sequences, and the interaction between the two molecules for each protein was requested. It produced a number of solutions, which were then analyzed using the software ChimeraX 1.10 (Meng et al. Protein Sci., 2023). CXCR4 and its mutant, CXCR4<sup>R334X</sup> bound to CD4, were superposed using one of the CD4 molecules from each complex, with the aim of comparing the spatial positioning of CD4 molecules when interacting with CXCR4.

      Author response image 5.

      CD4/CXCR4 complexes were superimposed with CD4/CXCR4 complexes (left panel) or CD4/CXCR4<sup>R334X</sup> complexes (right panels). Arrows indicate the CD4 molecule used as reference for the superimposing.

      As illustrated in Author response image 5, the superposition of the CD4/CXCR4 complexes was complete. However, when CD4/CXCR4 complexes were superimposed with CD4/CXCR4<sup>R334X</sup> complexes using the same CD4 molecule as a reference, indicated by an arrow in the figure, a clear structural deviation became evident. The main structural difference detected was the positioning of the CD4 transmembrane domains when interacting with either the wild-type or mutant CXCR4. While in complexes with CXCR4, the angle formed by the lines connecting residues E416 at the C-terminus end of CD4 with N196 in CXCR4 was 12°, for the CXCR4<sup>R334X</sup> complex, this angle increased to 24°, resulting in a distinct orientation of the CD4 extracellular domain (Author response image 6).

      Author response image 6.

      Comparison of the angle between the transmembrane domains of CD4 in CXCR4 WT and WHIM complexes. The angle between residues N196 from one CXCR4 molecule and E416 from the two CD4 dimer molecules was calculated for the CXCR4 WT (12°) and WHIM (24°) complexes to demonstrate the difference in CD4 positioning.
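
The angle measurement described above amounts to the angle between two lines that share a vertex. The sketch below assumes the vertex is the N196 atom position in CXCR4 and that the two lines run to the E416 positions of the two CD4 molecules; this reading of the figure legend, the coordinates and the function name are assumptions, and the authors' actual measurement was made in ChimeraX.

```python
import math


def angle_deg(vertex, p1, p2):
    """Angle in degrees at `vertex` between the lines vertex->p1 and vertex->p2."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(a * a for a in v2))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_angle))


# Hypothetical coordinates (in Å), not taken from the AlphaFold models.
n196 = (0.0, 0.0, 0.0)
e416_a = (10.0, 0.0, 40.0)
e416_b = (-2.0, 0.0, 41.0)
print(f"angle = {angle_deg(n196, e416_a, e416_b):.1f} degrees")
```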

      To further analyze the models obtained, we employed PDBsum software (Laskowski & Thornton Protein Sci., 2021) to predict the CD4/CXCR4 interface residues. Data indicated that at least 50% of the interaction residues differed when the CD4/CXCR4 interaction surface was compared with that of the CD4/CXCR4<sup>R334X</sup> complex (Author response image 7). It is important to note that while some hydrogen bonds were present in both complex models, others were exclusive to one of them. For instance, whereas Cys<sup>394</sup>(CD4)-Tyr<sup>139</sup> and Lys<sup>299</sup>(CD4)-Glu<sup>272</sup> were present in both CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes, the pairs Asn<sup>337</sup>(CD4)-Ser<sup>27</sup>(CXCR4<sup>R334X</sup>) and Lys<sup>325</sup>(CD4)-Asp<sup>26</sup>(CXCR4<sup>R334X</sup>) were only found in CD4/CXCR4<sup>R334X</sup> complexes.

      Author response image 7.

      Interacting residues at the CD4/CXCR4 interface. The panel displays the interface residues from the CXCR4 and CD4 oligomer. CD4 residues labeled with a red sphere show the interacting residues present in both CXCR4-WT and -WHIM hetero-oligomers. The continuous red lines represent a salt bridge, while the blue lines indicate a hydrogen bond and the dashed red lines represent non-bonded interactions. As illustrated in the figure, half of the interacting residues differ between the WT and WHIM models, indicating that the interacting surfaces are also distinct.

      These findings, which are consistent with our FRET results, suggest distinct interaction surfaces between CD4 and the two chemokine receptors. Overall, these results are compatible with differences in the spatial conformation adopted by these complexes.

      (2) For Empty VLP effects on CXCR4 dynamics: Explore potential causes for the observed effects of Env-deficient VLPs. It's valuable to include additional controls such as particles from non-producer cells, lipid composition analysis, or blocking experiments to assess nonspecific interactions.

      As VLPs are complex entities, we thought that the relevant results should be obtained comparing the effects of Env(-) VLPs with gp120-VLPs. Therefore, we would first remark that regardless of the effect of Env(-) VLPs on CXCR4 dynamics, the most evident finding in this study is the strong effect of gp120-VLPs compared with control Env(-) VLPs. Nevertheless, regarding the effect of the Env(-) VLPs compared with medium, we propose several hypotheses. As several virions can be tethered to the cell surface via glycosaminoglycans (GAGs), we hypothesized that VLP-GAG interactions might indirectly influence the dynamics of CXCR4 and CXCR4<sup>R334X</sup> at the plasma membrane. Additionally, membrane fluidity is essential for receptor dynamics; therefore, VLP interactions with proteins, lipids or any other component of the cell membrane could also alter receptor behavior. It is well known that lipid rafts participate in the interaction of different viruses with target cells (Nayak & Hu Subcell. Biochem., 2004; Manes et al. Nat. Rev. Immunol., 2003; Riethmüller et al. Biochim. Biophys. Acta, 2006) and both the lipid composition and the presence of co-expressed proteins modulate ligand-mediated receptor oligomerization (Gardeta et al. Frontiers in Immunol., 2022; Gardeta et al. Cell. Commun. Signal., 2025). We have thus performed Raster Image Correlation Spectroscopy (RICS) analysis to assess membrane fluidity through membrane diffusion measurements on cells treated with Env(-) VLPs.

      Jurkat cells were labeled with Di-4-ANEPPDHQ and seeded on FN and on FN + VLPs prior to analysis by RICS on confocal microscopy. The results indicated no significant differences in membrane diffusion under the treatment tested, thereby ruling out an effect of VLPs on overall membrane fluidity (Author response image 8).

      Author response image 8.

      VLPs treatment does not alter cell membrane fluidity. Diffusion values obtained by RICS from JKCD4X4 cells. (n = 3, with at least 10 cells analyzed per experiment and condition; n.s., not significant).

      Nonetheless, these results do not rule out other non-specific interactions of Env(-) VLPs with membrane proteins that could affect receptor dynamics. For instance, it has been reported that the C-type lectin DC-SIGN acts as an efficient docking site for HIV-1 (Cambi et al. J. Cell. Biol., 2004; Wu & KewalRamani Nat. Rev. Immunol., 2006). However, a detailed investigation of these possible mechanisms is beyond the scope of this manuscript.

      (3) For Direct link between clustering and infection efficiency - Test whether disruption of CXCR4 clustering (e.g., using actin cytoskeleton inhibitors, membrane lipid perturbants, or clustering-deficient mutants) alters HIV-1 fusion or infection efficiency.

      Designing experiments using tools that disrupt receptor clustering by interacting with the receptors themselves is challenging, as these tools bind the receptor and can therefore alter parameters such as its conformation and/or its distribution at the cell membrane, as well as affect some cellular processes such as HIV-1 attachment and cell entry. Moreover, effects on actin polymerization or lipid dynamics can affect not only receptor clustering but also other molecular mechanisms essential for efficient infection.

      Many previous reports have, nonetheless, indirectly correlated receptor clustering with cell infection efficiency. Cholesterol plays a key role in the entry of several viruses. Its depletion in primary cells and cell lines has been shown to confer strong resistance to HIV-1-mediated syncytium formation and infection by both CXCR4- and CCR5-tropic viruses (Liao et al. AIDS Res. Hum. Retroviruses, 2021). Moderate cholesterol depletion also reduces CXCL12-induced CXCR4 oligomerization and alters receptor dynamics (Gardeta et al. Cell. Commun. Signal., 2025). By restricting the lateral diffusion of CD4, sphingomyelinase treatment inhibits HIV-1 fusion (Finnegan et al. J. Virol., 2007). Depletion of sphingomyelins also disrupts CXCL12-mediated CXCR4 oligomerization and its lateral diffusion (Gardeta et al. Frontiers in Immunol., 2022). Additional reports highlight the role of actin polymerization at the viral entry site, which facilitates clustering of HIV-1 receptors, a crucial step for membrane fusion (Serrano et al. Biol. Cell., 2023). Blockade of actin dynamics by treatment with Latrunculin A, a drug that sequesters actin monomers and prevents their polymerization, blocks CXCL12-induced CXCR4 dynamics and oligomerization (Martínez-Muñoz et al. Mol. Cell, 2018).

      Altogether, these findings strongly support our hypothesis of a direct link between CXCR4 clustering and the efficiency of HIV-1 infection.

      (4) CD4/CXCR4 co-endocytosis hypothesis - Support the proposed model with direct evidence from live-cell imaging or co-localization experiments during viral entry. Clarification is needed on whether internalization is simultaneous or sequential for CD4 and CXCR4.

      When referring to endocytosis of CD4 and CXCR4, we only hypothesized that HIV-1 might promote the internalization of both receptors either sequentially or simultaneously. The hypothesis was based on several findings:

      a) Previous studies have suggested that HIV-1 glycoproteins can reduce CD4 and CXCR4 levels during HIV-1 entry (Choi et al. Virol. J., 2008; Geleziunas et al. FASEB J, 1994; Hubert et al. Eur. J. Immunol., 1995).

      b) Receptor endocytosis has been proposed as a mechanism for HIV-1 entry (Daecke et al. J. Virol., 2005; Aggarwal et al. Traffic, 2017; Miyauchi et al. Cell, 2009; Carter et al. Virology, 2011).

      c) Our data from cells activated with X4-gp120 demonstrated internalization of CD4 and chemokine receptors, which correlated with HIV-1 infection in PBMCs from WHIM patients and healthy donors.

      d) CD4 and CXCR4 have been shown to co-localize in lipid rafts during HIV-1 infection (Manes et al. EMBO Rep., 2000; Popik et al. J. Virol., 2002).

      e) Our FRET data demonstrated that CD4 and CXCR4 form heterocomplexes and that FRET efficiency increased after gp120-VLPs treatment.

      We agree with the reviewer that further experiments are required to test this hypothesis; however, we believe that this is beyond the scope of the current manuscript.

      Minor Comments:

      (1) The conclusions rely solely on the HXB2 X4-tropic Env. It would strengthen the study to assess whether other X4 or dual-tropic strains induce similar receptor clustering and dynamics.

      The primary goal of our current study was to investigate the dynamics of the co-receptor CXCR4 during HIV-1 infection, motivated by previous reports showing CD4 oligomerization upon HIV-1 binding and gp120 stimulation (Yuan et al. Viruses, 2021). We initially used a recombinant X4-gp120, a soluble protein that does not fully replicate the functional properties of the native HIV-1 Env. Previous studies have shown that Env consists of gp120 trimers, which redistribute and cluster on the surface of virions following proteolytic Gag cleavage during maturation (Chojnacki et al. Nat. Commun., 2017). An important consideration in receptor oligomerization studies is the concentration of recombinant gp120 used, as it does not accurately reflect the low number of Env trimers present on native HIV-1 particles (Hart et al. J. Histochem. Cytochem., 1993; Zhu et al. Nature, 2006). To address these limitations, we generated virus-like particles (VLPs) containing low levels of X4-gp120 and repeated the dynamic analysis of CXCR4. The use of primary HIV-1 isolates was limited, in this project, to confirming that PBMCs from both healthy donors and WHIM patients were equally susceptible to infection. This result using a primary HIV-1 virus supports the conclusion drawn from our in vitro approaches. We thus believe that although the use of other X4- and dual-tropic strains may complement and reinforce the analysis, it is far beyond the scope of the current manuscript.

      (2) Given the observed clustering effects, it would be valuable to explore whether gp120-induced rearrangements alter epitope exposure to broadly neutralizing antibodies like 17b or 3BNC117. This would help connect the mechanistic insights to therapeutic relevance.

      As 3BNC117, VRC01 and b12 are broadly neutralizing mAbs that recognize conformational epitopes on gp120 (Li et al. J. Virol., 2011; Mata-Fink et al. J. Mol. Biol., 2013), they will struggle to bind the gp120/CD4/CXCR4 complex and therefore may not be ideal for detecting changes within the CD4/CXCR4 complex. The experiment suggested by the reviewer is thus both challenging and very complex. It would require evaluating antibody binding in two experimental conditions, in the absence and in the presence of oligomers. However, our data indicate that receptor oligomerization is promoted by X4-gp120 binding, and the selected antibodies are neutralizing mAbs, so they should block or hinder the binding of gp120 and, consequently, receptor oligomerization. An alternative approach would be to study the neutralizing capacity of these mAbs on cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> complexes. Variations in their neutralizing activity could then be extrapolated to distinct gp120 conformations, which in turn may reflect differences between CD4/CXCR4 and CD4/CXCR4<sup>R334X</sup> complexes.

      We thus assessed the ability of VRC01 and b12, two anti-gp120 mAbs that were available in our laboratory, to neutralize gp120 binding on cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup>. Specifically, increasing concentrations of each antibody were preincubated (60 min, 37ºC) with a fixed amount of X4-gp120 (0.05 µg/ml). The resulting complexes were then incubated with Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> (30 min, 37ºC) and, finally, their binding was analyzed by flow cytometry. Although we did not observe statistically significant differences in the neutralization capacity of b12 or VRC01 for the binding of X4-gp120 depending on the presence of CXCR4 or CXCR4<sup>R334X</sup>, we observed a trend for greater concentrations of both mAbs to neutralize X4-gp120 binding in Jurkat CD4/CXCR4 cells than in Jurkat CD4/CXCR4<sup>R334X</sup> cells (Author response image 9).

      Author response image 9.

      Flow cytometry analysis of gp120 binding to Jurkat cells expressing CD4/CXCR4 or CD4/CXCR4<sup>R334X</sup> in the presence of different concentrations of the neutralizing anti-gp120 antibodies b12 (left panel) and VRC01 (right panel). AUC comparison by Welch’s t-test: p-values 0.2950 and 0.2112 for b12 and VRC01, respectively (n = 2).
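
      For readers unfamiliar with this type of comparison, the following is a minimal sketch of how an area-under-the-curve (AUC) comparison with Welch's t-test might be set up. It is not the authors' analysis code: the concentrations and binding curves are hypothetical placeholders rather than data from the study, and the helper names `trapezoid_auc` and `welch_t` are introduced here purely for illustration.

```python
from statistics import mean, variance


def trapezoid_auc(x, y):
    """Area under a curve by the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical placeholder values, NOT data from the study.
conc = [0.0, 1.0, 2.0, 4.0]                          # antibody concentration (arbitrary units)
wt_curves = [[100, 80, 50, 20], [100, 70, 40, 20]]   # % binding, two replicates
mut_curves = [[100, 90, 70, 40], [100, 90, 60, 30]]  # % binding, two replicates

# One AUC per replicate curve, then compare the two groups of AUCs.
auc_wt = [trapezoid_auc(conc, c) for c in wt_curves]
auc_mut = [trapezoid_auc(conc, c) for c in mut_curves]
t_stat, df = welch_t(auc_wt, auc_mut)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

      The two-sided p-value follows from the t distribution with the computed degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(auc_wt, auc_mut, equal_var=False)` performs the same test directly. With only two replicates per group, such a test has very little power, which is worth keeping in mind when interpreting a non-significant result.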

      These slight alterations in the neutralizing capacity of b12 and VRC01 mAbs may thus suggest minimal differences in the conformations of gp120 depending on the coreceptor used. We also detected that X4-gp120 and VLPs expressing gp120, which require initial binding to CD4 to engage the chemokine receptor, stabilized oligomers of both CXCR4 and CXCR4<sup>R334X</sup>, but FRET data indicated distinct FRET<sub>50</sub> values between the partners: 2.713 for CD4/CXCR4 and 0.399 for CD4/CXCR4<sup>R334X</sup> (Figure 5A,B in the main manuscript). Moreover, we also detected significantly more CD4 internalization mediated by X4-gp120 in cells co-expressing CD4 and CXCR4 than in those co-expressing CD4 and CXCR4<sup>R334X</sup> (Figure 6 in the main manuscript). Overall, these latter data and those included in Author response images 5, 6 and 7 indicate distinct conformations within each receptor complex.

      (3) TIRF imaging limits analysis to the cell substrate interface. It would be useful to clarify whether CXCR4 receptor clustering occurs elsewhere, such as at immunological synapses or during cell-to-cell contact.

      In recent years, chemokine receptor oligomerization has gained significant research interest due to its role in modulating the ability of cells to sense chemoattractant gradients. This molecular organization is now recognized as a critical factor in governing directed cell migration (Martínez-Muñoz et al. Mol. Cell, 2018; García-Cuesta et al. PNAS, 2022; Hauser et al. Immunity, 2016). In addition, advanced imaging techniques such as single-molecule and super-resolution microscopy have been used to investigate the spatial distribution and dynamic behaviour of CXCR4 within the immunological synapse in T cells (Felce et al. Front. Cell Dev. Biol., 2020). Building on these findings, we are currently conducting a project focused on characterizing CXCR4 clustering specifically within this specialized cellular region.

      (4) In LVP experiments, it would be useful to report transduction efficiency (% GFP+ cells) alongside MSI data to relate VLP infectivity with receptor clustering functionally.

      These experiments were designed to validate the functional integrity of the gp120 conformation on the LVPs, confirming their suitability for subsequent TIRF microscopy. Our objective was to establish a robust experimental tool rather than to perform a high-throughput quantification of transduction efficiency. It is for that reason that these experiments were included in new Supplementary Figure S6, which also contains the complete characterization of gp120-VLPs and LVPs. In such experimental conditions, quantifying the percentage of GFP-positive cells relative to the total number of cells plated in each well is very difficult. However, in line with the reviewer’s commentary and as we used the same number of cells in each experimental condition, we have included, in the revised manuscript, a complementary graph illustrating the GFP intensity (arbitrary units) detected in all the wells analyzed (new Supplementary Fig. 6E).

      (5) To ensure that differences in fusion events (Figure 7B) are attributable to target cell receptor properties, consider confirming that effector cells express similar levels of HIV-1 Env. Quantifying gp120 expression by flow cytometry or western blot would rule out the confounding effects of variable Env surface density.

      In these assays (Figure 7B), we used the same effector cells (cells expressing X4-gp120) in both experimental conditions, ensuring that any observed differences should be attributable solely to the target cells, either JKCD4X4 or JKCD4X4<sup>R334X</sup>. For this reason, in Figure 7A we included only the binding of X4-gp120 to the target cells which demonstrated similar levels of the receptors expressed by the cells.

      (6) HIV-mediated receptor downregulation may occur more slowly than ligand-induced internalization. Including a 24-hour time point would help assess whether gp120 induces delayed CD4 or CXCR4 loss beyond the early effects shown and to better capture potential delayed downregulation induced by gp120.

      The reviewer suggests using a 24-hour time point to facilitate detection of receptor internalization. However, such an extended incubation time may introduce some confounding factors, including receptor degradation, recycling and even de novo synthesis, which could affect the interpretation of the results. Under our experimental conditions, we observed that CXCL12 did not trigger CD4 internalization whereas X4-gp120 did. Interestingly, CD4 internalization depended on the coreceptor expressed by the cells.

      (7) Increase label font size in microscopy panels for improved readability.

      Of course; the font size of these panels has been increased in the revised version.

      (8) Consider adding more references on ligand-induced co-endocytosis of CD4 and chemokine receptors during HIV-1 entry.

      We have added more references to support this hypothesis (Toyoda et al. J. Virol., 2015; Venzke et al. J. Virol., 2006; Gobeil et al. J. Virol., 2013).

      (9) For Statistical analysis. Biological replicates are adequate, and statistical tests are generally appropriate. For transparency, report n values, exact p-values, and the statistical test used in every figure legend and discussed in the results.

      Thank you for highlighting the importance of transparency in statistical reporting. We confirm that the n values for all experiments have been included in the figure legends. The statistical tests used for each analysis are also clearly indicated in the figure legends, and the interpretation of these results is discussed in detail in the Results section. Furthermore, the Methods section specifies the tests applied and the thresholds for significance, ensuring full transparency regarding our analytical approach.

      In accordance with established conventions in the field, we have utilized categorical significance indicators (e.g., n.s., *, **, ***) within our figures to enhance readability and focus on biological trends. This approach is widely adopted in high-impact literature to prevent visual clutter. However, to ensure full transparency and reproducibility, we have ensured that the underlying statistical tests and thresholds are clearly defined in the respective figure legends and Methods section.

      Reviewer #4:

      We thank the reviewer for considering that this work is presented in a clear fashion, and the main findings are properly highlighted, and for remarking that the paper is of interest to the retrovirology community and possibly to the broader virology community.

      We also agree that the finding that X4-gp120 clusters CXCR4<sup>R334X</sup> is of interest, as it suggests that X4-gp120 binds through a mechanism different from that of the natural ligand CXCL12, an aspect that we are now evaluating. These data also indicate that WHIM patients can be infected by HIV-1 similarly to healthy people.

      (1) The observation that "empty VLPs" reduce CXCR4 diffusivity is potentially interesting. However, it is not supported by the data owing to insufficient controls. The authors correctly discuss the limitations of that observation in the Discussion section (lines 702-704). However, they overinterpret the observation in the Results section (lines 509-512), suggesting non-specific interactions between empty VLPs, CD4 and CXCR4. I suggest either removing the sentence from the Results section or replacing it with a sentence similar to the one in the Discussion section.

      In accordance with the reviewer's suggestion, the sentence in the Results section has been replaced with one similar to that found in the Discussion section. In addition, we have performed Raster Image Correlation Spectroscopy (RICS) analysis using the Di-4-ANEPPDHQ lipid probe to assess membrane fluidity by means of membrane diffusion, and compared the results with those of cells treated with Env(-) VLPs. The results indicated that VLPs did not modulate membrane fluidity (Author response image 8). Nonetheless, these results do not rule out other potential non-specific interactions of the Env(-) VLPs with other components of the cell membrane that might affect receptor dynamics (see our response to point 2 of reviewer #3).

      (2) In the case of the WHIM mutant CXCR4-R334X, the addition of "empty VLPs" did not cause a significant change in the diffusivity of CXCR4-R334X (Figure 4B). This result is in contrast with the addition of empty VLPs to WT CXCR4. However, the authors neither mention nor comment on that result in the results section. Please mention the result in the paper and comment on it in relation to the addition of empty VLPs to WT CXCR4.

      We would remark that the main observation in these experiments should focus on the effect of gp120-VLPs, and the results indicate that gp120-VLPs promoted clustering of CXCR4 and of CXCR4<sup>R334X</sup> and reduced their diffusion at the cell membrane. The Env(-) VLPs were included as a negative control in the experiments, to compare the data with those obtained using gp120-VLPs. However, once we observed some residual effect of the Env(-) VLPs, we decided to give a potential explanation, formulated as a hypothesis, that the Env(-) VLPs modulated membrane fluidity. We have now performed a RICS analysis using Di-4-ANEPPDHQ as a lipid probe (Author response image 8). The results suggest that Env(-) VLPs do not modulate cell membrane fluidity, although we do not rule out other potential interactions with membrane proteins that might alter receptor dynamics. We appreciate the reviewer’s observation and agree that this result can be noted. However, since the main purpose of Figure 4B is to show that gp120-VLPs modulate the dynamics of CXCR4<sup>R334X</sup> rather than to remark that the Env(-) VLPs also have some effects, we consider that a detailed discussion of this specific aspect would detract from the central finding and may dilute the primary narrative of the study.

      Minor comments

      (1) It would be helpful for the reader to combine thematically or experimentally linked figures, e.g., Figures 3 and 4.

      (2) Figures 3 and 4 are very similar. Please unify the colours in them and the order of the panels (e.g. Figure 3 panel A shows diffusivity of CXCR4, while Figure 4 panel A shows MSI of CXCR4-R334X).

      While we considered consolidating Figures 3 and 4, we believe that maintaining them as separate entities enhances conceptual clarity. Since Figure 3 establishes the baseline dynamics for wildtype CXCR4 and Figure 4 details the distinct behavior of the CXCR4<sup>R334X</sup> mutant, keeping them separate allows the reader to fully appreciate the specificities of each system before making a cross-comparison.

      (3) Some parts of the Discussion section could be shortened, moved to the Introduction (e.g., lines 648-651), or entirely removed (e.g., lines 633-635 about GPCRs).

      In accordance, the Discussion section has been reorganized and shortened to improve clarity.

      (4) I suggest renaming "empty VLPs" to "Env(−) VLPs" (or similar). The name empty VLPs can mislead the reader into thinking that these are empty vesicles.

      The term empty VLPs has been renamed to Env(−) VLPs throughout the manuscript to more accurately reflect their composition. Many thanks for this suggestion.

      (5) Line 492 - please rephrase "...lower expression of Env..." to "...lower expression of Env or its incorporation into the VLPs...".

      The sentence has been rephrased.

      (6) Line 527 - The data on CXCL12 modulating CXCR4-R334X dynamics and clustering are not present in Figure 4 (or any other Figure). Please add them or rephrase the sentence with an appropriate reference. Make clear which results are yours.

      (7) Line 532 - Do the data in the paper really support a model in which CXCL12 binds to CXCR4-R334X? If not, please rephrase with an appropriate reference.

      Previous studies support the association of CXCL12 with CXCR4<sup>R334X</sup> (Balabanian et al. Blood, 2005; Hernandez et al. Nat Genet., 2003; Busillo & Benovic Biochim. Biophys. Acta, 2007). In fact, this receptor has been characterized as a gain-of-function variant for this ligand (McDermott et al. J. Cell. Mol. Med., 2011). The revised manuscript now includes these bibliographic references to support this commentary. In any case, our previous data indicate that CXCL12 binding does not affect CXCR4<sup>R334X</sup> dynamics (García-Cuesta et al. PNAS, 2022).

      (8) Line 695 - "...lipid rafts during HIV-1 (missing word?) and their ability to..." During what?

      Many thanks for catching this mistake. The sentence now reads: “Although direct evidence for the internalization of CD4 and CXCR4 as complexes is lacking, their co-localization in lipid rafts during HIV-1 infection (97–99) and their ability to form heterocomplexes (22) strongly suggest they could be endocytosed together.”

    1. eLife Assessment

      This study presents important findings for the understanding of central brain circuits that underlie nociception-induced escape. Using a laser-based nociception assay, chronic neuronal silencing, trans-Tango anatomical tracing, and reference to connectomic data, the authors propose that nociceptive signals (from painless- and trpA1-expressing neurons) converge on a subset of dopaminergic neurons (subsets of PPL1 and PAM), which in turn engage mushroom body output neurons (MBONs) to shape escape latency. However, methods and controls fall short of fully supporting the findings, rendering the evidence incomplete. This study will be of interest to scientists studying nociception and learning and memory circuits.

    2. Reviewer #1 (Public review):

      Summary:

      Yang et al. investigate the central pathways underlying nociceptive responses in Drosophila. The authors employ a behavioral platform they previously developed, which uses laser stimulation to deliver nociceptive stimuli while enabling automated tracking of fly behavior. By combining large-scale behavioral screening with circuit tracing approaches, the study identifies a set of dopaminergic neurons (DANs) and mushroom body output neurons (MBONs) that participate in the transmission of nociceptive signals. Nociceptive escape behavior has generally been regarded as largely reflexive. It is therefore intriguing that the mushroom body, a neural circuit classically associated with learning, is involved in this process. In particular, the recruitment of dopaminergic neurons typically linked to both appetitive and aversive valence is noteworthy and raises interesting questions about how nociceptive information is integrated within the circuits. Overall, the findings are conceptually interesting and may provide useful insights into dissecting the nociceptive escape behavior.

      Strengths:

      The behavioral assay used in this study is high-throughput and appears reproducible. The authors screened a large number of genetic lines, and the behavioral responses were carefully quantified. The trans-Tango tracing results are consistent with the behavioral screening results. And the observation that circuits typically associated with learned behaviors (mushroom body) contribute to a nociceptive escape response, generally considered a hard-wired reflex, is conceptually interesting.

      Weaknesses:

      The use of laser stimulation to induce nociceptive stimuli makes the paradigm difficult to combine with calcium imaging or optogenetic manipulations. As a result, the study lacks functional and temporally precise tests of the proposed circuit mechanisms.

      Several aspects of the Methods section require additional detail:

      (1) How was the behavioral potency level calculated? Some of the split-GAL4 lines label multiple neurons, and individual neurons may innervate multiple compartments. It is therefore unclear how a single "behavioral potency level" value was assigned to a compartment.

      (2) Additional details are needed on how velocity was calculated, particularly the time window used for the analysis. In the Kir-silenced condition, the variation in velocity appears smaller than in the control group, which would benefit from clarification.

      (3) Connectome analysis. More details are needed regarding how DAN-MBON connectivity was quantified in Figure 5. For example, were only DAN → MBON connections considered, or were bidirectional connections included?

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript aims to identify the central nervous system circuitry, specifically within the mushroom body (MB), that mediates nociception-induced escape behavior in adult Drosophila. The authors provide a detailed map of the neural pathways underlying defensive actions in flies. Overall, the study is technically solid, clearly written, and conceptually interesting.

      Strengths:

      The authors present compelling evidence by integrating multiple complementary approaches. The ALTOMS laser system enables precise, automated measurement of escape latency, allowing for high-throughput and objective behavioral quantification. Neuronal silencing experiments assess functional necessity and demonstrate that specific dopaminergic neurons (DANs) and mushroom body output neurons (MBONs) are critical for escape behavior. Trans-Tango anatomical mapping further supports the proposed circuit by identifying putative synaptic connections consistent with the authors' model.

      Weaknesses:

      A central limitation of the study is its heavy reliance on chronic Kir2.1-mediated neuronal silencing as the primary functional manipulation. This approach raises concerns about potential developmental compensation and indirect network effects. The authors could strengthen their conclusions by incorporating more temporally precise, reversible silencing strategies, such as recently developed optogenetic- or chemogenetic-based methods.

      In addition, the study relies on the trans-Tango system to identify downstream synaptic partners, which has several inherent limitations. Trans-Tango detects only chemical synapses and cannot reveal electrical coupling. The system may also yield false negatives due to reporter sensitivity, and anatomical labeling alone does not establish functional connectivity in the context of the specific behavior examined.

    4. Reviewer #3 (Public review):

      Summary:

      Yang et al. sought to describe central brain circuits that underlie nociception-induced escape in Drosophila using a combination of neurogenetic tools to silence subsets of neurons and to trace their postsynaptic connections. They present interesting data that identify subsets of DANs and MBONs that are required for a jumping response to an aversive stimulus, but not for baseline locomotion, and present a model for linking peripheral nociception to MB-dependent escape behavior.

      Strengths:

      They use an innovative avoidance assay to elicit a robust behavioral response and use trans-tango to identify downstream targets of painless and TrpA1-expressing neurons.

      Weaknesses:

      This reviewer's enthusiasm for the study is lowered by an incomplete description of the methods and by the lack of appropriate behavioral controls, supporting immunohistochemistry data, and a complete behavioral screen of DANs and MBONs. Below I list my suggestions, questions, and criticisms.

      (1) Behavioral studies are interesting. The assay is simple, yet innovative. However, there is no power analysis or explanation of how sample sizes were selected. I commend the authors for including a positive control; however, although UAS-controls are present, there are no GAL4-controls included in the study. Given that many of the lines used for behavior are split-GAL4's, it's unclear if the additional transgene influenced behavior. This should be addressed.

      (2) It is also not clear from the methods how the behavior was run and how it was analyzed. Was baseline locomotion recorded before the laser was introduced? I assume this is the case; however, more importantly, how long after the flies were introduced to the arena were baseline recordings collected? How much data was used to calculate velocity? Were the experimenters blind to the conditions they were assessing? More detail in the methods is essential for understanding the data and providing an opportunity to replicate results.

      (3) At times, the authors describe "locomotion velocity" as baseline locomotion, but other times, they describe it as escape velocity (see reference to Figure 1F). The authors should clarify whether escape velocity was calculated.

      (4) Immunohistochemistry: There is a lack of detail regarding a description of the flies used for trans-tango experiments. How many brains were evaluated? Was there variability across brains? Were the flies males or females? This is an important detail as sex could impact the level of expression of the ligand and therefore the results. It is also not clear at what age these flies were dissected and at what temperature they were raised. This can also significantly affect the post-synaptic signal that is measured (see Talay et al 2017).

      (5) Figure 2 shows the overlap of trans-tango and dopamine signal, but there is no signal for the GAL4-line to evaluate the overlap between presynaptic signal and postsynaptic signal. This expression is an important consideration and should be included.

      (6) Expression of the GAL4 lines in the central brain is also important to show because the authors suggest that, because painless and TrpA1 expression does not fully overlap in peripheral tissue, it might converge in the central brain. Does the central brain expression of painless and TrpA1 overlap?

      (7) Further, although the authors clearly label the different dopamine subsets (PPL1, PAL, and PAM), some orientation with regard to where these images were taken would be helpful. I recommend a stack showing the location of the cell bodies and then a zoom in to see the overlap.

      (8) Behavioral data for DANs and MBONs: I recommend that the authors discuss the results by the neurons that are targeted and not the driver lines. For instance, the authors suggest they get the largest effects for 433B, 434B, and 298B, but all of these lines target very similar neuronal subsets y4>y1y2. It's also not clear why different split-lines were selected. Several of the lines have overlapping expression, and other compartments were not included at all. In order to determine which MBONs and DANs are required for escape behavior, all MBONs and DANs should be included. See Aso et al for a list of recommended lines for behavior based on specificity and intensity.

      (9) Based on trans-tango data, it is not clear why the authors focus exclusively on PPL1 and PAM when PAL, PPM1, 2, 3, and PPL2 also overlap with painless and trpA1. Certainly, PPL1 and PAM DANs innervate the MB, but so do some of the other DANs identified.

      (10) For Figure 5, the titles of A and B are DANs and MBONs, but it is really showing the average jumping response when neurons that innervate MB compartments are silenced. Many DANs and MBONs innervate multiple compartments (PPL1-a`2a2, etc.); thus, if the intention is to identify neural circuits that modulate escape response, the analysis should focus on the neurons, not the MB compartments. I recommend reorganizing this data so it highlights the DANs and MBONs instead of the MB compartments. I also recommend showing error bars for averages and/or raw data and organizing the x-axes so DAN and MBON compartments can be easily compared.

      (11) Lastly, nuance is lost here in the Behavioral Potency Level, given that some of these compartments are over-represented and not adjusted for the strength of expression in different split-GAL4 lines. Aso et al. (2014) recommended specific split-GAL4 lines based on specificity and intensity. Some of the lines that are included in the average Behavioral Potency are not recommended for behavior based on the intensity of expression, which could significantly influence the potency score.

    5. Author Response:

      We sincerely thank the reviewers for their insightful and constructive suggestions on our manuscript. We are encouraged by the positive recognition of our study’s conceptual significance, particularly the involvement of the mushroom body (MB) in nociceptive escape behavior and the utility of our ALTOMS behavioral platform.

      We fully agree with the reviewers’ assessments and have initiated several key revisions, additional experiments, and analytical refinements to strengthen the study.

      Below is a summary of our planned improvements:

      1. Experimental Revisions and Scope Expansion

      To address concerns regarding potential developmental compensation (Reviewers 1 and 2), we are performing new experiments using temporally precise manipulation tools to confirm the acute necessity of the identified circuits. Additionally, responding to Reviewer 3, we are conducting further behavioral assays to include necessary genetic controls (e.g., split-GAL4-only lines) and expanding our screen to cover all major MBON and DAN compartments using standardized lines to ensure a comprehensive functional map.

      2. Analytical Refinements and Methodological Transparency

      We are revising our quantitative and anatomical reporting to address several technical suggestions from all three reviewers. Specifically, we will implement a weighted “Behavioral Potency Level” that accounts for driver-specific expression intensity and specificity. Anatomical clarity will be enhanced by providing presynaptic expression patterns alongside trans-Tango signals and a neuron-centric data model for Figure 5. Furthermore, the Materials and Methods will be updated to explicitly detail habituation protocols, stimulation timing, and sample sizes, while incorporating a more nuanced discussion on the limitations of the tracing systems.

      We believe these revisions will significantly enhance the rigor and clarity of our manuscript. We look forward to submitting the revised version upon completion of these supplementary tasks.

    1. eLife Assessment

      This work presents a valuable new open-source tool for wirelessly controlling optogenetic stimulation in neuroscience experiments in behaving rodents. Evidence for its potential usefulness in different types of optogenetic experiments is solid, although some details and concerns were viewed as lacking or overlooked (e.g., system latency, battery weight). The work is expected to interest neuroscientists working with optogenetics and neuroengineers developing small-sized integrated devices for rodent experiments.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents a wireless device for closed-loop control of optogenetic stimulation based on behavioral triggers. The authors demonstrate the device through two behavioral experiments in mice, showcasing the device's capabilities and emphasizing open accessibility and using off-the-shelf components.

      Strengths:

      The paper presents a device that is open access and easily reproducible for wireless stimulation in a closed loop based on behavioral triggers. Other strengths of the device include the simultaneous use of multiple devices in parallel and the claimed ease of integration with existing frameworks. The paper shows two behavioral experiments on multiple mice along with some device validation results.

      Weaknesses:

      The main weakness of the presented device lies in the lack of flexibility in stimulation power. For a device that is intended for stimulation only, having to physically change a component on the board to adapt stimulation power is a major downside. Reprogrammable stimulation current is not complex to implement and should really have been included on this device. Another weakness lies in the limited battery life of the device. While using a battery-powered device decreases spatial constraints, allowing for the maze experiment presented in the paper, it also means the lifespan of the device is limited compared to an inductively powered device, limiting its ability for long-term experiments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the bluetooth enabled head-mounted devices - they also have built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments, a continuous Y-maze, which elegantly demonstrates how transfected animals learn to sense optogenetic closed-loop stimulation to drive their choice behavior in a way that control-stimulated animals do not. While authors claim that the ~2m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      The programming process sounds quite complicated. It would be nice if they had OTA updates. But it is described and open source. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      It is unclear what the maximum number of devices that could be used without wireless interference is. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

    4. Reviewer #3 (Public review):

      Summary:

      This study presents a novel device for wireless control of optogenetic stimulation of the mouse brain, the Blueberry, using Bluetooth Low Energy (BLE) communication for parallel activation of up to 4 devices through an Arduino interface. The authors also present two types of brain implants for light delivery that can be connected to the Blueberry: one using uLEDs for surface cortical stimulation, and another using optical fibers for intra- or sub-cortical implants. The architecture of the system, including electronics, communication, and programming, is thoroughly described. Because the system was especially designed to be integrated with existing software used for closed-loop neuroscience behavioral experiments, validation of the system is shown in two different scenarios: a learning task in an "infinite" Y-maze, where light delivery at precise locations conditions arm choice for navigation; and a social interaction analysis where 3 animals are simultaneously stimulated in order to alter social dynamics among the group.

      Strengths:

      (1) The full system can be built by individual labs with simple PCB printing, off-the-shelf components, and readily available hardware (Arduino) for widespread dissemination.

      (2) Four headstages can be controlled in parallel for simultaneous experiments with multiple mice.

      (3) Validation across different relevant behavioral tests, demonstrating the potential of integrating Blueberry in closed-loop setups.

      Weaknesses:

      (1) Some details in the manuscript regarding system characterization (latency, battery life, etc) are included only in the supplementary materials.

      (2) The practical details of integration with other commercial and open-source software used for the closed-loop experiments, which could help third-party researchers interested in using the system, are lacking sufficient detail.

      (3) System range (3 meters reported) is limited for a BLE device.

      (4) Light output amplitude is not programmable, limiting the choice of stimulation protocols and LEDs used.

      (5) Thermal modeling of the cortical surface stimulator was not performed, and it is unclear if the brain implant for this purpose is within the safety limits.

      (6) The paper is missing a comparison with other state-of-the-art devices for wireless control of optogenetic stimulation in mice.

    5. Author response:

      eLife Assessment

      This work presents a valuable new open-source tool for wirelessly controlling optogenetic stimulation in neuroscience experiments in behaving rodents. Evidence for its potential usefulness in different types of optogenetic experiments is solid, although some details and concerns were viewed as lacking or overlooked (e.g., system latency, battery weight). The work is expected to interest neuroscientists working with optogenetics and neuroengineers developing small-sized integrated devices for rodent experiments.

      We thank the eLife team for taking the time to consider and assess our manuscript. Please find below our provisional author responses accompanying the first version of the Reviewed Preprint.

      We would like to clarify an important error regarding the battery model reported in the manuscript. We mistakenly referred to the CP1254-A3 (1.8 g), whereas the battery used for all devices is the CP9440 A4X (0.8 g).

      Importantly, this correction reduces the total device weight by approximately 1 g compared to the value assumed by Reviewer #3. We believe this directly addresses the concern raised regarding battery weight in both the individual review and the overall eLife assessment.

      We will correct this error in the revised manuscript and clearly report the exact battery model and total device weight.

      For reference, the official VARTA CoinPower catalog is available here:

      https://www.varta-ag.com/fileadmin/varta/industry/downloads/products/lithium-ion-cells/VARTA_CoinPower_EN_digital_221124_A5_6p.pdf

      The battery used in BlueBerry is listed on the last line of page 2.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents a wireless device for closed-loop control of optogenetic stimulation based on behavioral triggers. The authors demonstrate the device through two behavioral experiments in mice, showcasing the device's capabilities and emphasizing open accessibility and using off-the-shelf components.

      Strengths:

      The paper presents a device that is open access and easily reproducible for wireless stimulation in a closed loop based on behavioral triggers. Other strengths of the device include the simultaneous use of multiple devices in parallel and the claimed ease of integration with existing frameworks. The paper shows two behavioral experiments on multiple mice along with some device validation results.

      We thank the reviewer for the statement.

      Weaknesses:

      The main weakness of the presented device lies in the lack of flexibility in stimulation power. For a device that is intended for stimulation only, having to physically change a component on the board to adapt stimulation power is a major downside. Reprogrammable stimulation current is not complex to implement and should really have been included on this device. Another weakness lies in the limited battery life of the device. While using a battery-powered device decreases spatial constraints, allowing for the maze experiment presented in the paper, it also means the lifespan of the device is limited compared to an inductively powered device, limiting its ability for long-term experiments.

      We thank the reviewer for these valuable comments. We did consider implementing programmable control of stimulation power, for example using a digital potentiometer. However, in our current design this approach was not sufficient because the output current supported by typical digital potentiometers is too low for the high-power LEDs used in our system. For this reason, we did not include programmable stimulation current in the present version. We agree that this is a limitation and that further work is needed to identify a suitable solution for adjustable stimulation power, which we plan to pursue in future versions of the device. We will revise the manuscript to make this limitation and future direction clearer.

      We also agree that the use of a battery-powered wireless system introduces an important trade-off. We will revise the manuscript to discuss this limitation more explicitly.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed an elegant, lightweight, open-source system that should be able to be widely disseminated to the community. They have used this system in multiple experimental paradigms and demonstrate its functionality quite elegantly. One of these experiments involves two of three animals in the arena being stimulated, a situation that clearly requires an untethered approach. They have appropriately quantified key system parameters (latency and battery life).

      Strengths:

      The introduction places this work in a broader context. That context includes a number of previous solutions, many of which are smaller or more technically complex. However, I agree with the authors that there is a need for something that is easy for labs to acquire and deploy in terms of both what goes on the head and the broader infrastructure (i.e., not needing complex wireless power delivery approaches).

      The paper does an excellent job of describing the system architecture. And the architecture is good! Their system comprises more than just the bluetooth enabled head-mounted devices - they also have built an interface that allows for TTL triggers that link into existing workflows.

      The key metrics for a device like this are weight, battery life, and latency. The weight is 1.4g, which is appropriate for adult mice; the battery life is ~100 minutes of continuous stimulation, which should be sufficient for many experiments, and the latency is typically less than 30 ms, which is fine for all but the most demanding closed-loop experiments.

      Performance is demonstrated in two experiments, a continuous Y-maze, which elegantly demonstrates how transfected animals learn to sense optogenetic closed-loop stimulation to drive their choice behavior in a way that control-stimulated animals do not. While authors claim that the ~2m diameter apparatus is "large scale", the second behavior more convincingly demonstrates the need for wireless stimulation.

      They used closed-loop monitoring of animal pose to selectively stimulate animals for approaching the tails of a dominant conspecific (based on pre-experimental pairwise assessments). It seems that the original hope was that the increases in following that they observe would result in long-lasting changes in the hierarchy of a cage, but as they report, this was not observed. Critically, their supplementary video demonstrates that they conducted this experiment with two instrumented animals simultaneously. This is a situation where a tether would have been hopelessly tangled within a few moments!

      The online documentation seems complete, and it seems quite possible for other labs to adopt and deploy the system.

      We appreciate the reviewer’s enthusiasm. Thank you.

      Weaknesses:

      The battery life is highly dependent on the stimulation paradigm. It makes sense that the LED is a major component of power consumption. It would have been elegant to measure the total optical energy that can be provided by the system. In addition, Bluetooth transmission is probably a major consumer of power, and receiving may not be "free". Quantifying power as a function of Bluetooth message rates would have been useful.

      We thank the reviewer for this important suggestion. We agree that this is a missing characterization in the current manuscript. In the revised version, we will include a more detailed analysis of the system’s power budget, including the maximum stimulation power supported by the BlueBerry device, the corresponding output currents, and the contribution of the main integrated circuits to overall current consumption.
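
      A minimal sketch of how such a power budget could translate into battery life, assuming a simple average-current model (a constant idle load, an LED that draws current only while on, and an optional average radio load). The function names and all numbers are hypothetical placeholders, not measured BlueBerry values.

```python
def average_current_ma(idle_ma, led_ma, duty_cycle, radio_ma=0.0):
    """Average current draw in mA.

    idle_ma    -- constant baseline draw of the electronics (hypothetical)
    led_ma     -- LED current while the LED is on (hypothetical)
    duty_cycle -- fraction of time the LED is on, between 0 and 1
    radio_ma   -- average extra draw attributed to BLE traffic (hypothetical)
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return idle_ma + led_ma * duty_cycle + radio_ma


def battery_life_minutes(capacity_mah, idle_ma, led_ma, duty_cycle, radio_ma=0.0):
    """Estimated runtime in minutes: capacity divided by average current."""
    avg_ma = average_current_ma(idle_ma, led_ma, duty_cycle, radio_ma)
    if avg_ma <= 0:
        raise ValueError("average current must be positive")
    return 60.0 * capacity_mah / avg_ma


if __name__ == "__main__":
    # Placeholder values chosen only to make the arithmetic easy to follow.
    print(battery_life_minutes(capacity_mah=10.0, idle_ma=1.0, led_ma=10.0, duty_cycle=0.5))
```

      Sweeping `duty_cycle` or `radio_ma` in such a model shows how strongly runtime depends on the stimulation paradigm and on message rate, which is the dependence the reviewer asks to have quantified.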

      Presumably, the major constraint on latency is that the Bluetooth receiver polls at ~10 Hz, resulting in latency blocks of 20+, 30+, or 40+ ms. Why latency is never less than 10 ms is unclear. Could latency be reduced by changing a setting? Having a low-latency option would be very helpful for some experimental situations. Latency is probably the primary weakness of the system.

      In the revised manuscript, we will clarify more explicitly that latency is a key limitation of the current system. We will also further investigate the source of this latency, including whether it can be reduced through additional configuration changes. In addition, we will include comparative latency measurements using different Arduino modules as the central BLE controller for the BlueHub device.
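
      As a rough illustration of why a periodic radio schedule bounds latency, the sketch below assumes a trigger arrives at a random time within a fixed interval and must wait for the next scheduled exchange, on top of a fixed processing overhead. The model, the function name, and the parameter values are assumptions made for illustration; they do not describe the measured behavior of BlueHub or BlueBerry.

```python
import random


def simulate_latencies_ms(interval_ms, base_ms, n_events, seed=0):
    """Simulate trigger-to-stimulation latency under a periodic schedule.

    interval_ms -- spacing between scheduled radio exchanges (hypothetical)
    base_ms     -- fixed overhead added to every event (hypothetical)
    Each latency is base_ms plus the wait until the next scheduled exchange.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(n_events):
        phase = rng.uniform(0.0, interval_ms)  # where in the interval the trigger lands
        wait = interval_ms - phase             # time left until the next exchange
        latencies.append(base_ms + wait)
    return latencies


if __name__ == "__main__":
    samples = simulate_latencies_ms(interval_ms=10.0, base_ms=20.0, n_events=10000)
    print(min(samples), sum(samples) / len(samples), max(samples))
```

      In this model the latency never drops below the fixed overhead and averages the overhead plus half the interval, so shortening the interval or the overhead are the two levers for a low-latency option.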

      The programming process sounds quite complicated. It would be nice if they had OTA updates. But it is described and open source. Similarly, the configuration process (Arduino IDE) seems a bit complex. It would be nice if there were a dedicated cross-platform application.

      We will investigate this matter and provide a simpler installation and configuration script to set up both the BlueHub and BlueBerry systems.

      It is unclear what the maximum number of devices that could be used without wireless interference is. The base station has two charging stations, but it would have been nice to understand the limits beyond this number.

      Due to the current structure of the ArduinoBLE library used in BlueHub devices, each BlueHub unit can support active communication with a maximum of 3 BlueBerry units. We thank the reviewer for highlighting this point, and we will clarify it in the next version of the paper.

      There is a very nice website for the system, but there is some concern that the code and design files are not archived. Could they be deposited with the paper?

      In the revised submission, we will deposit all code used to program both the BlueHub and BlueBerry devices, together with the Gerber files required for PCB fabrication, alongside the paper.

      Reviewer #3 (Public review):

      Summary:

      This study presents a novel device for wireless control of optogenetic stimulation of the mouse brain, the Blueberry, using Bluetooth Low Energy (BLE) communication for parallel activation of up to 4 devices through an Arduino interface. The authors also present two types of brain implants for light delivery that can be connected to the Blueberry: one using uLEDs for surface cortical stimulation, and another using optical fibers for intra- or sub-cortical implants. The architecture of the system, including electronics, communication, and programming, is thoroughly described. Because the system was especially designed to be integrated with existing software used for closed-loop neuroscience behavioral experiments, validation of the system is shown in two different scenarios: a learning task in an "infinite" Y-maze, where light delivery at precise locations conditions arm choice for navigation; and a social interaction analysis where 3 animals are simultaneously stimulated in order to alter social dynamics among the group.

      Strengths:

      (1) The full system can be built by individual labs with simple PCB printing, off-the-shelf components, and readily available hardware (Arduino) for widespread dissemination.

      (2) Four headstages can be controlled in parallel for simultaneous experiments with multiple mice.

      (3) Validation across different relevant behavioral tests, demonstrating the potential of integrating Blueberry in closed-loop setups.

      We thank the reviewer for the statement.

      Weaknesses:

      (1) Some details in the manuscript regarding system characterization (latency, battery life, etc) are included only in the supplementary materials.

      We agree. In the revised manuscript, we will move the necessary quantifications from the supplementary materials to the main text.

      (2) The practical details of integration with other commercial and open-source software used for the closed-loop experiments, which could help third-party researchers interested in using the system, are lacking sufficient detail.

      We will describe this integration in more detail in the revised manuscript.

      (3) System range (3 meters reported) is limited for a BLE device.

      The reported system range is the range over which we consider communication to be reliable. In the revised manuscript, we will quantify this by reporting the Received Signal Strength (RSS) value for multiple BlueBerry devices across varying distances.
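
      For context on how RSS is expected to fall off with distance, a minimal sketch using the standard log-distance path-loss model is given below. The reference RSS, reference distance, and path-loss exponent are hypothetical placeholders that would have to be fitted to actual measurements; nothing here is a measured BlueBerry value.

```python
import math


def expected_rss_dbm(distance_m, rss_ref_dbm, ref_distance_m=1.0, path_loss_exponent=2.0):
    """Log-distance path-loss model.

    RSS(d) = RSS(d0) - 10 * n * log10(d / d0)

    rss_ref_dbm        -- RSS at the reference distance d0 (hypothetical)
    path_loss_exponent -- n; 2.0 corresponds to free space, and real rooms differ
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return rss_ref_dbm - 10.0 * path_loss_exponent * math.log10(distance_m / ref_distance_m)


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(d, round(expected_rss_dbm(d, rss_ref_dbm=-50.0), 1))
```

      Comparing measured RSS at each distance against such a curve gives one way to judge where the link margin runs out and the connection stops being reliable.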

      (4) Light output amplitude is not programmable, limiting the choice of stimulation protocols and LEDs used.

      That is indeed a limitation of our system. We will investigate the feasibility of integrating programmable stimulation protocols in the updated version of the BlueBerry device.

      (5) Thermal modeling of the cortical surface stimulator was not performed, and it is unclear if the brain implant for this purpose is within the safety limits.

      We thank the reviewer for this comment. In the revised manuscript, we will clarify that the thermal measurements reported here apply only to the specific superficial implant geometry and stimulation conditions used in this study. Because tissue heating depends strongly on implant design and on parameters such as optical power, pulse width, and stimulation frequency, a general safety statement cannot be made for all possible implant configurations. Since the primary goal of this work is to present the wireless device platform rather than to validate a particular implant design, thermal safety should be evaluated individually for each implant and stimulation paradigm.

      (6) The paper is missing a comparison with other state-of-the-art devices for wireless control of optogenetic stimulation in mice.

      In the revised manuscript, we will include a comparison table summarizing our system alongside currently available wireless optogenetic devices.

    1. eLife Assessment

      The manuscript by Mancl et al. provides important mechanistic insights into the conformational dynamics of Insulin Degrading Enzyme (IDE), a zinc metalloprotease involved in the clearance of amyloid peptides. Supported by a compelling combination of time-resolved cryo-EM, SEC-SAXS, enzymatic assays, and both all-atom and coarse-grained simulations, the study reveals an insulin-induced allosteric transition and transient β-sheet interactions underlying IDE's unfoldase activity, thereby refining our understanding of IDE's functional cycle and offering a structural framework for developing substrate-selective modulators of M16 metalloproteases. The latest round of revisions further improves clarity and presentation by updating structural statistics, correcting minor textual inconsistencies, and refining supplemental materials, fully addressing the remaining reviewer comments.

    2. Reviewer #1 (Public review):

      Summary:

      Mancl et al. present an integrative structural and mechanistic analysis of the human insulin-degrading enzyme (IDE), combining cryo‑EM, time‑resolved cryo‑EM, SEC‑SAXS, enzymatic assays, all-atom molecular dynamics (MD) simulations, and coarse‑grained MD simulations. Their study delineates how IDE undergoes coordinated open-close transitions and interdomain rotations, how these motions relate to its unfoldase and protease activities, and how a single residue, R668, acts as a molecular latch governing these conformational changes. Through expanded structural datasets and computational analyses, the authors propose a mechanistic model for how IDE captures, unfolds, and degrades diverse amyloidogenic substrates such as insulin and Aβ.

      Strengths:

      A major strength of this study is its integration of structural, biophysical, biochemical, and computational approaches. The authors now provide six cryo‑EM structures, including a new time‑resolved O/O state captured 123 ms after substrate mixing, which clarifies the early structural response of IDE to insulin binding. The combination of multibody analysis, 3D variability analysis, all‑atom MD, and coarse‑grained Upside simulations yields a coherent picture in which rotational interdomain motions and charge‑swapping events at the IDE‑N/C interface underpin substrate unfolding and repositioning.

      The identification of R668 as a central determinant of the open-close transition, supported by MD, HDX‑MS data from prior work, SEC‑SAXS, and functional assays on the R668A mutant, represents a significant mechanistic advance. The inclusion of Aβ degradation assays adds biological breadth and supports the conclusion that R668 modulates activity in a substrate‑dependent manner.

      The authors have also substantially improved clarity by reorganizing figures, refining section headers, and adding introductory structural schematics. Taken together, the revised manuscript now provides a rigorous and accessible framework for understanding IDE dynamics and their relevance to amyloid peptide turnover.

      Weaknesses:

      At this stage, remaining limitations are modest and inherent to the system rather than the approach. While the study convincingly demonstrates substrate‑dependent modulation of IDE dynamics, it does not experimentally assess additional endogenous substrates (e.g., amylin, glucagon), which would be needed to fully generalize the role of R668 across the substrate spectrum of IDE. Furthermore, the timescale mismatch between MD simulations and catalytic turnover, which the authors clearly acknowledge, means that correlations between simulated motions and enzymatic kinetics remain inferential. Finally, some flexible cryo‑EM states (particularly O/pO) continue to exhibit moderate local resolution, which constrains atomic interpretation of highly dynamic regions, although this is addressed transparently.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open and closed state structures of IDE have been previously solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of the N-domains relative to the C-domains. The authors here have used various complementary experimental techniques such as cryo-EM, SAXS, size-exclusion chromatography and enzymatic assays to characterize the structure and dynamics of the IDE protein in the presence of the substrate protein insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion amongst the different domains and, consequently, the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated using cryo-EM in the originally submitted manuscript. A sixth cryo-EM reconstruction at 5.1 Å resolution, obtained using time-resolved cryo-EM experiments, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of the IDE protein using experimental approaches which are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the ground for future exploration of the protease structure-function relationship.

      Strengths:

      The manuscript presents a powerful integrative structural biology study that combines high-resolution cryo-EM, particle heterogeneity analysis, time-resolved cryo-EM, multiscale molecular dynamics simulations, SAXS, and biochemical assays to dissect the conformational dynamics of human insulin-degrading enzyme. A major strength is the identification of a previously unappreciated rotational component of IDE-N relative to IDE-C and the discovery of R668 as a molecular latch governing the open-close transition, supported consistently by structural, computational, mutational, and functional data. The work provides a coherent mechanistic framework linking IDE dynamics to substrate unfolding, allostery, and substrate-dependent catalysis, with clear relevance to diabetes and Alzheimer's disease biology.

      Weaknesses:

      Despite its depth, several key mechanistic conclusions, particularly substrate unfolding and the proposed "β-grabbing" mechanism, rely heavily on coarse-grained and all-atom MD simulations rather than direct experimental observation. Cryo-EM density for insulin is limited and heterogeneous, restricting definitive structural interpretation of substrate binding modes. The time-resolved cryo-EM experiment captures only a single dominant state at modest resolution, limiting insight into transient intermediates. In addition, the study focuses primarily on insulin, leaving the generality of the proposed mechanism for other IDE substrates insufficiently tested, and the therapeutic implications remain largely speculative without direct pharmacological modulation data.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize conformational dynamics of human insulin-degrading enzyme (IDE). In the revised manuscript, the study now also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-close transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and the structural dynamics of insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open and closed state structures of IDE have been previously solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of the N-domains relative to the C-domains. The authors here have used various complementary experimental techniques, such as cryo-EM, SAXS, size-exclusion chromatography and enzymatic assays, to characterize the structure and dynamics of IDE in the presence of the substrate insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion among the different domains and, consequently, the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated by cryo-EM in the originally submitted manuscript. A sixth cryo-EM reconstruction, at 5.1 Å resolution and obtained from time-resolved cryo-EM experiments, was added after the first revision. The authors have extensively used molecular dynamics simulations to identify important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches that are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the groundwork for future exploration of the protease structure-function relationship.

      Comments after first peer-review:

      The authors have addressed all my concerns, and have added new data and explanations in terms of time-resolved cryo-EM (Fig. 7) and upside simulations (Fig. 8) which in my opinion have strengthened the merit of the manuscript.

      We are grateful for the dedication and constructive feedback provided by the editors and reviewers. We have revised our manuscript according to the suggestions by both reviewers.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The new version of the manuscript reads exceedingly well and the corrections the authors have made during their revision made the manuscript much easier to read and digest than the first version. Below are minor details that may be corrected:

      Abstract:

      Line 45-47: "IDE is known to transition between a closed state, poised for catalysis, and an open state, able to release cleavage products and bind a new substrate." (consider adding a)

      Fixed

      Line 48-50: "Combining cryo-EM heterogeneity analysis with all-atom molecular dynamics (MD) simulations, we identified the structural basis and key residues for IDE conformational dynamics that were not previously revealed by IDE static structures." (consider adding previously)

      Changed

      Line 52-54: "Our small-angle X-ray scattering analysis and enzymatic assays of an R668A mutant indicate a profound alteration of conformational dynamics and catalytic activity." (consider adding analysis)

      Changed

      Line 54: Consider leaving out "Upside" in the abstract (to avoid confusion when reading the abstract) and leave it to be introduced in the introduction when Upside MD simulations are first mentioned.

      Changed

      Results:

      Figure 5D: There seems to be an error in the legend for Figure 5D. It says "... presence of varying amounts of insulin", but this must be Aβ1-40. Please add info on whether the replicates are technical or biological.

      The legend has been revised as suggested.

      Line 125: Consider switching the order of "here" and "we"

      “here” has been removed.

      Line 128: Replace "5" with "five"

      Changed

      Line 137: Replace "when insulin is present" with "in the presence of insulin"

      Changed

      Line 228: Replace "5" and "6" with "five " and "six"

      Changed

      Line 229: Consider adding the word "form": "First, the open subunits did not close to form a singular structure."

      We have adjusted the sentence to read “close to a singular consensus structure”

      Line 327: Replace "2" with "two"

      Changed

      Line 276: Consider replacing "Conversely" with a more suitable connecting term, as it implies that the observations presented in the two sentences are opposites, or rephrase what is being compared. Is it whether or not there is a dose dependency between the substrates, or is it the actual kinetic parameters that are described? I don't think "conversely" is fair with the current formulation, as the text states that "the R668A mutant did not exhibit a dose-dependent response to the presence of Aβ", not that the Ki is reduced for WT compared to the R668A construct when looking at Aβ.

      The connecting term has been removed completely, beginning the sentence with “When Abeta…”

      Line 359: Replace "6" with "six"

      Changed

      Consider getting rid of possessive apostrophes to keep a formal tone, e.g. lines 211 (cryoSPARC's), 259 (IDE's) and 382 (IDE's). Exception to this is Alzheimer's disease.

      All instances of possessive apostrophes, aside from Alzheimer’s, have been replaced with more formal wording.

      Figure 7 supplement 1: The color scheme for the local resolution is missing the unit (Å).

      This has been corrected.

      Finally, the supplementary videos illustrating IDE conformational dynamics are difficult to interpret and somewhat redundant in their current form. The transitions occur very rapidly, making it hard to appreciate the described motions, and the uniform coloring of IDE further limits visual clarity. I apologize for not including this point in my initial review. I recommend either removing the videos or re-rendering them to improve interpretability, for example by slowing down the motion and applying the same domain color scheme introduced in the new Figure 1 (and used in the MD trajectory video). This would greatly aid readers in connecting the descriptions in the text to the visual representations in the movies.

      Figure 3 videos 1-4 were slowed down, simplified, and recolored to improve clarity.

      Reviewer #2 (Recommendations for the authors):

      Comments after first revision for authors:

      Thanks a ton to the authors for the detailed explanation on my comments. I believe the discussions will help a large group of audience, especially the non-experts. Please address the minor comment below:

      Minor comment:

      Please update Supplementary file 1 (Cryo-EM data collection, refinement, and validation statistics) regarding the new volume obtained by time-resolved cryo-EM. Kindly also check line 47 in the abstract: "Here, we present five cryo-EM structures" , which may need an update (six structures and resolution 3.0-5.1 Å) or rephrase the sentences accordingly. If similar instances are found in the manuscript, where list of all the structures are mentioned together, please update accordingly if necessary.

      The cryo-EM statistics for the time-resolved cryo-EM are shown in supplementary file 2 to differentiate the two datasets. The abstract has been changed, as has line 149.

    1. eLife Assessment

      This study provides valuable insights into addressing the question of whether the prevalence of autoimmune disease could be driven by sex differences in the T cell receptor (TCR) repertoire, correlating with higher rates of autoimmune disease in females. The authors compared male and female TCR repertoires using bulk RNA sequencing, from sorted thymocyte subpopulations in pediatric and adult human thymuses; however, the analyses provided do not provide sufficient discrimination, as paired TCR chains are not examined, and incompletely support the central claims regarding sex differences in the TCR repertoire and potential autoimmune bias.

    2. Reviewer #2 (Public review):

      Summary

      This study addresses the hypothesis that the higher prevalence of autoimmune diseases in women could result from sex-dependent differences in thymic generation or selection of TCR repertoires. The biological question is important and the dataset is valuable. However, the study has major conceptual and analytical limitations.

      In particular:

      - The conclusions cannot be generalized to autoimmune diseases as a whole, as only type 1 diabetes (T1D) and celiac disease (CeD) antigens were analyzed.
      - The central interpretation is not supported by the data, as the observed signal is strongly influenced by TCRs associated with T1D, which shows a male-biased incidence and therefore does not align with the female bias the study aims to explain.

      Strengths

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here. However, the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses.

      Weaknesses

      The authors did not adequately address the central concerns raised in the previous review. As a result, the major issues remain unresolved.

      (1) Generalization to autoimmune diseases is not justified.

      The study aims to explain the higher prevalence of autoimmune diseases in females. The main conclusion is based on enrichment in females of TCRs annotated as autoimmune-associated using database matching. However, these matches correspond exclusively to TCRs specific for T1D and CeD. This already limits the conclusions to these two diseases and does not justify generalization to autoimmune diseases as a whole.

      (2) Contradiction with epidemiology of T1D which is male-biased

      T1D and CeD have opposite sex biases in European populations. While CeD is more frequent in females (~60%; doi:10.1016/j.cgh.2018.11.013), T1D is more frequent in males (male:female = 1.11 in France; doi:10.1111/dom.70124). Importantly, T1D constitutes a substantial fraction of the autoimmune-associated dataset (42 out of 48 epitopes; 83 out of 185 TRB sequences). Therefore, the observed signal is strongly influenced by a disease that does not follow the female bias the study aims to explain.

      The authors argue that T1D sex bias varies globally, including female-biased incidence in East Asia and Africa. However, this argument does not resolve the issue, as the cohort analyzed in this study was derived from France, where T1D shows a male-biased incidence. Thus, the interpretation remains inconsistent with the population context of the dataset.

      (3) Lack of disease-level and donor-level resolution

      The authors combine T1D and CeD into a single "autoimmune" category and do not provide per-disease, per-donor or per-epitope distributions, despite explicit reviewer's requests.

      This prevents evaluation of whether the observed signal is driven by:
      - a specific disease (T1D or CeD), or
      - a small number of donors

      Without this analysis, the conclusions cannot be properly interpreted.

      (4) Use of "polyspecificity" concept is not supported by experimental evidence

      The authors extensively use the concept of "polyspecific TCRs," defined as single-chain CDR3 sequences annotated across databases as recognizing distinct and unrelated antigenic categories. This concept is not supported by experimental evidence (except for a single TCR in Quiniou et al., as acknowledged by the authors).

      In the absence of robust validation, a more parsimonious explanation for such ambiguously annotated TCR chains is the presence of false-positive annotations in public databases (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract) or alternatively, distinct TRA pairing for identical TRB sequences resulting in different specificities.

      The observation that these TCRs have high generation probability is expected, as TCRs found in independent studies are likely to have high generation probability. The interpretation of these sequences as biologically meaningful entities (e.g., a "first line of defense") is therefore speculative and not supported by the data.

      The authors also refer to in silico-generated polyspecific TCRs (ref. to Nature Machine Intelligence). However, such sequences are generated ex vivo and do not undergo thymic selection. A TCR capable of recognizing multiple unrelated foreign antigens would likely also recognize self-antigens and be eliminated during negative selection. Therefore, this argument does not support the biological relevance and in vivo existence of the proposed polyspecific TCR class.

      (5) Insufficient statistical analysis of diversity

      The absence of statistically significant differences in repertoire diversity between sexes (Figure 3), despite an apparent visual trend, may reflect limited sample size and insufficient statistical power rather than a true absence of differences. A more appropriate statistical approach, such as mixed-effects modeling, was requested in the previous review but was not performed.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between male and female humans. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis on sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset where total thymocytes were sorted.

      They report on participant ages and sexes, but not on ethnicity or race, nor do they provide information about HLA typing of individuals. The experiments are heroic, yet do represent a relatively small sampling of diverse humans. They observed no differences in TCRbeta or TCRalpha usage, combinational diversity, or differences in the length of the CDR3 region, or amino acid usage in the CDR3aa region between males or females. Though they observed some TCRbeta CDR3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

      They also compared TCRbeta sequences against those identified in the past databases using computational approaches to recognize cancer-, bacterial-, viral-, or autoimmune-antigens. They found little overlap of their sequences with these annotated sequences (depending on the individual, ranged from 0.82-3.58% of sequences). Within the sequences that were in overlap, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this is a generalizable finding in the human population.

      Strengths:

      It is a novel dataset that attempts to understand sex differences in the T cell repertoire in humans. Overall, the methodologies are sound and are the current state-of-the-art. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females. This is an important negative result.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. This reviewer recognizes the difficulty in obtaining samples for this experiment (which were from deceased donors), and this limitation was appropriately discussed. Their analysis was limited by the current availability of other TCR sequences. These weaknesses were appropriately discussed and considered.

      We thank this reviewer for his appreciation of our work.

      Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues. In particular, the majority of "autoimmunity-related TCRs" considered in this study are in fact specific to type 1 diabetes (T1D). Notably, T1D incidence is higher in males, which directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. Given this conceptual inconsistency, the evidence presented does not support the authors' conclusions.

      We disagree with the reviewer’s assertion that our findings create a conceptual inconsistency.

      Autoimmune diseases are multifactorial conditions in which multiple biological layers, including thymic selection, peripheral immune regulation, hormonal effects, environmental exposures, and tissue-specific vulnerability, contribute to disease incidence. These layers may influence sex ratios in different directions. Therefore, observing a higher frequency of TCRs annotated as T1D-associated in females does not imply that T1D incidence must also be higher in females.

      Actually, T1D incidence itself is not uniformly male-biased worldwide. Epidemiological analyses (reviewed in Qu and Hakonarson, Diabetes Obes Metab 2025) show that male predominance is mainly observed in high-incidence Northern European populations, whereas in several lower-incidence regions, including parts of East Asia and Africa, the sex ratio is balanced or even female-biased. Furthermore, another recent study highlights that T1D incidence and prevalence in women and men vary depending on the study period (PMC12544016).

      This heterogeneity indicates that disease incidence reflects context-dependent interactions between genetic load, environmental exposures, and sex-specific biological modifiers. Moreover, biological sex acts as a dynamic modifier of genetic risk and immune function in T1D, influencing central tolerance, peripheral immune activation, and β-cell intrinsic resilience (reviewed in Qu and Hakonarson, 2025). Experimental models further demonstrate estrogen-mediated protection of pancreatic β-cells (Kim et al., Biochem Biophys Res Commun 2025), indicating that disease incidence reflects the integration of immune, hormonal, and tissue-specific layers rather than central autoreactive TCR release alone. Sex hormones may exert distinct and sometimes opposing effects on thymic selection and on target-organ vulnerability, while environmental factors such as vitamin D status, infections, and microbiota composition further shape disease expression.

      Importantly, our study does not claim causality, nor does it aim to predict the epidemiology of any specific autoimmune disease. Our conclusions are limited to the observation that sex-dependent differences exist in thymic TCR selection.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      We agree with the reviewer’s comment. As already stated in the previous revision and the "Data Availability" section of the manuscript, all raw sequencing data have been deposited and are publicly available on NCBI (BioProject PRJNA1379632): https://www.ncbi.nlm.nih.gov/sra/PRJNA1379632.

      Weaknesses:

      I thank the authors for their detailed responses to my previous comments. Several concerns were addressed satisfactorily; however, important issues remain unresolved, and a new major concern has emerged from the revised manuscript.

      Major concerns:

      (1) Autoimmune specificity is dominated by T1D, contradicting the study's premise. Newly added supplementary Table 3 shows that the authors considered only 14 autoimmune-related epitopes, of which 12 are associated with type 1 diabetes (T1D) and 2 with celiac disease (CeD). (I guess this is because identification of particular peptide autoantigens is an extremely difficult task and was only successful in T1D and CeD.) Thus conclusions of this work mostly relate to T1D. However, the incidence of T1D is higher in males than in females (e.g. doi:10.1111/j.1365-2796.2007.01896.x; doi:10.25646/11439.2). This directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. As a result, the authors' conclusions (a) cannot be generalized to autoimmune disease as a whole as the authors only considered T1D and CeD antigens and (b) are internally inconsistent with the stated objective of the study.

      (2) By contrast, CeD does show a female bias (~60/40 female/male; doi: 10.1016/j.cgh.2018.11.013). However, the manuscript does not allow evaluation of how much the reported "autoimmune TCR enrichment" derives from T1D versus CeD. Despite my previous request, the authors did not provide per-donor and per-epitope distributions of autoimmune-specific TCR matches. I therefore explicitly request a table in which: each row corresponds to a specific autoimmune antigen; each column corresponds to a donor (with metadata available including sex); each cell reports the number of unique TCRs specific to that antigen in that donor. Without such data, the conclusions cannot be evaluated.

      (3) It is scientifically inappropriate to generalize findings to "autoimmune diseases" when only T1D and CeD were analyzed. Moreover, given that T1D and CeD show opposite directions of sex bias, combining them into a single "AID" category is misleading. All analyses presented in Figure 8 and Supplementary Figure 16 should be repeated and shown separately for T1D and CeD, rather than combined.
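
      The per-antigen, per-donor table requested in point (2) can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the input format (a list of donor, antigen, CDR3 tuples), the function name `antigen_by_donor_counts`, and the example donor IDs, antigen labels, and sequences are all hypothetical placeholders.

```python
from collections import defaultdict


def antigen_by_donor_counts(matches):
    """Count unique CDR3 sequences per (antigen, donor) pair.

    `matches` is an iterable of (donor_id, antigen, cdr3) tuples; this
    input format is an assumption made for illustration only.
    Returns {antigen: {donor_id: number_of_unique_cdr3}}.
    """
    unique = defaultdict(set)
    for donor, antigen, cdr3 in matches:
        # A set removes repeated observations of the same CDR3.
        unique[(antigen, donor)].add(cdr3)

    antigens = sorted({antigen for antigen, _ in unique})
    donors = sorted({donor for _, donor in unique})

    # One row per antigen, one column per donor, zero where nothing matched.
    return {
        antigen: {donor: len(unique.get((antigen, donor), ())) for donor in donors}
        for antigen in antigens
    }


# Placeholder data: donor IDs, antigen labels, and sequences are invented.
example_matches = [
    ("F1", "antigen_A", "CASSLGQ"),
    ("F1", "antigen_A", "CASSLGQ"),  # duplicate, counted once
    ("F1", "antigen_A", "CASSPGT"),
    ("M1", "antigen_B", "CASSQET"),
]
example_table = antigen_by_donor_counts(example_matches)
```

      Grouping the rows of such a table by disease (T1D or CeD) and the columns by donor sex would also give the per-disease split requested in point (3).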

      We acknowledge that currently available antigen-annotated TCR databases remain limited. This reflects the considerable experimental difficulty of defining TCRs’ antigen specificities and is a widely recognized limitation in the field.

      In the curated database used here, the autoimmune-associated entries correspond primarily to type 1 diabetes (T1D) and celiac disease (CeD), two autoimmune contexts for which antigen-specific TCRs have been experimentally characterized. However, focusing on the number of antigens alone does not accurately reflect the breadth of the dataset.

      Specifically, our analysis is based on 48 epitopes and nearly 200 annotated TRB sequences, providing substantially broader antigenic representation than suggested by antigen count alone.

      Author response table 1.

      Importantly, our analytical framework does not attempt to interpret each epitope specificity individually. Instead, we examine whether TCRs annotated as autoimmune-associated are differentially represented between sexes at the level of thymic selection.

      In our dataset we observe a stronger CD8⁺ thymic selection of TCRs annotated as autoimmune- associated in females. We interpret this as evidence that central tolerance mechanisms may contribute to sex-dependent differences in autoreactive repertoire composition, rather than as a determinant of any specific autoimmune disease pathophysiology.

      (4) The McPAS database contains TCRs associated with other autoimmune diseases (e.g., multiple sclerosis, rheumatoid arthritis), although the exact autoantigens in these contexts are unknown. Why didn't the authors perform the search for such TCRs? I believe disease association even without particular known antigen could still be insightful.

      For multiple sclerosis, the only antigen present in the database is myelin basic protein (MBP). In our thymic repertoire dataset, we could not detect any CDR3 sequence matching MBP-annotated CDR3s from the database.

      For rheumatoid arthritis, the database contains only a small number of TRA sequences without corresponding TRB chains. Because our specificity analysis is based on TRBs, these entries could not be used in our analyses.

      (5) Misuse of the concept of polyspecificity. I appreciate the authors' reference to Don Mason's work; however, the concept of polyspecificity discussed there is fundamentally different from the authors' usage. Mason, Sewell (doi:10.1074/jbc.M111.289488), Garcia (doi:10.1016/j.cell.2014.03.047), and others demonstrated that individual TCRs can recognize multiple peptides, possibly around 1 million. But importantly, these peptides are not random but share some sequence motif. This is a general feature of TCRs, i.e. 100% of TCRs are polyspecific in this sense.

      In contrast, the authors define polyspecificity as TRB sequences annotated as specific to unrelated epitopes in TCR databases such as VDJdb. These databases are well known to contain substantial numbers of false-positive annotations (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract). The authors acknowledge that, under their definition, polyspecificity has been experimentally validated for only one (!) TCR (Quiniou et al.). In the absence of robust experimental validation, use of the term "polyspecificity" in this context is misleading. I strongly recommend removing all analyses and conclusions related to polyspecificity from the manuscript unless supported by independent functional validation.

      We agree with the reviewer that the concept of TCR polyspecificity is complex, controversial and not uniformly defined in the literature.

      For some, polyspecificity refers to the ability of individual TCRs to recognize multiple related peptides sharing structural motifs, as described by Mason, Sewell, Garcia, and others. With this definition, we agree that many/most TCRs exhibit some degree of cross-reactivity and would thus be defined as polyspecific.

      In contrast, our definition of polyspecificity came from our observation arising from large-scale repertoire analyses that certain CDR3 sequences are repeatedly annotated across databases as recognizing distinct and unrelated antigenic categories. In our previous study (Quiniou et al.), we showed that these sequences display specific biochemical and repertoire features and may represent a particular class of TCRs involved in early or heterologous immune responses. A classic cross reactivity based on structural motif sharing could not explain these results.

      We believe that the existence of such TCRs, rather than classic cross-reactive TCRs, has the potential to better explain why patients with extremely reduced TCR repertoires (around 3000 TCRs only) can respond well to various infectious challenges (https://doi.org/10.1073/pnas.97.1.274) or why there are T cells with memory phenotypes against viruses not previously encountered (https://pmc.ncbi.nlm.nih.gov/articles/PMC3626102/). We acknowledge that direct experimental validation of the function of such TCRs is currently limited; further work will help clarify the notion of polyspecificity and, hopefully, improve understanding of the overlooked “heterologous immunity”.

      Of note, a recent paper in Nature Machine Intelligence (https://doi.org/10.1038/s42256-025-01096-6) described the in-silico generation of antigen-specific TCRs. Using our definition of polyspecificity (TCRs with higher generation probabilities, specific V/J gene preferences, shared CDR3s across individuals, and reactivity to multiple unrelated peptides), they showed that “multitask models preferentially sample polyspecific CDR3β sequences”. Therefore, we consider the debate on polyspecificity to be ongoing, and our discussion of polyspecificity in this paper to be part of this debate.

      (6) I agree that comparing specificity enrichment between sexes is meaningful. However, enrichment relative to the database composition itself is not biologically interpretable, as acknowledged by the authors in their response. I therefore recommend removing Supplementary Figure 15, which is potentially misleading.

      In the original manuscript, the comparison to the pooled database was intended as a descriptive assessment rather than as a biological enrichment analysis. Differences between an experimental thymic repertoire and a curated reference database are expected, given the structure and annotation biases inherent to the reference resource.

      The purpose of Supplementary Figures 15B and 15C was therefore twofold: (i) to provide a descriptive overview of how specificity categories are distributed in our thymic dataset relative to the curated database, and (ii) to evaluate whether deviations from database proportions were of similar magnitude in males and females, ensuring that database composition did not differentially bias one sex over the other. In addition, the donor-resolved representations demonstrate that these patterns are consistent across individuals and are not driven by a single donor.

      To avoid any potential misinterpretation, we have revised the manuscript to remove references to “enrichment” relative to database composition and eliminated quantitative comparisons to baseline database frequencies. The corresponding text and figure legends have been clarified to indicate that these analyses are descriptive and methodological in nature, while all biological interpretations rely exclusively on direct sex-specific comparisons within the thymic dataset.

      (7) In contrast, Supplementary Figure 16 represents the most convincing result of the study (keeping in mind that the AID group should be split into T1D and CeD, and that T1D and CeD have opposing directions of sex bias) and should be shown as a main figure, replacing Figure 8A-B, which is less convincing as it does not show the per-donor distribution.

      (8) The authors argue that applying mixed-effects modeling to Rényi entropy would require assuming a common sex effect across subsets. I do not find this assumption unreasonable. For example, if sex effects are mediated through AIRE-dependent negative selection, one would indeed expect a consistent direction of effect across subsets. The lack of statistical significance in Figure 3 may reflect limited sample size rather than true absence of the difference. Moreover, the title's phrasing "comparable TCR repertoire diversity" is vague: what is the statistical definition of "comparable"?

      The use of “comparable” in describing TCR repertoire diversity is indeed “soft”; it is meant to indicate that there are no obvious dissimilarities.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) Available HLA typing data for selected donors should be included as a table in the manuscript.

      The available low-resolution HLA typing data for the donors included in this study have been compiled and added as Supplementary Table 1 in the revised manuscript.

      (2) The authors' explanation for why external validation of gene usage biases was not possible should be concisely incorporated into the Discussion.

      We have incorporated a concise explanation in the Discussion clarifying why independent validation of the TRBV6-5 bias in external thymic datasets is currently not feasible, due to the absence of publicly available cohorts combining sorted thymic subsets, balanced sex representation, and sufficient sequencing depth.

      (3) The clarification that considered sex-specific motifs are public should be included explicitly in the main text, not only figure legend and methods.

      We now explicitly state in the main Results section that only public motifs, defined as motifs containing CDR3 sequences shared by at least two individuals, were retained in the analysis.

      (4) The statement "Thymocytes expressing TCRs with insufficient or excessive avidity are eliminated (negative selection)" is strictly speaking incorrect. Thymocytes with insufficient avidity are eliminated by death by neglect during positive selection.

      We thank the reviewer for pointing out this imprecision. The statement has been corrected.

      (5) Figure 8C is unclear - what does "80% of unique polyspecific TCRs" mean? In any case, I strongly recommend removal of all polyspecificity-related analyses.

      We apologize for the lack of clarity in the axis label of Figure 8C. To clarify, this analysis represents the proportion of polyspecific CDR3aa sequences among all sequences with an assigned specificity within an individual’s repertoire. Specifically, it measures how many unique TCR sequences, previously identified as having a known specificity in reference databases, are also categorized as polyspecific.

      To address the reviewer’s concern, we have updated the Y-axis label of Figure 8C to: "Proportion of polyspecific CDR3aa among antigen-specific sequences (%)".
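      The quantity described above can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function name, the input layout, and the rule that a sequence counts as polyspecific when it carries more than one assigned specificity are all assumptions made for the example, and the sequences are placeholders.

```python
def polyspecific_proportion(specificities_by_cdr3):
    """Percentage of polyspecific CDR3aa among antigen-specific sequences.

    specificities_by_cdr3 maps each unique CDR3aa sequence of one donor to
    the set of specificities assigned to it (empty set = no assignment).
    ASSUMPTION: "polyspecific" means more than one assigned specificity.
    """
    # Denominator: unique sequences with at least one assigned specificity.
    annotated = [s for s in specificities_by_cdr3.values() if len(s) > 0]
    if not annotated:
        return 0.0
    # Numerator: annotated sequences with more than one specificity.
    polyspecific = sum(1 for s in annotated if len(s) > 1)
    return 100.0 * polyspecific / len(annotated)


# Hypothetical donor repertoire (placeholder sequences and labels).
example_repertoire = {
    "CDR3_A": {"viral"},
    "CDR3_B": {"viral", "autoimmune"},  # polyspecific under the assumption
    "CDR3_C": {"autoimmune"},
    "CDR3_D": {"cancer"},
    "CDR3_E": set(),                    # no assigned specificity, excluded
}
example_percentage = polyspecific_proportion(example_repertoire)
```

      Sequences without any assigned specificity are excluded from the denominator, which matches the wording of the revised Y-axis label.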

      (6) "However, no significant sex-based differences were found in the usage of hydrophobic, hydrophilic, or neutral aa at the critical p109 and p110 positions in TRB" - this Discussion statement is inconsistent with the new analysis on Fig. 4C.

      We regret that the Discussion still contained wording from a previous version of the analysis. The text has now been corrected to reflect the updated results showing a significant increase in hydrophobic amino acid usage at positions p109/p110.

      (7) In the Discussion the authors write: "the absence of age-related clustering in repertoire features (data not shown)". What is the reasoning for not showing the data?

      We understand the reviewer's point. This exploratory clustering analysis was performed on the data presented in the heatmaps (Figure 2B and Supplemental Figures 10-13). However, as it revealed no distinct patterns or clustering based on the donors' age (with samples from different age groups being interspersed throughout the clusters), we chose not to add an extra layer of annotation to Figure 2B to maintain clarity.

    1. eLife Assessment

      Combining state-of-the-art in-situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing, this fundamental work substantially advances our understanding of glial contributions to organismal lifespan. The evidence supporting the conclusions is compelling. The work will be of broad interest to researchers studying aging biology, glia-neuron communication and in vivo proteomic profiling.

    2. Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C,D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that the larger increase in lifespan is due to the already reduced viability of these flies?

      c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the central adult brain, which does not include the nerve cord where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors including learning, circadian rhythms, etc.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and indirectly influencing the transcriptomes? Instead of DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

      Comments on revisions:

      The authors have conducted additional experiments, updated text/figures, and included discussions to address the concerns raised by the reviewers. I commend the authors on a thorough, rigorous study that will undoubtedly impact the field and spawn many projects for years to come.

      One minor comment: In Figure S2, the figure legend states "A-C"; however, the figure itself only has an A and B.

    3. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      (a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      We agree with the reviewer that both can contribute to the different lifespan effect. In the original paper presenting flySAM1.0 and flySAM2.0 (Jia et al., 2018), the authors first tested how flySAM1.0 overexpression (OE) phenotypes compare to several VPR (CRISPRa) and UAS:cDNA OE lines. They found that flySAM1.0 reliably outperforms VPR (i.e., produces stronger OE phenotypes) in most cases, and produces OE phenotypes that are comparable (i.e., generally equivalent) to UAS:cDNA (Jia et al., 2018). After determining how flySAM1.0 performance compares to VPR and UAS:cDNA, the authors next tested if flySAM2.0 also outperforms VPR; they found that like flySAM1.0, flySAM2.0 outperforms VPR in most cases (Jia et al., 2018). In general, the data suggest that we should expect comparable overexpression phenotypes for our flySAM2.0 and UAS:cDNA lines.

      We chose to proceed with the DIP-β flySAM line for the climbing assays and snRNA-seq, as it gave a stronger lifespan effect and we thought it was likely to be the more robust OE line. While our glial cell-surface proteomics initially identified DIP-β isoform C as the candidate, it is possible that other DIP-β isoforms were also present (such as isoform F, which is identical in polypeptide sequence to isoform C) (FlyBase). Ultimately, we believe that the larger increases in lifespan observed for DIP-β flySAM are likely because flySAM targets all isoforms, whereas UAS:cDNA lines target only one isoform. Importantly, our UAS-DIP-β line was specific to DIP-β isoform C, which is the same isoform that was identified by our proteomics.

      We have made clarifications in the manuscript to address these comments.

      (b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that a larger increase in lifespan is due to the already reduced viability of these flies?

      This is a good point. The flySAM lines do exhibit a shorter baseline lifespan compared to the traditional UAS lines. This is likely due to the specific genetic background of the flySAM transgenic insertions, or a low level of "leaky" expression, as previously noted in the literature (Jia et al., 2018).

      However, we believe that the lifespan extensions we observed for DIP-β flySAM are a robust biological effect, rather than an artifact of reduced viability, for the following reasons. First, by utilizing the GeneSwitch (GS) system, we can compare the lifespan of flies with the exact same genetic background (+/- RU-486). This ensures that the extension we report is specifically due to the induction of the transgene, rather than a comparison between disparate lines with different basal fitness levels. Second, if the lifespan extensions merely represented a recovery from lower baseline viability, we would expect to see similar improvements across other flySAM lines in our screen. However, DIP-β was the only candidate across our screen that significantly increased lifespan in both sexes (Extended Data Figs. 7 & 8). Third, the lifespan-extending effect of DIP-β was independently confirmed using a traditional UAS-cDNA line, which importantly does not share the same baseline viability issues as the flySAM lines.

      (c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      We have updated the figure legends for Figure 2 to include the missing statistical details and sample sizes.

      Specifically, for Fig. 2A: The reviewer is correct that with only two replicates of each time point (5d vs. 50d) in the initial proteomic screen, traditional p-value calculations lack the necessary power for meaningful interpretation. We have revised the legend to clarify that this panel represents a discovery-based screen. Candidates were selected based on biological relevance and specific enrichment thresholds to narrow the 872 proteins down to the 48 top candidates for screening (we were initially aiming to identify approximately 50 candidate genes for screening). For Fig. 2B: We have updated the legend to detail the parameters used for the Gene Ontology (GO) enrichment analysis.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      We thank the reviewer for this careful observation regarding the expression pattern of the GSG3285-1 line and acknowledge that the overlap between this driver and the Repo-positive cells is not absolute.

      Our selection of this specific GeneSwitch line was based on several critical experimental considerations: 1) To minimize background toxicity. We initially tested multiple Repo-GeneSwitch lines; however, we found they exhibited significant, genotype-dependent lifespan reductions upon RU486 administration, even in control crosses. This baseline toxicity confounded the interpretation of any potential lifespan effects. GSG3285-1 was chosen for this study, as it provided a robust control baseline and did not show lifespan effects with RU486 treatment in multiple control lines. This is essential for lifespan studies. 2) The driver breadth and specificity. As noted in its original characterization (Nicholson et al., 2008) and a later study (Catterson et al. 2023), GSG3285-1 is characterized as a pan-glial driver, though it may include a small population of sensory neurons. Furthermore, while Repo is a standard glial marker, its antibody does not label all glial subtypes with equal intensity. The "non-overlapping" signal observed in Figure 3A may reflect this staining bias. 3) The expression mosaicism. The fact that some glial cells do not show GFP expression suggests a degree of mosaicism, which is common to many GeneSwitch lines (Osterwalder et al., 2001). While we acknowledge this means our manipulations may target a broad subset of glia rather than every single glial cell, the fact that we still observed significant lifespan effects across two independent platforms (UAS and CRISPRa) suggests that the targeted population is sufficient to mediate these systemic effects.

      We have added a clarifying statement to contextualize the choice of the GSG3285-1 driver and its relationship to the Repo population.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      (a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      We agree that the sex-specific effects observed in our lifespan screen are one interesting aspect of this study. We have added a dedicated section to the Discussion exploring these differences from both a technical and biological perspective.

      On the technical side, the GeneSwitch inducer, RU486, can have sex-specific effects on metabolism and lifespan, depending on the nutritional environment (Dos Santos & Cocheme, 2024). Specifically, RU486 has been shown to counteract the lifespan-shortening effects of mating in females, an effect that is less pronounced in males (Landis et al., 2015; Tower et al., 2017). While we optimized our media and used the GSG3285-1 line to minimize these baseline effects, it remains possible that certain genotypes exhibited a sex-specific sensitivity to the inducer itself. Beyond the technical considerations, sex differences in aging are well-documented in Drosophila and other organisms (Regan et al., 2016; Austad & Fischer, 2016). Male and female flies exhibit distinct transcriptional trajectories and metabolic shifts as they age. Furthermore, recent studies have highlighted that glial function and the neuroinflammatory landscape can differ significantly between sexes, which may dictate how a specific genetic manipulation impacts the aging process in a sex-dependent manner (PMID: 40951920). While our screen identifies DIP-β as a rare candidate that extends lifespan in both sexes, the prevalence of female-specific hits in our data suggests that the female "aging program" may be more plastic or responsive to the specific glial pathways we targeted. These observations provide a valuable foundation for future studies into the mechanisms of sex-specific neuroprotection.

      (b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      It is a mixture of half male and half female flies. This information has been added to the main text, Fig. 1, and to the methods section.

      (c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      Agreed, this would be a great idea for future studies.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the adult brain, which does not include the nerve cord, where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors, including learning, circadian rhythms, etc.

      We thank the reviewer for this insightful point. While our initial proteomic screen focused on the adult central brain, our behavioral validation used a pan-glial driver, which targets glia throughout the entire nervous system, including the ventral nerve cord (VNC). We have addressed the reviewer's comment as follows:

      Additional behavioral data: As suggested, we performed Drosophila Activity Monitoring (DAM) assays to evaluate circadian locomotor rhythms in 50-day-old DIP-β overexpression flies compared to negative controls. Interestingly, we did not detect significant changes in circadian activity at this time point.

      The difference between our climbing and circadian results highlights the complexity of age-related decline. In Drosophila, locomotor performance (i.e., climbing) and circadian coordination often decouple. For example, specific isoforms of human Tau (hTau) can induce severe cognitive and neurodegenerative deficits without affecting lifespan or motor coordination in the same manner (Sealey et al., 2017). Furthermore, motor-specific defects can emerge independently of systemic lifespan changes, as seen in certain SOD1 models of ALS (Hirth, 2010). It is possible that the 50-day timepoint represents a specific window where motor coordination is improved by DIP-β, while circadian circuits — governed by distinct glial-neuronal interactions — remain largely unaffected, or require a different temporal window for observation.

      We agree that identifying the specific glial populations (central brain vs VNC) responsible for the improved climbing would be highly informative. While the current study establishes the pro-longevity effect of DIP-β, future work utilizing in-situ proteomics on the fully intact CNS (including the VNC), or on the VNC specifically, will be essential to map the stereotyped progression of these effects across the peripheral and central nervous systems.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and indirectly influencing the transcriptomes? Instead of DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

      We agree that the observed changes likely represent a combination of direct cell-cell interactions and a broader, more indirect maintenance of a "younger" physiological state.

      Direct: Among the DIP family, DIP-β exhibits some of the strongest and most promiscuous binding affinities, interacting with a wide array of partners including Dpr6, 8, 9, 15, and 21 (Cosmanescu et al., 2018; Sergeeva et al., 2020). This biochemical flexibility allows DIP-β to potentially interface with a much broader range of neuronal subtypes than other DIP family members, such as DIP-δ, which exclusively binds Dpr12 and did not extend lifespan in our screen. It is possible that by overexpressing DIP-β, we may be partially compensating for the global downregulation of CAMs that typically occurs during aging, thereby preserving essential glial-neuronal communication integrity.

      Indirect: By maintaining these primary glial functions and communication activities, DIP-β overexpression likely delays the overall "aging" of the brain. This preservation of neural health can have downstream effects on systemic physiology, such as the improved glia-fat body communication we observed in 50-day-old flies. In this model, the broad transcriptomic shifts are not necessarily all direct targets of DIP-β, but rather a signature of a brain that has successfully avoided the catastrophic breakdown of homeostasis typically seen in aged wild-type flies.

      We have expanded the Discussion to clarify this distinction, adding that DIP-β likely acts as a "scaffold" or “bridge” for maintaining a younger brain state, which in turn preserves multi-organ communication.

      Reviewer #2 (Public review):

      This manuscript presents an ambitious and technically innovative study that combines in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing to uncover glial factors that influence aging in Drosophila. The authors identify DIP-β as a glial protein whose overexpression extends lifespan and report intriguing sex-specific differences in lifespan outcomes. Overall, the study is conceptually compelling and offers a valuable dataset that will be of considerable interest to researchers studying glia-neuron communication, aging biology, and proteomic profiling in vivo.

      The in-situ proteomic labeling approach represents a notable methodological advance. If validated more extensively, it has the potential to become a widely used resource for probing glial aging mechanisms. The use of an inducible glial GeneSwitch driver is another strength, enabling the authors to carefully separate aging-relevant effects from developmental confounds. These technical choices meaningfully elevate the rigor of the study and support its central conclusions. The discovery of new candidate genes from the proteomics pipeline, including DIP-β, is intriguing and opens new avenues for understanding glial contributions to organismal lifespan. The observation of sex-specific lifespan effects is particularly interesting and warrants further exploration; the study sets the stage for future work in this direction.

      At the same time, several areas would benefit from clarification or additional analysis to fully support the manuscript's claims:

      (1) The manuscript frequently refers to "improved" or "increased" cell-cell communication following DIP-β overexpression, but the meaning of this term remains somewhat vague. Because the current analysis relies largely on transcriptomic predictions, it would be helpful to define precisely what metric is being used, e.g., increased numbers of predicted ligand-receptor interactions, enrichment of specific signaling pathways, or altered expression of communication-related components. Strengthening the mechanistic link between DIP-β, cell-cell communication, and lifespan extension, potentially through targeted validation of specific glial interactions, would substantially reinforce the interpretation.

      We agree that a more precise description of “improved” or “increased” cell-cell communication is necessary.

      Our conclusion that DIP-β overexpression is associated with “increased” cell-cell communication is based on the quantification of our CCC scores, which was performed using FlyPhoneDB2, a computational tool used to estimate cell-cell signaling from single-cell RNA-sequencing data (Liu et al., 2021; Qadiri et al., 2025). To infer cell-cell signaling, FlyPhoneDB2 and its predecessor, FlyPhoneDB, calculate “interaction scores,” comparing the expression levels of a curated list of ligand-receptor pairs between cell types (Liu et al., 2021; Qadiri et al., 2025). For example, if we detect a ligand in cell type A and its receptor in cell type B in DIP-β overexpression flies but didn’t detect both ligand and receptor in control flies, the CCC score is increased by 1. FlyPhoneDB2 additionally enables users to estimate signaling activity by also taking into consideration the expression of downstream reporter genes (Qadiri et al., 2025).

      “Improved cell-cell communication” is our interpretation based on the CCC analysis. It is important to note that the metric being used here (increased CCCs) is the number of predicted ligand-receptor interactions, and that our CCC analysis was based entirely on inferences from snRNA-seq data. We have added further clarification to our manuscript, which now further expands on the results of our CCC analysis (i.e., the increased expression for 61% and decreased expression for 39% of ligand-receptor pairs we observed in our DIP-β overexpression group, compared to our negative control), which ultimately led us to conclude that DIP-β overexpression is associated with improved cell-cell communication.
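      The counting logic in the example above can be sketched as follows. This is a minimal illustration of the verbal description only, not FlyPhoneDB2's implementation or API: the function names, the detection threshold, the data layout, and the gene and cell-type labels are all hypothetical.

```python
def detected_interactions(expression, ligand_receptor_pairs, cell_types, threshold=0.0):
    """Return predicted interactions as (ligand, receptor, sender, receiver).

    expression maps (cell_type, gene) -> average expression in one condition.
    An interaction is called when the ligand is detected in the sender and
    the receptor is detected in the receiver (ASSUMED rule: value > threshold).
    """
    found = set()
    for ligand, receptor in ligand_receptor_pairs:
        for sender in cell_types:
            for receiver in cell_types:
                ligand_on = expression.get((sender, ligand), 0.0) > threshold
                receptor_on = expression.get((receiver, receptor), 0.0) > threshold
                if ligand_on and receptor_on:
                    found.add((ligand, receptor, sender, receiver))
    return found


def gained_interactions(control, overexpression, ligand_receptor_pairs, cell_types):
    """Count interactions predicted in overexpression but not in control."""
    in_control = detected_interactions(control, ligand_receptor_pairs, cell_types)
    in_oe = detected_interactions(overexpression, ligand_receptor_pairs, cell_types)
    return len(in_oe - in_control)


# Hypothetical data: the ligand is detected in glia in both conditions,
# but the receptor is detected in neurons only in the overexpression group.
pairs = [("ligandX", "receptorY")]
cell_types = ["glia", "neuron"]
control = {("glia", "ligandX"): 1.0}
overexpression = {("glia", "ligandX"): 1.2, ("neuron", "receptorY"): 0.8}
example_gain = gained_interactions(control, overexpression, pairs, cell_types)
```

      In this toy case one glia-to-neuron interaction is predicted only in the overexpression group, so the count rises by 1, mirroring the example in the response. Because the call depends entirely on detected transcripts, such a count remains an inference from snRNA-seq data rather than a direct measure of signaling.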

      (2) The lifespan screen is central to the paper, and clearer visualization and contextualization of these results would significantly improve the manuscript's impact. For example, Figure 3D is challenging to interpret in its current form. More explicit presentation of which manipulations extend lifespan in each sex, along with effect sizes and significance values, would provide clarity. Including positive controls for lifespan extension would also help contextualize the magnitude of the observed effects. The reported effects of DIP-β, while promising, are modest relative to baseline effects of RU feeding, and a discussion of this would help appropriately calibrate the conclusions.

      We appreciate the reviewer’s suggestion to improve the clarity of the lifespan screen results. We have significantly revised Figures 3D, 3E, and 3F to provide a more intuitive summary of the candidate gene manipulations. Figures 3D and 3E now explicitly include the effect sizes and p-values for each candidate gene, broken down by sex. We also added a new Figure 3G with a visual layout that has been streamlined to allow for quick identification of manipulations that successfully extended lifespan.

      The reviewer raises an important point regarding the use of positive controls to calibrate the magnitude of lifespan extension. We carefully considered adding a standard control (such as Rapamycin treatment); however, we opted against it for several methodological reasons:

      As noted in the literature, the magnitude of lifespan extension from standard controls can vary drastically depending on genetic background and lab environment. For instance, Rapamycin-induced extension ranges from ~10% (Schinaman et al., 2019) to over 80% (Landis et al., 2024). We felt that adding a single positive control might provide a false sense of "calibration" rather than a true universal benchmark.

      To ensure the robustness of our findings, we instead employed a dual-validation strategy. We confirmed the lifespan-extending effects of our candidates using both traditional UAS:cDNA and CRISPR-based overexpression. The fact that two independent genetic systems yielded consistent results provides strong internal evidence for the reported effects.

      We acknowledge that the effects of DIP-β are modest when compared to the baseline impact of RU486 feeding. We have added a section to the Discussion addressing this. While the effects are subtle, their reproducibility across different overexpression platforms suggests they are biologically relevant, even if they do not reach the dramatic shifts seen in some caloric restriction or drug-based models.

      We have further addressed this in the results section.

      (3) Several figures would benefit from improved labeling or more detailed legends. For instance, the meaning of "N" and "C" in Figure 1D is unclear; Figure 3A should clarify that Repo is a glial marker; and Figure 5C appears to have truncated labels. Reordering certain panels (e.g., moving control data in Figure 4A-B) may also improve narrative flow. These refinements would greatly aid reader comprehension.

      We have modified and improved the labeling of these figures to increase clarity. For Fig. 1D, we added an explanation to the figure legend. In brief, in the Tandem Mass Tag (TMT) isobaric labeling system, 128N is one of many channels (126, 127N, 127C, 128N, 128C, etc.) used to index and compare up to 18 samples simultaneously, improving throughput and reducing missing values.

      Fig. 3A has been updated to clarify that Repo is the glial marker. Fig. 4A-D have been reordered so that the DIP-β lifespan results are presented before the control lifespan, which we hope improves the narrative flow of this figure. The Fig. 4 references in the manuscript have also been updated to match these changes. Additionally, Fig. 5C has been updated to restore the truncated x-axis and y-axis labels.

      (4) A few claims would be strengthened by more specific references or acknowledgment of alternative interpretations. Examples include the phenoxy-radical labeling radius, the impact of H₂O₂ exposure, and the specificity of neutravidin. Additionally, downregulation of synapse-related GO terms may reflect age-related transcriptional changes rather than impaired glia-neuron communication per se, and this possibility should be recognized. The term "unbiased" to describe the screen may also be reconsidered, given the preselection of candidate genes.

      These are good suggestions. We have added references for the phenoxy-radical labeling radius (Durojaye, 2021), the impact of H₂O₂ exposure (J. Li et al., 2021), and the binding specificity of neutravidin (J. Li et al., 2021). We have also removed the term “unbiased” from our manuscript.

      Regarding the request to further address the downregulation of synapse-related GO terms, we believe this indicates a lack of clarity on our part. We did not intend to suggest that our GO analyses, which were based on our proteomics data, were necessarily indicative of impaired neuron-glia communication. Our conclusions regarding altered neuron-glia communication come from our later snRNA-seq data and analyses. Prompted by this comment, we agree that our differential gene analysis may reflect transcriptional changes rather than impaired glia-neuron communication. We have added this alternative interpretation.

      (5) Clarifying the rationale for focusing on central brain glia over optic-lobe glia would be useful. 

      Agreed! Because the intended focus of this study was the more general changes occurring during normal brain aging, we chose to focus our glial cell-surface proteomics on the central brain, which is responsible for most of the brain’s higher-order functions, including learning and memory, signal integration, and behavior. Because the optic lobes account for approximately half of all neurons in the adult Drosophila brain and are specialized to process visual stimuli (Robinson et al., 2025), we were concerned that including them could strongly bias our findings towards age-related changes in visual function rather than the more general changes we intended to study. This clarification has been added to the Results section (Quantitative comparison of young and old proteomes).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 62: Can the authors expand on "several changes"?

      We have added a sentence expanding upon this in the manuscript draft.

      (2) Line 137: Can the authors provide a reference for the phenoxyl radical half-life?

      Thanks for catching this. We’ve added our reference for the phenoxyl radical half-life.

      (3) Figure 1B: The authors state that neutravidin stained glia; however, there is no glial marker (e.g., anti-Repo) in this panel.

      We acknowledge the reviewer’s point. The lack of anti-Repo staining in Figure 1B is due to the requirements of the Neutravidin-Alexa 647 detection method. Because this procedure bypasses traditional primary and secondary antibody incubation to preserve the biotin signal, co-staining with Repo was not technically feasible. Nevertheless, we utilized the Repo-GAL4 driver to express UAS-CD2-HRP; since this driver is well-documented and specific to glial cells, the Neutravidin signal serves as a functional readout of the targeted glial population.

      (4) Line 254: There is no Figure 2D.

      We’ve corrected this to Fig. 2C.

      (5) Lines 390-396: No reference to the respective figures.

      We’ve made a couple of corrections so that all the respective figures are now referenced.

      (6) Figure 5C: The X-axis is cut off.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Minor inconsistencies (e.g., figure references: line 254 references "Figure 2D" where none exists) should be corrected.

      We’ve corrected this to Fig. 2C.

    1. eLife Assessment

      This study presents a valuable finding on how the locus coeruleus modulates the involvement of medial prefrontal cortex in set shifting using calcium imaging in mice. The evidence supporting the claims was viewed as solid in revealing the dynamics and potential mechanisms supporting extradimensional shifts. The work is of broad interest to those studying flexible cognition.