  1. Dec 2024
    1. Reviewer #2 (Public review):

      Summary:

      In this paper, Lin and colleagues aim to understand the role of different salts on the phase behavior of a model protein of significant biological interest, Caprin1, and its phosphorylated variant, pY-Caprin1. To achieve this, the authors employed a variety of methods to complement experimental studies and obtain a molecular-level understanding of ion partitioning inside biomolecular condensates. A simple theory based on rG-RPA is shown to capture the different salt dependencies of Caprin1 and pY-Caprin1 phase separation, demonstrating excellent agreement with experimental results. The application of this theory to multivalent ions reveals many interesting features with the help of multicomponent phase diagrams. Additionally, the use of CG model-based MD simulations and FTS provides further clarity on how counterions can stabilize condensed phases.

      Strengths:

      The greatest strength of this study lies in the integration of various methods to obtain complementary information on thermodynamic phase diagrams and the molecular details of the phase separation process. The authors have also extended their previously proposed theoretical approaches, which should be of significant interest to other researchers. Some of the findings reported in this paper, such as bridging interactions, are likely to inspire new studies using higher-resolution atomistic MD simulations.

    2. Reviewer #3 (Public review):

      The authors first use rG-RPA to reproduce two observed trends. Caprin1 does not phase separate at very low salt but undergoes LLPS when salt is added, whereas further addition of salt reduces its propensity to undergo LLPS. On the other hand, pY-Caprin1 exhibits a monotonic trend in which the propensity to phase separate decreases with the addition of salt. This distinction is captured by a two-component model and also when salt ions are explicitly modeled as a separate species in a ternary phase diagram. The predicted ternary diagrams (when co- and counterions are explicitly accounted for) also predict the tendency of ions to co-condense with proteins in the dense phase or to be excluded from it. Predicted trends are generally in line with the measurements for Caprin1. Next, the authors seek to explain the observed differences in phase separation among variants in which arginines are replaced by lysines. In the current rG-RPA-type models, arginine (R) and lysine (K) are treated identically because non-electrostatic effects are modeled only in a mean-field manner that can be fitted but not predicted. For this reason, coarse-grained MD simulation is suitable here. Moreover, MD simulation affords structural features of the condensates. The authors used a force field that is capable of discriminating between R and K. The MD-predicted degrees of LLPS of these variants are again consistent with the measurements. One additional insight from the MD simulations is that a negative ion can form a bridge between two positively charged residues on the chain. Such insights cannot be derived from rG-RPA. Both rG-RPA and MD simulation become cumbersome when multiple types of ions, such as Na+, Cl−, ATP, and ATP-Mg, are present at the same time. FTS is well suited to handle this complexity. FTS also provides insights into the co-localization of ions and proteins, consistent with NMR.
By using different combinations of ions, the authors confirm the robustness of the prediction that Caprin1 shows salt-dependent reentrant behavior, adding further support that the differential behavior of Caprin1 and pY-Caprin1 is likely to be mediated by charge-charge interactions.

      Comments on revisions:

      The authors addressed my comments and it is ready for publication.

    3. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Comments on revisions:

      This revision addressed all my previous comments.

      Reviewer #3 (Public Review):

      Comments on revisions:

      The authors addressed my comments and it is ready for publication.

      We are grateful for the reviewers’ effort and are encouraged by their generally positive assessment of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      This revision addressed all my previous comments. The only new issue concerns the authors’ response to the following comment of reviewer 3:

      (2) Authors note “monovalent positive salt ions such as Na+ can be attracted, somewhat counterintuitively, into biomolecular condensates scaffolded by positively-charged polyelectrolytic IDRs in the presence of divalent counterions”. This may be due to the fact that the divalent negative counterions present in the dense phase (as seen in the ternary phase diagrams) also recruit a small amount of Na+.

      Author reply: The reviewer’s comment is valid, as a physical explanation for this prediction is called for. Accordingly, the following sentence is added to p. 10, lines 27-29: ...

      Here are my comments on this issue. Most IDPs with a net positive charge still have negatively charged residues, which in theory can bind cations. In fact, Caprin1 has 3 negatively charged residues (same as A1-LCD). All-atom simulations of MacAinsh et al (ref 72) have shown that these negatively charged residues bind Na+; I assume this effect can be captured by the coarse-grained models in the present study. Moreover, all-atom simulations showed that Na+ has a strong tendency to be coordinated by backbone carbonyls, which of course are present on all residues. Suggestions:

      (a) The authors may want to analyze the binding partners of Na+. Are they predominantly the 3 negatively charged residues, or divalent counterions, or both?

      (b) The authors may want to discuss the potential underestimation of Na+ inside Caprin1 condensates due to the lack of explicit backbone carbonyls that can coordinate Na+ in their models. A similar problem applies to backbone amides that can coordinate anions, but to a lesser extent (see Fig. 3A of ref 72).

      The reviewer’s comments are well taken. Regarding the statement in the revised manuscript “This phenomenon arises because the positively charged monovalent salt ions are attracted to the negatively charged divalent counterions in the protein-condensed phase.”, it should first be noted that the statement was inferred from the model observation that Na+ is depleted in condensed Caprin1 (Fig. 2a) when the counterion is monovalent (an observation stated just before the quoted statement). To make this logical connection clearer, as well as to address the reviewer’s point about the presence of negatively charged residues in Caprin1, we have modified this statement in the Version of Record (VOR) as follows:

      “This phenomenon most likely arises from the attraction of the positively charged monovalent salt ions to the negatively charged divalent counterions in the protein-condensed phase because, although the three negatively charged D residues in Caprin1 can attract Na+, it is notable that Na+ is depleted in condensed Caprin1 when the counterion is monovalent (Fig. 2a).”

      The reviewer’s suggestion (a) of collecting statistics of Na+ interactions in the Caprin1 condensate is valuable, but it is beyond the scope of the present work and should be pursued in future studies. Thus far, our coarse-grained molecular dynamics simulations have considered only monovalent Cl− counterions; we do not have simulation data for divalent counterions.

      Following the reviewer’s suggestion (b), we have now added the following sentence in Discussion under the subheading “Effects of salt on biomolecular LLPS”:

      “In this regard, it should be noted that positively and negatively charged salt ions can also coordinate with backbone carbonyls and amides, respectively, in addition to coordinating with charged amino acid sidechains (MacAinsh et al., eLife 2024). The impact of such effects, which are not considered in the present coarse-grained models, should be ascertained by further investigations using atomic simulations (MacAinsh et al., eLife 2024; Rauscher & Pomès, eLife 2017; Zheng et al., J Phys Chem B 2020).”

      Here we have added a reference to Rauscher & Pomès, eLife 2017 to more accurately reflect progress made in atomic simulations of biomolecular condensates.

      More generally, regarding the reviewer’s comments on the merits of coarse-grained versus atomic approaches, we re-emphasize, as stated in our paper, that these approaches are complementary. Atomic approaches undoubtedly afford structurally and energetically high-resolution information. However, as it stands, simulations of the assembly-disassembly process of biomolecular condensates remain challenging because of difficulties in achieving equilibration even for a small model system with < 10 protein chains (MacAinsh et al., eLife 2024), although well-equilibrated simulations are possible for a reasonably sized system with ∼30 chains when the main focus is on the condensed phase (Rauscher & Pomès, eLife 2017). In this context, coarse-grained models are valuable for assessing the energetic role of salt ions in the thermodynamic stability of biomolecular condensates of physically reasonable sizes under equilibrium conditions.

      In addition to the above minor additions, we have also added citations in the VOR to two highly relevant recent papers: Posey et al., J Am Chem Soc 2024 for salt-dependent biomolecular condensation (mentioned in Discussion under subheadings “Tielines in protein-salt phase diagrams” and “Counterion valency” together with added references to Hribar et al., J Am Chem Soc 2002 and Nostro & Ninham, Chem Rev 2012 for the Hofmeister phenomena discussed by Posey et al.) and Zhu et al., J Mol Cell Biol 2024 for ATP-modulated reentrant behavior (mentioned in Introduction). We have also added back a reference to our previous work Lin et al., J Mol Liq 2017 to provide more background information for our formulation.

      Reviewer #2 (Recommendations For The Authors):

      The authors have done a great job addressing previous comments.

      We thank this reviewer for his/her effort and are encouraged by the positive assessment of our revised manuscript.

      ---

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors used multiple approaches to study salt effects in liquid-liquid phase separation (LLPS). Results on both wild-type Caprin1 and mutants and on different types of salts contribute to a comprehensive understanding.

      Strengths:

      The main strength of this work is the thoroughness of investigation. This aspect is highlighted by the multiple approaches used in the study, and reinforced by the multiple protein variants and different salts studied.

      We are encouraged by this positive overall assessment.

      Weaknesses:

      (1) The multiple computational approaches are a strength, but they are cruder than explicit-solvent all-atom molecular dynamics (MD) simulations and may miss subtle effects of salts. In particular, all-atom MD simulations demonstrate that high salt strengthens π-type interactions (ref. 42 and MacAinsh et al., https://www.biorxiv.org/content/10.1101/2024.05.26.596000v3).

      The relative strengths and limitations of coarse-grained vs all-atom simulation are now more prominently discussed beginning at the bottom of p. 5 through the first 8 lines of p. 6 of the revised manuscript (page numbers throughout this letter refer to those in the submitted pdf file of the revised manuscript), with MacAinsh et al. included in this added discussion (cited as ref. 72 in the revised manuscript). The fact that coarse-grained simulation may not provide insights into the more subtle structural and energetic effects of π-related interactions afforded by all-atom simulations is now further emphasized on p. 11 (lines 23–30), with reference to MacAinsh et al. as well as original ref. 42 (Krainer et al., now ref. 50 in the revised manuscript).

      (2) The paper can be improved by distilling the various results into a simple set of conclusions. For example, based on salt effects revealed by all-atom MD simulations, MacAinsh et al. presented a sequence-based predictor for classes of salt dependence. Wild-type Caprin1 fits right into the “high net charge” class, with a high net charge and a high aromatic content, showing no LLPS at 0 NaCl and an increasing tendency of LLPS with increasing NaCl. In contrast, pY-Caprin1 belongs to the “screening” class, with a high level of charged residues and showing a decreasing tendency of LLPS.

      This is a helpful suggestion. We have now added a subsection with heading “Overview of key observations from complementary approaches” at the beginning of the “Results” section on p. 6 (lines 18–37) and the first line of p. 7. In the same vein, a few concise sentences to summarize our key results are added to the first paragraph of “Discussion” (p. 18, lines 23–26). In particular, the relationship of Caprin1 and pY-Caprin1 with the recent classification by MacAinsh et al. (ref. 72) in terms of “high net charge” and “screening” classes is now also stated, as suggested by this reviewer, on p. 18 under “Discussion” (lines 26–30).

      (3) Mechanistic interpretations can be further simplified or clarified. (i) Reentrant salt effects (e.g., Fig. 4a) are reported, but no simple explanation seems to have been provided. Fig. 4a,b look very similar to what has been reported as strong-attraction promoter and weak-attraction suppressor, respectively (ref. 50; see also PMC5928213 Fig. 2d,b). According to the latter two studies, the “reentrant” behavior of a strong-attraction promoter, Cl− in the present case, is due to Cl−-mediated attraction at low to medium [NaCl] and repulsion between Cl− ions at high salt. Do the authors agree with this explanation? If not, could they provide another simple physical explanation? (ii) The authors attributed the promotional effect of Cl− to counterion-bridged interchain contacts, based on a single instance. There is another simple explanation, i.e., neutralization of the net charge on Caprin1. The authors should analyze their simulation results to distinguish between net charge neutralization and interchain bridging; see MacAinsh et al.

      The relationship of Cl− in bridging and neutralizing configurations, respectively, with the classification of “strong-attraction promoter” and “weak-attraction suppressor” by Zhou and coworkers is now stated on p. 13 (lines 29–31), with reference to original ref. 50 by Ghosh, Mazarakos & Zhou (now ref. 59 in the revised manuscript) as well as the earlier patchy particle model study PMC5928213 by Nguemaha & Zhou, now cited as ref. 58 in the revised manuscript. After receiving this referee report, we have conducted an extensive survey of our coarse-grained MD data to provide a quantitative description of the prevalence of counterion (Cl−) bridging interactions linking positively charged arginines (Arg+s) on different Caprin1 chains in the condensed phase (using the [Na+] = 0 case as an example). The newly compiled data are reported under a new subsection heading “Explicit-ion MD offers insights into counterion-mediated interchain bridging interactions among condensed Caprin1 molecules” on p. 12 (last five lines) to p. 14 (first 10 lines) [∼1.5 additional pages], as well as in a new Fig. 6 depicting the statistics of various Arg+–Cl−–Arg+ configurations, with the conclusion that a vast majority (at least 87%) of Cl− counterions in the Caprin1-condensed phase engage in favorable condensation-driving interchain bridging interactions.
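      The distinction the reviewer draws between net-charge neutralization and interchain bridging can be made concrete with a simple contact count over simulation snapshots. The sketch below is hypothetical and is not the authors’ analysis code: it assumes point-like bead coordinates in a cubic periodic box and a single, illustrative distance cutoff for a Cl− to count as contacting an Arg+. Each Cl− is labeled free, in contact with a single Arg+ (neutralization without a bridge), bridging Arg+ on the same chain, or bridging Arg+ on different chains. The function names and the cutoff are illustrative choices, not parameters from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def min_image_distance(a: Vec3, b: Vec3, box: float) -> float:
    """Distance between two points under the minimum-image convention (cubic box)."""
    total = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        d -= box * round(d / box)  # wrap the separation into [-box/2, box/2]
        total += d * d
    return math.sqrt(total)


def classify_counterions(
    cl_positions: Sequence[Vec3],
    arg_positions: Sequence[Vec3],
    arg_chain_ids: Sequence[int],
    cutoff: float,
    box: float,
) -> Dict[str, int]:
    """Count Cl- ions by their Arg+ contact pattern (hypothetical criterion)."""
    counts = {"free": 0, "single": 0, "intrachain": 0, "interchain": 0}
    for cl in cl_positions:
        # Chain IDs of every Arg+ bead within the contact cutoff of this Cl-.
        partners: List[int] = [
            chain
            for pos, chain in zip(arg_positions, arg_chain_ids)
            if min_image_distance(cl, pos, box) < cutoff
        ]
        if not partners:
            counts["free"] += 1
        elif len(partners) == 1:
            counts["single"] += 1  # neutralizes one Arg+, no bridge
        elif len(set(partners)) > 1:
            counts["interchain"] += 1  # bridges Arg+ on different chains
        else:
            counts["intrachain"] += 1  # bridges Arg+ on the same chain
    return counts


def interchain_bridging_fraction(counts: Dict[str, int]) -> float:
    """Fraction of all Cl- ions that bridge Arg+ residues on different chains."""
    total = sum(counts.values())
    return counts["interchain"] / total if total else 0.0
```

      In practice such counts would be accumulated over many equilibrated frames, and the result would depend on the chosen cutoff, so any reported fraction should be checked for robustness against that choice.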

      (4) The authors presented ATP-Mg both as a single ion and as two separate ions; there is no explanation of which of the two versions reflects reality. When presenting ATP-Mg as a single ion, it’s as though it forms a salt with Na+. I assume NaCl, ATP, and MgCl2 were used in the experiment. Why is Cl− not considered? Related to this point, it looks like ATP is just another salt ion studied, and much of the Results section is on NaCl, so the emphasis on ATP (“Diverse Roles of ATP” in the title) is somewhat misleading.

      We model ATP and ATP-Mg both as single-bead ions (in rG-RPA) and also as structurally more realistic short multiple-bead polymers (in field-theoretic simulation, FTS). We have now added discussions to clarify our modeling rationale in using and comparing different models for ATP and ATP-Mg, as follows:

      p. 8 (lines 19–36):

      “The complementary nature of our multiple methodologies allows us to focus sharply on the electrostatic aspects of the hydrolysis-independent role of ATP in biomolecular condensation by comparing ATP’s effects with those of simple salt. Here, Caprin1 and pY-Caprin1 are modeled minimally as heteropolymers of charged and neutral beads in rG-RPA and FTS. ATP and ATP-Mg are modeled as simple salts (single-bead ions) in rG-RPA, whereas they are modeled with more structural complexity as short charged polymers (multiple-bead chains) in FTS, though the latter models are still highly coarse-grained. Despite this modeling difference, rG-RPA and FTS both rationalize experimentally observed ATP- and NaCl-modulated reentrant LLPS of Caprin1 and a lack of a similar reentrance for pY-Caprin1 as well as a prominent colocalization of ATP with the Caprin1 condensate. Consistently, the same contrasting trends in the effect of NaCl on Caprin1 and pY-Caprin1 are also seen in our coarse-grained MD simulations, though polymer field theories tend to overestimate LLPS propensity [99]. The robustness of the theoretical trends across different modeling platforms underscores electrostatics as a significant component in the diverse roles of ATP in the context of its well-documented ability to modulate biomolecular LLPS via hydrophobic and π-related effects [63, 65, 67].”

      Here, the last sentence quoted above addresses this reviewer’s question about our intended meaning in referring to “diverse roles of ATP” in the title of our paper. To make this point even clearer, we have also added the following sentence to the Abstract (p. 2, lines 12–13):

      “... The electrostatic nature of these features complements ATP’s involvement in π-related interactions and as an amphiphilic hydrotrope, ...”

      Moreover, to enhance readability, we have now added pointers in the rG-RPA part of our paper to anticipate the structurally more complex ATP and ATP-Mg models to be introduced subsequently in the FTS part, as follows:

      p. 9 (lines 13–15):

      “As mentioned above, in the present rG-RPA formulation, (ATP-Mg)<sup>2−</sup> and ATP<sup>4−</sup> are each modeled minimally as a single-bead ion. They are represented by charged polymer models with more structural complexity in the FTS models below.”

      p. 11 (lines 8–11):

      These observations from analytical theory will be corroborated by FTS below with the introduction of structurally more realistic models of (ATP-Mg)<sup>2−</sup> and ATP<sup>4−</sup>, together with the possibility of simultaneous inclusion of Na<sup>+</sup>, Cl−, and Mg<sup>2+</sup> in the FTS models of Caprin1/pY-Caprin1 LLPS systems.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Lin and colleagues aim to understand the role of different salts on the phase behavior of a model protein of significant biological interest, Caprin1, and its phosphorylated variant, pY-Caprin1. To achieve this, the authors employed a variety of methods to complement experimental studies and obtain a molecular-level understanding of ion partitioning inside biomolecular condensates. A simple theory based on rG-RPA is shown to capture the different salt dependencies of Caprin1 and pY-Caprin1 phase separation, demonstrating excellent agreement with experimental results. The application of this theory to multivalent ions reveals many interesting features with the help of multicomponent phase diagrams. Additionally, the use of CG model-based MD simulations and FTS provides further clarity on how counterions can stabilize condensed phases.

      Strengths:

      The greatest strength of this study lies in the integration of various methods to obtain complementary information on thermodynamic phase diagrams and the molecular details of the phase separation process. The authors have also extended their previously proposed theoretical approaches, which should be of significant interest to other researchers. Some of the findings reported in this paper, such as bridging interactions, are likely to inspire new studies using higher-resolution atomistic MD simulations.

      Weaknesses:

      The paper does not have any major issues.

      We are very encouraged by this reviewer’s positive assessment of our work.

      Reviewer #3 (Public Review):

      The authors first use rG-RPA to reproduce two observed trends. Caprin1 does not phase separate at very low salt but undergoes LLPS when salt is added, whereas further addition of salt reduces its propensity to undergo LLPS. On the other hand, pY-Caprin1 exhibits a monotonic trend in which the propensity to phase separate decreases with the addition of salt. This distinction is captured by a two-component model and also when salt ions are explicitly modeled as a separate species in a ternary phase diagram. The predicted ternary diagrams (when co- and counterions are explicitly accounted for) also predict the tendency of ions to co-condense with proteins in the dense phase or to be excluded from it. Predicted trends are generally in line with the measurements for Cparin1 [sic]. Next, the authors seek to explain the observed differences in phase separation among variants in which arginines are replaced by lysines. In the current rG-RPA-type models, arginine (R) and lysine (K) are treated identically because non-electrostatic effects are modeled only in a mean-field manner that can be fitted but not predicted. For this reason, coarse-grained MD simulation is suitable here. Moreover, MD simulation affords structural features of the condensates. The authors used a force field that is capable of discriminating between R and K. The MD-predicted degrees of LLPS of these variants are again consistent with the measurements. One additional insight from the MD simulations is that a negative ion can form a bridge between two positively charged residues on the chain. Such insights cannot be derived from rG-RPA. Both rG-RPA and MD simulation become cumbersome when multiple types of ions, such as Na+, Cl−, ATP, and ATP-Mg, are present at the same time. FTS is well suited to handle this complexity. FTS also provides insights into the co-localization of ions and proteins, consistent with NMR.
By using different combinations of ions, the authors confirm the robustness of the prediction that Caprin1 shows salt-dependent reentrant behavior, adding further support that the differential behavior of Caprin1 and pY-Caprin1 is likely to be mediated by charge-charge interactions.

      We are encouraged by this reviewer’s positive assessment of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Analysis:

      Analyze the simulation results to distinguish net charge neutralization and interchain bridging; see MacAinsh et al.

      Please see response above to points (3) and (4) under “Weaknesses” in this reviewer’s public review. We have now added a 1.5-page subsection starting from the bottom of p. 12 to the top of p. 14 to discuss a new extensive analysis of Arg<sup>+</sup>–Cl<sup>−</sup>–Arg<sup>+</sup> configurations to identify bridging interactions, with key results reported in a new Fig. 6 (p. 42). Recent results from MacAinsh, Dey & Zhou (cited now as ref. 72) are included in the added discussion. Relevant advances made in MacAinsh et al., including clarification and classification of salt-mediated interactions in the phase separation of A1-LCD, are now mentioned multiple times in the revised manuscript (p. 5, lines 19–20; p. 6, lines 2–5; p. 11, line 30; p. 14, line 10; p. 18, lines 28–29; and p. 20, line 4).

      Writing and presentation:

      (1) Cite subtle effects that may be missed by the coarser approaches in this study

      Please see response above to point (1) under “Weaknesses” in this reviewer’s public review.

      (2) Try to distill the findings into a simple set of conclusions

      Please see response above to point (2) under “Weaknesses” in this reviewer’s public review.

      (3) Clarify and simplify physical interpretations

      Please see response above to point (3) under “Weaknesses” in this reviewer’s public review.

      (4) Explain the treatment of ATP-Mg as either a single ion or two separate ions; reconsider modifying the reference to ATP in the title

      Please see response above to point (4) under “Weaknesses” in this reviewer’s public review.

      (5) Minor points:

      p. 4, citation of ref 56: this work shows ATP is a driver of LLPS, not merely a regulator (promoter or suppressor)

      This citation to original ref. 56 (now ref. 63) on p. 4 is now corrected (bottom line of p. 4).

      p. 7 and throughout: “using bulk [Caprin1]” – I assume this is the initial overall Caprin1 concentration. It would avoid confusion to state such concentrations as “initial” or “initial overall”

      We have now added “initial overall concentration” in parentheses on p. 8 (line 4) to clarify the meaning of “bulk concentration”.

      p. 7 and throughout: both mM (also μM) and mg/ml have been used as units of protein concentration, and that can cause confusion. Indeed, the authors seem to have confused themselves on p. 9, where 400 (750) mM is probably 400 (750) mg/ml. The same applies to the use of mM and M for salt concentrations (400 mM Mg2+ but 0.1 and 1.0 M Na+)

      Concentrations are now given in both molarity and mass density in Fig. 1 (p. 37), Fig. 2 (p. 38), Fig. 4 (p. 40), and Fig. 7 (p. 43), as noted in the text on p. 8 (lines 4–5). Inconsistencies and errors in quoting concentrations are now corrected (p. 10, line 18, and p. 11, line 2).
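      Converting between the two protein concentration units is simple arithmetic: mmol/L multiplied by g/mol gives mg/L, and dividing by 1000 gives mg/mL. The sketch below assumes a hypothetical molar mass of 10,000 g/mol (10 kDa), chosen only for illustration; the actual molar mass of the Caprin1 construct is not stated in this exchange, and the constant name is illustrative.

```python
def mm_to_mg_per_ml(conc_mm: float, molar_mass_g_per_mol: float) -> float:
    """Convert millimolar to mg/mL: mmol/L x g/mol = mg/L, then divide by 1000."""
    return conc_mm * molar_mass_g_per_mol / 1000.0


def mg_per_ml_to_mm(conc_mg_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Inverse of mm_to_mg_per_ml."""
    return conc_mg_per_ml * 1000.0 / molar_mass_g_per_mol


# Hypothetical molar mass for illustration only; substitute the construct's actual value.
MOLAR_MASS_G_PER_MOL = 10_000.0

if __name__ == "__main__":
    for c_mm in (120.0, 400.0):
        print(f"{c_mm:g} mM -> {mm_to_mg_per_ml(c_mm, MOLAR_MASS_G_PER_MOL):g} mg/mL")
```

      Under this assumed molar mass, 400 mM would correspond to 4000 mg/mL, well above the mass density of pure protein (on the order of 1.3 to 1.4 g/mL). A check of this kind flags a value as a likely mM/mg/ml mix-up of the sort the reviewer pointed out.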

      p. 7, “LCST-like”: isn’t this more like a case of a closed coexistence curve that contains both UCST and LCST?

      The discussion on p. 8 around this observation from Fig. 1d is now expanded, including alluding to the theoretical possibility of a closed co-existence curve mentioned by this reviewer, as follows:

      “Interestingly, the decrease in some of the condensed-phase [pY-Caprin1]s with decreasing T (orange and green symbols for ≲ 20 °C in Fig. 1d trending toward slightly lower [pY-Caprin1]) may suggest a hydrophobicity-driven lower critical solution temperature (LCST)-like reduction of LLPS propensity as temperature approaches ∼0 °C, as in cold denaturation of globular proteins [7,23], though the hypothetical LCST is below 0 °C and therefore not experimentally accessible. If that is the case, the LLPS region would resemble those with both a UCST and an LCST [4]. As far as simple modeling is concerned, such a feature may be captured by an FH model wherein interchain contacts are favored by entropy at intermediate to low temperatures and by enthalpy at high temperatures, thus entailing a heat capacity contribution in χ(T) [7,109,110] beyond the temperature-independent ϵ<sub>h</sub> and ϵ<sub>s</sub> used in Fig. 1c,d and Fig. 2. Alternatively, a reduction in overall condensed-phase concentration can also be caused by formation of heterogeneous locally organized structures with large voids at low temperatures even when interchain interactions are purely enthalpic (Fig. 4 of ref. [111]).”
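      How a heat capacity contribution can make χ(T) non-monotonic can be sketched with standard constant-heat-capacity thermodynamics (dH = C<sub>p</sub> dT, dS = C<sub>p</sub> dT/T). The sketch assumes, as an illustration rather than as the authors’ exact expression, that χ(T) = ϵ<sub>h</sub>/T − ϵ<sub>s</sub> with ϵ<sub>h</sub> in kelvin and ϵ<sub>s</sub> dimensionless, and that a larger χ means a stronger drive to phase separate. Adding a hypothetical heat capacity term dcp makes both components temperature dependent; differentiating gives dχ/dT = −ϵ<sub>h</sub>(T)/T², so χ is stationary where the effective enthalpy vanishes and is a maximum there when dcp > 0. LLPS propensity then falls off on both sides, qualitatively like a region bounded by a UCST and an LCST. All parameter values are hypothetical, not fitted values from the paper.

```python
import math


def chi(T: float, eps_h0: float, eps_s0: float, dcp: float = 0.0, T0: float = 300.0) -> float:
    """Flory-Huggins chi(T) with a constant heat-capacity term (assumed form).

    eps_h(T) = eps_h0 + dcp * (T - T0)    enthalpic part, in kelvin
    eps_s(T) = eps_s0 + dcp * ln(T / T0)  entropic part, dimensionless
    chi(T)   = eps_h(T) / T - eps_s(T)
    With dcp = 0 this reduces to temperature-independent eps_h and eps_s.
    """
    eps_h = eps_h0 + dcp * (T - T0)
    eps_s = eps_s0 + dcp * math.log(T / T0)
    return eps_h / T - eps_s


def chi_stationary_temperature(eps_h0: float, dcp: float, T0: float = 300.0) -> float:
    """Since dchi/dT = -eps_h(T) / T**2, chi is stationary where eps_h(T) = 0.

    For dcp > 0 the stationary point is a maximum of chi (of LLPS propensity).
    """
    if dcp == 0.0:
        raise ValueError("no stationary point without a heat-capacity term")
    return T0 - eps_h0 / dcp


if __name__ == "__main__":
    # Hypothetical parameters, not fitted values from the paper.
    eps_h0, eps_s0, dcp = 100.0, 0.0, 5.0
    t_star = chi_stationary_temperature(eps_h0, dcp)
    for T in (t_star - 10.0, t_star, t_star + 10.0):
        print(f"T = {T:.0f} K, chi = {chi(T, eps_h0, eps_s0, dcp):.4f}")
```

      Below the stationary temperature the effective enthalpy is negative, so contacts there are favored only entropically. This matches the qualitative picture in the quoted passage.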

      p. 8 “Caprin1 can undergo LLPS without the monovalent salt (Na+) ions (LLPS regions extend to [Na+] = 0 in Fig. 2e,f)”: I don’t quite understand what’s going on here. Is the effect caused by a small amount of counterion (ATP-Mg) that’s calculated according to Eq. (1) (with z<sub>s</sub> set to 0)?

      The discussion of this result in Fig. 2e,f is now clarified as follows (p. 10, lines 8–14 in the revised manuscript):

      “The corresponding rG-RPA results (Fig. 2e–h) indicate that, in the presence of divalent counterions (needed for overall electric neutrality of the Caprin1 solution), Caprin1 can undergo LLPS without the monovalent salt (Na+) ions (LLPS regions extend to [Na+] = 0 in Fig. 2e,f; i.e., ρ<sub>s</sub> = 0, ρ<sub>c</sub> > 0 in Eq. (1)), because the configurational entropic cost of concentrating counterions in the Caprin1 condensed phase is smaller for divalent (z<sub>c</sub> = 2) than for monovalent (z<sub>c</sub> = 1) counterions, as only half as many of the former are needed for approximate electric neutrality in the condensed phase.”
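      The counting argument in the quoted explanation can be illustrated with an ideal-solution estimate. The sketch assumes that |Q|/z<sub>c</sub> counterions must accompany each chain of net charge Q into the condensed phase, and that each counterion moves up a fixed concentration ratio r between the dilute and condensed phases, costing k<sub>B</sub>T ln r. Both Q and r are hypothetical inputs, and holding r fixed is a simplification. The sketch shows only the counting, not the full rG-RPA free energy, which also includes electrostatic correlation terms.

```python
import math


def counterion_entropy_cost(net_charge: float, valency: int, conc_ratio: float) -> float:
    """Ideal-solution translational free-energy cost, in units of kT, per chain.

    Assumes |net_charge| / valency counterions must accompany each chain into the
    condensed phase, each moving up a dense/dilute concentration ratio conc_ratio
    (held fixed here as a simplifying assumption).
    """
    if valency <= 0:
        raise ValueError("counterion valency must be a positive integer")
    if conc_ratio <= 0:
        raise ValueError("concentration ratio must be positive")
    n_counterions = abs(net_charge) / valency
    # Each ideal particle moved up a concentration ratio r costs kT * ln(r).
    return n_counterions * math.log(conc_ratio)


if __name__ == "__main__":
    q = 13  # hypothetical net charge per chain, for illustration only
    r = 50.0  # hypothetical dense/dilute counterion concentration ratio
    mono = counterion_entropy_cost(q, 1, r)
    di = counterion_entropy_cost(q, 2, r)
    print(f"monovalent: {mono:.2f} kT, divalent: {di:.2f} kT, ratio: {di / mono:.2f}")
```

      Because the cost scales as 1/z<sub>c</sub> at a fixed concentration ratio, divalent counterions halve this entropic penalty relative to monovalent ones.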

      p. 9 “Despite the tendency for polymer field theories to overestimate LLPS propensity and condensed-phase concentrations”: these limitations should be mentioned earlier, along with the very high concentrations (e.g., 1200 mg/ml) in Fig. 2

      This sentence (now on p. 11, lines 11–18) is now modified to clarify the intended meaning as suggested by this reviewer:

      “Despite the tendency for polymer field theories to overestimate LLPS propensity and condensed-phase concentrations quantitatively because they do not account for ion condensation [99], which can be severe for small ions with charge valencies of magnitude greater than 1, as in the case of condensed [Caprin1] ≳ 120 mM in Fig. 2i–l, our present rG-RPA-predicted semi-quantitative trends are consistent with experiments indicating …”

      In addition, this limitation of polymer field theories is also mentioned earlier in the text on p. 6, lines 30–31.

      Reviewer #2 (Recommendations For The Authors):

      (1) The current version of the paper goes through many different methodologies, but how these methods complement or overlap in terms of their applicability to the problem at hand may not be so clear. This can be especially difficult for readers not well-versed in these methods. I suggest the authors summarize this somewhere in the paper.

      As mentioned above in response to Reviewer #1, we have now added a subsection with heading “Overview of key observations from complementary approaches” at the beginning of the “Results” section on p. 6 (lines 18–37) and the first line of p. 7 to make our paper more accessible to readers who might not be well-versed in the various theoretical and computational techniques. A few sentences to summarize our key results are added as well to the first paragraph of “Discussion” (p. 18, lines 23–26).

      (2) It wasn’t clear if the authors obtained LCST-type behavior in Figure 1d or if another phenomenon is responsible for the non-monotonic change in dense phase concentrations. At the very least, the authors should comment on the possibility of observing LCST behavior using the rG-RPA model and if modifications are needed to make the theory more appropriate for capturing LCST.

      As mentioned above in response to Reviewer #1, the discussion regarding possible LCST-type behavior in Fig. 1d is now expanded to include two possible physical origins: (i) hydrophobicity-like temperature-dependent effective interactions, and (ii) formation of heterogeneous, more open structures in the condensed phase at low temperatures. Three additional references [109, 110, 111] (from the Dill, Chan, and Panagiotopoulos groups, respectively) are now included to support the expanded discussion. Again, the modified discussion is as follows:

      “Interestingly, the decrease in some of the condensed-phase [pY-Caprin1]s with decreasing T (orange and green symbols for ≲ 20 °C in Fig. 1d trending toward slightly lower [pY-Caprin1]) may suggest a hydrophobicity-driven lower critical solution temperature (LCST)-like reduction of LLPS propensity as temperature approaches ∼0 °C, as in cold denaturation of globular proteins [7,23], though the hypothetical LCST is below 0 °C and therefore not experimentally accessible. If that is the case, the LLPS region would resemble those with both a UCST and an LCST [4]. As far as simple modeling is concerned, such a feature may be captured by an FH model wherein interchain contacts are favored by entropy at intermediate to low temperatures and by enthalpy at high temperatures, thus entailing a heat capacity contribution in χ(T) [7,109,110] beyond the temperature-independent ϵ<sub>h</sub> and ϵ<sub>s</sub> used in Fig. 1c,d and Fig. 2. Alternatively, a reduction in overall condensed-phase concentration can also be caused by formation of heterogeneous locally organized structures with large voids at low temperatures even when interchain interactions are purely enthalpic (Fig. 4 of ref. [111]).”

      (3) In Figures 4c and 4d, ionic density profiles could be shown as a separate zoomed-in version to make it easier to see the results.

      This is an excellent suggestion. Two such panels are now added to Fig. 4 (p. 40) as parts (g) and (h).

      Reviewer #3 (Recommendations For The Authors):

      I would suggest authors make some minor edits as noted here.

      (1) Please report the χ values that were used when fitting experimental phase diagrams with rG-RPA theory in Figure 2a,b. At present, few such values are available in the literature, and reporting them would help estimate effective χ values when electrostatics is appropriately modeled using rG-RPA.

      The χ(T) values and their enthalpic and entropic components ϵh and ϵs used to fit the experimental data in Fig. 1c,d are now stated in the caption for Fig. 1 (p. 37). The same fitted χ(T) values are used in Fig. 2 (p. 38), as is now stated in the revised caption for Fig. 2. Please note that, for clarity, we have changed the notation from ∆h and ∆s in our originally submitted manuscript to ϵh and ϵs in the revised text (p. 7, last line) as well as in the revised figure captions, to conform to the notation in our previous works [18, 71].

      (2) Authors note “monovalent positive salt ions such as Na+ can be attracted, somewhat counterintuitively, into biomolecular condensates scaffolded by positively-charged polyelectrolytic IDRs in the presence of divalent counterions”. This may be due to the fact that the divalent negative counterions present in the dense phase (as seen in the ternary phase diagrams) also recruit a small amount of Na+.

      The reviewer’s comment is valid, as a physical explanation for this prediction is called for. Accordingly, the following sentence is added to p. 10, lines 27–29:

      “This phenomenon arises because the positively charged monovalent salt ions are attracted to the negatively charged divalent counterions in the protein-condensed phase.”

      (3) In the discussion where the authors contrast the LLPS propensity of Caprin1 against FUS, TDP43, Brd4, etc., they correctly note that the majority of these other proteins have low net charge and possibly stronger non-electrostatic interactions that can promote LLPS at room temperature even in the absence of salt. It is also worth noting whether some of these proteins were forced to undergo LLPS with crowding, which is sometimes typical. A quick literature search will make this clear.

      A careful reading of the work in question (Krainer et al., ref. 50) does not suggest that crowders were used to promote LLPS for the proteins the authors studied. Nonetheless, the reviewer’s point regarding the potential importance of crowder effects is well taken. Accordingly, crowder effects are now mentioned briefly in the Introduction (p. 4, line 13), with three additional references on the impact of crowding on LLPS added [30–32] (from the Spruijt, Mukherjee, and Rakshit groups, respectively). In this connection, to provide a broader historical context for the introductory discussion of electrostatic effects in biomolecular processes in general, two additional influential reviews (from the Honig and Zhou groups, respectively) are now cited as well [15, 16].

    1. eLife Assessment

      This manuscript aims to identify the pacemaker cells in the lymphatic collecting vessels - the cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the exemplary use of existing approaches (genetic deletions and cytosolic calcium detection in multiple cell types), the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction. The inclusion of scRNAseq and membrane potential data enhances a tremendous study. This fundamental discovery establishes a new standard for the field of lymphatic physiology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the multiple cell types present in the wall of murine collecting lymphatic vessels with the goal of identifying cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the use of genetic models to delete individual genes or detect cytosolic calcium in specific cell types, the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction.

      Strengths:

      The experiments are rigorously performed, the data justify the conclusions and the limitations of the study are appropriately discussed.

      There is a need to identify therapeutic targets to improve lymphatic contraction and this work helps identify lymphatic muscle cells as potential cellular targets for intervention.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript describing studies directed at identifying the cell type responsible for pacemaking in murine collecting lymphatics. Using state-of-the-art approaches, the authors identified a number of different cell types in the wall of these lymphatics and then, using targeted expression of channelrhodopsin and GCaMP, convincingly demonstrate that only activation of lymphatic muscle cells produces coordinated lymphatic contraction and that only lymphatic muscle cells display pressure-dependent Ca2+ transients, as would be expected of a pacemaker in these lymphatics.

      Strengths:

      The use of targeted expression of channelrhodopsin and GCaMP to test the hypothesis that lymphatic muscle cells serve as the pacemakers in murine lymphatic collecting vessels.

      Weaknesses:

      The only significant weakness was the lack of quantitative analysis of most of the imaging data shown in Figures 1-11. In particular, the colocalization analysis should be extended to show cells not expected to demonstrate colocalization as a negative control for the colocalization analysis that the authors present. These weaknesses have been resolved by revision and the addition of novel scRNAseq data, additional colocalization data, and membrane potential measurements.

    4. Reviewer #3 (Public review):

      Summary:

      Zawieja et al. aimed to identify the pacemaker cells in the lymphatic collecting vessels. The authors have used various Cre-based expression systems and optogenetic tools to identify these cells. Their findings suggest that these cells are lymphatic muscle cells that drive the pacemaker activity in the lymphatic collecting vessels.

      Strengths:

      The authors have used multiple approaches to test their hypothesis. Some findings are presented as qualitative images, while some quantitative measurements are provided.

      Weaknesses:

      - More quantitative measurements.
      - Possible mechanisms associated with the pacemaker activity.
      - Membrane potential measurements.

      Comments on revisions:

      The authors have answered my comments with additional experiments, data and manuscript edits.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript explores the multiple cell types present in the wall of murine collecting lymphatic vessels with the goal of identifying cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the use of genetic models to delete individual genes or detect cytosolic calcium in specific cell types, the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction. 

      Strengths: 

      The experiments are rigorously performed, the data justify the conclusions, and the limitations of the study are appropriately discussed. 

      There is a need to identify therapeutic targets to improve lymphatic contraction and this work helps identify lymphatic muscle cells as potential cellular targets for intervention. 

      Weaknesses: 

      My only major comment would be that the manuscript provides a lot of rich information describing the cellular components of the muscular lymphatic vessel wall and that these data are not well represented by the title. The title (while currently accurate) could be tweaked to better represent all that is in this manuscript. Maybe something like

      "Characterization/Interrogation of the cellular components of murine collecting lymphatic vessels reveals that lymphatic muscle cells are the innate pacemaker cells regulating lymphatic contractions" or "Discovery/Confirmation of lymphatic muscle cells as innate pacemaker cells of lymphatic contraction through characterization of the cellular components of murine collecting lymphatic vessels". Potentially a cartoon summary figure of the components that make up the collecting lymphatic vessel wall could also be included. In my opinion, these changes will make this manuscript of more interest to a broader group of scientists. I have a few additional comments for consideration to improve the clarity and enhance the discussion of this work. 

      We agree with the reviewer that our original manuscript, and our resubmission even more so with the addition of the scRNAseq data, provides a significant amount of information regarding the composition of the lymphatic collecting vessel wall. We have changed our title to match one suggestion of the reviewer: “Characterization of the cellular components of murine collecting lymphatic vessels reveals that lymphatic muscle cells are the innate pacemaker cells regulating lymphatic contractions".

      Reviewer #2 (Public Review): 

      Summary: 

      This is a well-written manuscript describing studies directed at identifying the cell type responsible for pacemaking in murine collecting lymphatics. Using state-of-the-art approaches, the authors identified a number of different cell types in the wall of these lymphatics and then, using targeted expression of channelrhodopsin and GCaMP, convincingly demonstrate that only activation of lymphatic muscle cells produces coordinated lymphatic contraction and that only lymphatic muscle cells display pressure-dependent Ca2+ transients, as would be expected of a pacemaker in these lymphatics. 

      Strengths: 

      The use of targeted expression of channelrhodopsin and GCaMP to test the hypothesis that lymphatic muscle cells serve as the pacemakers in murine lymphatic collecting vessels. 

      Weaknesses: 

      The only significant weakness was the lack of quantitative analysis of most of the imaging data shown in Figures 1-11. In particular, the colocalization analysis should be extended to show cells not expected to demonstrate colocalization as a negative control for the colocalization analysis that the authors present. 

      We understand the reviewer’s concern regarding the lack of a control for the colocalization analysis and that the colocalization analysis was limited to just one set of cell markers. We have now provided a colocalization analysis of Myh11 and PDGFRα to serve as a negative control, based on our RT-PCR and scRNAseq findings; this is incorporated into the current Supplemental Figure 1. For the other marker combinations, the representative images often clearly showed that two separate cell populations were being stained, as in the case of labeling endothelial cells with CD31, macrophages with MacGreen mice, or hematopoietic cells with CD45. 

      During our lengthy rebuttal process, we completed a single-cell RNA sequencing (scRNAseq) analysis of isolated and cleaned mouse inguinal axillary lymphatic collecting vessels to aid our characterization of the vessel wall and to address these colocalization questions more thoroughly and robustly. The scRNAseq dataset, derived from isolated and cleaned inguinal axillary collecting vessels from 10 mice (5 males and 5 females), allowed us to profile over 2200 of the adventitial fibroblast-like cells (AdvCs) we had identified in our original submission. Using this dataset, we were able to confirm co-expression of Cd34 and Pdgfrα in AdvCs and to assess the co-expression of other genes of interest from our RT-PCR and immunofluorescence experiments. This resource will also allow other lymphatic investigators to assess their genes of interest, as our dataset is uploaded to the NIH Gene Expression Omnibus and will be uploaded to the Broad Institute Single Cell Portal upon publication.

      Here we show that the vast majority of the non-muscle, fibroblast-like cells referred to as AdvCs were double positive for both CD34 and PDGFRα. We also show that the AdvCs that express the commonly used pericyte markers Pdgfrb and Cspg4 also co-express Pdgfrα. Critically, these data also show that the AdvCs that express genes linked with lymphatic contractile dysfunction (Ano1, Gjc1 or connexin 45, and Cacna1c or Cav1.2) co-express Pdgfrα, which would render these genes susceptible to Cre-mediated recombination using our Pdgfrα-CreER<sup>TM</sup> model.  

      Reviewer #3 (Public Review): 

      Summary: 

      Zawieja et al. aimed to identify the pacemaker cells in the lymphatic collecting vessels. Authors have used various Cre-based expression systems and optogenetic tools to identify these cells. Their findings suggest these cells are lymphatic muscle cells that drive the pacemaker activity in the lymphatic collecting vessels. 

      Strengths: 

      The authors have used multiple approaches to test their hypothesis. Some findings are presented as qualitative images, while some quantitative measurements are provided.   

      Weaknesses: 

      -  More quantitative measurements. 

      -  Possible mechanisms associated with the pacemaker activity. 

      -  Membrane potential measurements. 

      We thank the reviewers for their concerns and have addressed them in the following manner. 

      - We added novel single cell RNA sequencing of isolated and cleaned inguinal axillary vessels from 10 mice (5 males and 5 females). This allowed us to quantify the number of AdvCs that coexpress CD34 and Pdgfrα as well as the number of cells co-expressing Pdgfrα and other markers.

      - We have added a negative control with quantification for the co-localization analysis assessing Myh11 and Pdgfrα. We have added a negative control with quantification for the ChR2 photostimulated contraction experiments using Myh11CreERT2-ChR2 mice that were not injected with tamoxifen. 

      - We also used Biocytin-AF488 in our intracellular Vm electrodes to map the specific cells in which we recorded action potentials, as well as neighboring cells, since Biocytin-AF488 is under 1 kDa and can pass through gap junctions. This approach independently labeled lymphatic muscle cells and their direct neighbors in 3 IALVs from 3 separate mice. 

      - We performed membrane potential recordings in isolated, pressurized (under isobaric conditions), and spontaneously contracting inguinal axillary lymphatic collecting vessels at different pressures. 

      - We also show that the pressure-frequency relationship is dependent on the slope of the diastolic depolarization as no other parameter was significantly altered in our study and the diastolic depolarization slope was highly correlated with contraction frequency. 
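      As a toy illustration of the reasoning in the last point (not the authors' analysis pipeline), the sketch below assumes a hypothetical pacemaker model in which Vm rises linearly from a fixed post-repolarization minimum to a fixed firing threshold, followed by a fixed-duration action potential. In such a model, a steeper diastolic depolarization shortens the time to threshold and therefore raises contraction frequency. The code fits the diastolic slope by least squares from sampled traces and correlates it with frequency; the threshold, durations, and slopes are made-up values, and the -40 mV minimum only echoes the approximate mouse IALV value quoted in the response to Reviewer #3.

```python
import math

def ls_slope(t, v):
    """Least-squares slope of v against t (mV/s for t in s and v in mV)."""
    n = len(t)
    mt, mv = sum(t) / n, sum(v) / n
    num = sum((ti - mt) * (vi - mv) for ti, vi in zip(t, v))
    den = sum((ti - mt) ** 2 for ti in t)
    return num / den

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical toy-model constants (not measured values)
V_MIN, V_THRESH = -40.0, -30.0   # mV: post-repolarization minimum and firing threshold
AP_DURATION = 0.5                # s: action potential plus refractory time
DT = 0.01                        # s: sampling interval
slopes_by_pressure = [4.0, 6.0, 9.0, 13.0]  # mV/s, assumed to steepen with pressure

fitted_slopes, freqs_per_min = [], []
for s in slopes_by_pressure:
    t_to_threshold = (V_THRESH - V_MIN) / s
    t = [i * DT for i in range(int(t_to_threshold / DT) + 1)]
    vm = [V_MIN + s * ti for ti in t]            # noiseless diastolic ramp
    fitted_slopes.append(ls_slope(t, vm))
    freqs_per_min.append(60.0 / (t_to_threshold + AP_DURATION))

r = pearson(fitted_slopes, freqs_per_min)
print([round(x, 2) for x in fitted_slopes], [round(f, 1) for f in freqs_per_min], round(r, 3))
```

In real recordings the diastolic segment would first have to be isolated (for example, between the repolarization minimum and the foot of the next upstroke), and with noisy traces the choice of fitting window affects the estimated slope.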

      We believe the addition of these novel data, controls, experiments, and quantifications have improved the manuscript in line with the reviewers’ suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Lines 149-162: The authors rule out the methylene blue staining cells in the cLV wall as pacemakers because they don't form continuous longitudinal connections to drive propagation. Is it possible for a pacemaker cell to only initiate the contraction and then have the LMCs make the axial electrical connections to propagate the electrical wave? I am not trying to suggest the methylene blue cells are pacemakers, but I am not sure the lack of longitudinal (or radial) connectivity is sufficient evidence to rule out the possibility. This comment also is relevant to the 3 criteria for a pacemaker cell listed in the Discussion (Lines 413-417). 

      We agree with the reviewer’s broader point that a pacemaker cell may not require direct contact with other ‘pacemaker’ cells within the tissue as long as they are still within the same electrical syncytium. However, we do expect a continuous presence of a pacemaker cell type along the length of the vessel wall to account for the persistence of spontaneous contractile behavior regardless of vessel length, the ability of the contraction initiation site to shift (Akl et al 2011, Castorena et al 2018 and Castorena et al 2022), and the occurrence of spontaneous action potentials. In Dirk van Helden’s seminal 1993 work on lymphatic pacemaking, a major finding was that “SM of small lymphangions or that of short segments, cut from lymphangions of any length, behaved similarly”. We have adjusted our phrase regarding the requirement of a contiguous network and instead suggest a continuous presence along the vessel network and integration into the electrical syncytium. 

      Methylene blue is an alkaline stain that stains acidic structures, and historically it has been noted to stain interstitial cells of Cajal (ICCs) in the gastrointestinal tract, which typically exist as a network of cells (Huizinga et al 1993 and Berezin 1988). No such network was readily apparent in our methylene blue staining, nor did the stained cells have a morphology similar to that of the ICCs of the gastrointestinal tract. Further, methylene blue staining is not limited to ICCs or to pacemaker cells at large, as it has been used to kill cancer cells. Within the small intestine, methylene blue was also noted to stain macrophage-like cells (Mikkelsen et al 1988), and we too draw parallels between the macrophage morphology observed in MacGreen mice and the methylene blue-stained cells. The structural basis of the ICC affinity for methylene blue is not well described, and while the innate cytotoxicity of methylene blue combined with light has been used to kill ICCs and impair slow wave generation, the lack of specificity of this method leaves much to be desired. What is clear is that the ICC network highlighted by methylene blue in the gut is absent in lymphatic collecting vessels.

      In Figure 15/Video12, is it possible that the cells that are showing intracellular Ca2+ in diastole are the cells that reach a threshold membrane potential that then trigger the rest of the LMCs? As the authors have shown heterogeneity in the LMCs surface markers, is it possible that the cells with Ca2+ activity during diastole are identifiable by a distinct molecular phenotype? Or is the thought that these cells are randomly active in diastole? Some discussion/speculation about this seems appropriate. 

      We agree with the reviewer that there is heterogeneity among LMCs with respect to calcium oscillations in diastole, under either normal buffer conditions or when L-type channels are inhibited with nifedipine. We also note significant heterogeneity in gene expression within the four LMC subclusters (0-3), though we did not see significant differences in either Ip3R1 or Ano1 expression. However, subcluster “0” had increased expression of Itprid2, also known as KRas-induced actin-interacting protein (KRAP), which is thought to tether, and thus immobilize, IP3 receptors to the actin cortex beneath the cell membrane. KRAP has recently been proposed to be a critical player in IP3 receptor “licensing,” which allows IP3 receptors to release calcium (Vorontsova et al., 2022). Whether IP3R licensing is similarly required in all cells, or specifically in LMCs, is unknown; however, it is quite clear that there are specific release sites within the cell, and this topic is currently under further investigation for a separate manuscript. We would like to note that there is not yet a clear consensus on whether IP3R licensing is required, as many of these studies were performed in cultured cells and this mechanism has only recently been described. 

      Healthy lymphatic collecting vessels typically have a single pacemaker driving a coordinated, propagated contraction in ex vivo isobaric myograph studies (Castorena-Gonzalez et al., 2018), typically located at either end of the cannulated vessel. We believe that this is due to the lack of a bordering cell in one direction, which allows charge to accumulate and voltage to reach threshold preferentially at these sites. We have tried to image calcium at the pacemaking pole of the vessel to observe the specific Ca<sup>2+</sup> transients at these sites, though invariably the act of imaging GCaMP6f results in the pacemaker activity initiating from the other pole of the vessel. In our opinion, the heterogeneity of LMC Ca<sup>2+</sup> transients is a feature of the system, as it permits a wider range of depolarization signals and thus allows finer control of pacing as different physical/pressure or signaling stimuli are encountered. Likely, the cells with a higher propensity for Ca<sup>2+</sup> transients act as the contraction initiation site in vivo, though it must also be noted that LMC density decreases around lymphatic valve sites. In fact, in guinea pig collecting vessels there are very few LMCs at the valves, which can render them electrically uncoupled or poorly coupled (Van Helden, 1993). Thus, valve sites with greater electrical resistance due to lower LMC-LMC coupling may allow charge to accumulate in the LMCs at the valve site, similar to the artificial condition achieved in our myograph preparations with two cut ends, allowing these cells to reach threshold first and drive coordination at the valve sites.

      An additional description of what the PTCL analysis is meant to represent physiologically would be helpful for readers. 

      We have better described the conversion of the calcium signals into “particles” for analysis at first mention in the Methods and Results sections and have included the requisite reference to this specific methodology in lines 429-430. 

      A description of how DMAX is experimentally determined is needed. 

      We have adjusted our Methods section to describe DMAX in lines 774-775:

      “with Ca<sup>2+</sup>-free Krebs buffer (3 mM EGTA) and diameter at each pressure recorded under passive conditions (DMAX).”

      I think the vessels referred to as popliteal lymphatic vessels are actually saphenous lymphatic vessels (afferent to the popliteal lymph node). Please clarify. 

      Indeed, some of the vessels used in this study are the afferents to the single popliteal node. They travel with the caudal branch of the saphenous vein, but have routinely been described as popliteal vessels, as opposed to saphenous lymphatic vessels, by the lymphatic field at large (Tilney 1971 PMCID: PMC1270981, Liao 2015 PMID: 25512945). To move away from this nomenclature would likely add to confusion although we agree that the lymphatic field may need to improve or correct the vessel naming paradigm to match the vascular pairs they follow.

      Reviewer #2 (Recommendations For The Authors): 

      Lines 214-215 - can you cite a reference for the observation that rhythmic contractions don't require the presence of valves? 

      We have added the reference. In Dr. Van Helden’s seminal work on the topic in 1993, “Vessel segments were then cut from selected small lymphangions (length 300-500 um) by cutting at the valves.” Additionally, work by Dr Anatoliy Gashev utilized sections of lymphatic vessels that lacked valves to study orthograde and retrograde shear sensitivity (Gashev et al., 2002).

      Lines 224-230 - It would have been nice to see colocalization analysis for all cell types so that "negative" results could be compared with the "positives" that you report. This would help bolster evidence of your ability to separate cell types. 

      We understand the reviewer’s sentiment and agree. We have now added a “negative control” colocalization staining and analysis for PDGFRα and Myh11, which has been added to the current Supplemental Figure 1. We stained 3 IALVs from 3 separate mice for PDGFRα and Myh11 and performed confocal microscopy. We ran the Fiji BIOP-JACoP colocalization plugin as before and observed very little colocalization of the two signals. Additionally, we have added a co-expression assessment of CD34, PDGFRα, and other genes using our scRNAseq dataset.  
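      For readers less familiar with such analyses, the following is a minimal sketch of two statistics commonly reported by colocalization tools such as JACoP: Pearson's correlation coefficient and thresholded Manders' coefficients. It is not the BIOP-JACoP implementation; the threshold handling shown is one common variant, the fixed thresholds are arbitrary, and the one-dimensional "images" are synthetic stand-ins for two stained cell populations, chosen to contrast a negative control (markers on different cells) with markers on the same cells.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def manders(a, b, thr_a, thr_b):
    """Thresholded Manders coefficients (one common variant).

    M1: share of above-threshold channel-a intensity found where b is also above threshold.
    M2: the same with the channels swapped.
    """
    a_tot = sum(x for x in a if x > thr_a)
    b_tot = sum(y for y in b if y > thr_b)
    a_col = sum(x for x, y in zip(a, b) if x > thr_a and y > thr_b)
    b_col = sum(y for x, y in zip(a, b) if x > thr_a and y > thr_b)
    m1 = a_col / a_tot if a_tot else 0.0
    m2 = b_col / b_tot if b_tot else 0.0
    return m1, m2

random.seed(0)
N = 400  # synthetic 1-D "image": pixels 0-199 belong to cell 1, 200-399 to cell 2

def noise():
    return random.uniform(0.0, 5.0)  # dim background

ch_marker1 = [100.0 + noise() if i < 200 else noise() for i in range(N)]
# Negative control: the second marker labels a different cell population
ch_other_cell = [100.0 + noise() if i >= 200 else noise() for i in range(N)]
# Positive case: the second marker labels the same cells
ch_same_cell = [100.0 + noise() if i < 200 else noise() for i in range(N)]

for label, ch2 in (("negative control", ch_other_cell), ("same cells", ch_same_cell)):
    r = pearson(ch_marker1, ch2)
    m1, m2 = manders(ch_marker1, ch2, thr_a=20.0, thr_b=20.0)
    print(f"{label}: Pearson r = {r:.2f}, M1 = {m1:.2f}, M2 = {m2:.2f}")
```

In this toy case the two cell populations split the field evenly, so the negative control gives a strongly negative Pearson value; real confocal images contain far more background pixels, where a negative control more typically gives values near zero. Manders' coefficients also depend heavily on the thresholds, which is why automatic threshold methods (for example, Costes' approach) are often used in practice.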

      line 293 - Should read "Cx45 in..." 

      This has been corrected. 

      “The expression of the genes critically involved in cLV function—Cav1.2, Ano1, and Cx45—in the PdgfrαCreER<sup>TM</sup>-ROSA26mTmG purified cells and scRNAseq data prompted us to generate PdgfrαCreER<sup>TM</sup>-Ano1<sup>fl/fl</sup>, PdgfrαCreER<sup>TM</sup>-Cx45<sup>fl/fl</sup>, and PdgfrαCreER<sup>TM</sup>-Cav1.2<sup>fl/fl</sup> mice for contractile tests.”

      lines 470-473 - A reference for this statement should be cited. 

      We have added the reference. In Dr. Van Helden’s seminal work on the topic in 1993, “Vessel segments were then cut from selected small lymphangions (length 300-500 um) by cutting at the valves.” Additionally, work by Dr Anatoliy Gashev utilized sections of lymphatic vessels that lacked valves to study orthograde and retrograde shear sensitivity (Gashev et al., 2002).

      Lines 483-487 - References should be cited for these statements. 

      We have narrowed and clarified this statement and supported it with the necessary citations. 

      “Of course, mesenchymal stromal cells (Andrzejewska et al., 2019) and fibroblasts (Muhl et al., 2020; Buechler et al., 2021; Forte et al., 2022) are present, and it remains controversial to what extent telocytes are distinct from or are components/subtypes of either cell type (Clayton et al., 2022). Telocytes are not monolithic in their expression patterns, displaying both organ directed transcriptional patterns as well as intra-organ heterogeneity (Lendahl et al., 2022) as readily demonstrated by recent single cell RNA sequencing studies that provided immense detail about the subtypes and activation spectrum within these cells and their plasticity (Luo et al., 2022).”

      Lines 584-585 - Missing a reference citation. 

      Thank you for catching this error, the correct citation was for Boedtkjer et al 2013 and is now properly cited. 

      Line 638 - "these this" should read "this" 

      Thank you for catching this error. This particular sentence was removed in light of the addition of the scRNAseq data.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript from Zawieja et al. explored an interesting hypothesis about the pacemaker cells in lymphatic collecting vessels. Many aspects of lymphatic collecting vessels are still under investigation; hence this work provides timely knowledge about the lymphatic muscle cells as a pacemaker. Although it is an important topic of the investigation, the data provided do not support the overall goal of the manuscript. Many figures (Figure 1-5) provide quantitative estimation and the description provided in the results section might only be useful for a restricted audience, but not to the broader audience. Some of the figures are very condensed with multiple imaging panels and it is hard to follow the differences in qualitative analysis. Overall, this manuscript can be improved by more streamlined description/writing and figure arrangements (some of the figures/panels can be moved to the supplementary figures). 

      We disagree with the notion that the original data provided did not support the goal of the manuscript: to identify and test putative pacemaker cell types. Nonetheless, we have added ample novel data to the manuscript, including membrane potential recordings and scRNAseq, to further support our conclusion that the pacemaker cell is an LMC. We believe the scRNAseq data will also greatly enhance the appeal of the manuscript to a broader audience, and we have renamed the manuscript in line with the wealth of data we have collected on the components of the vessel wall as we tested for putative pacemaker cells.

      As requested, we have moved many figures to the supplement to allow readers to focus more on the more critical experiments.

      A few other points that need to be addressed: 

      (1) Authors used immunofluorescence-based differences in various cell types in the collecting vessels. Initially, they chose ICLC, pericytes, and lymphatic muscle cells. But then they started following adventitial cells and endothelial cells. It is not clear from the description, why these other cells could be possibly involved in the pacemaker activity. It will be easier to follow if authors provide a graphical abstract or summary figure about their hypothesis and what is known from their and others' work. 

      We would like to clarify that we used the endothelial cells as controls to ensure that what we observed via immunofluorescence and FACS RT-PCR was a cell type separate from either lymphatic muscle or lymphatic endothelial cells on the vessel wall. Staining for the endothelium also allowed us to assess where these PDGFRα+CD34+ cells reside in the vessel wall. We started with a wide range of markers that are conventionally used for targeting specific cell types, but as expected, those markers are not always 100% specific. Specifically, we focused on CD34, Kit, and Vimentin, as those were the markers previously used for the non-muscle cells observed in the lymphatic collecting vessel wall. What we found was that CD34 and PDGFRα labeled the same cell type. As there was no CD34Cre mouse available at the time, we instead utilized the inducible PDGFRαCreERTM. We are unsure how well an abstract figure will condense the conclusions from the experiments listed here, but if absolutely required for publication, we can attempt to highlight the representative cell populations identified on the vessel wall.

      (2) Authors used many acronyms in the manuscript without defining them (when they appeared for the first time). Please follow the convention. 

      We have checked the manuscript and made several corrections regarding the use of abbreviations.

      (3) How specific is PDGFR-alpha as a marker of pericytes? It can also label mesenchymal cells. Why did the authors choose PDGFR-alpha over beta for their Cre-based expression approach? 

      We tried to assess whether a pericyte-like cell was present in or along the wall using PDGFRbeta (Pdgfrβ). Pdgfrβ is commonly used to identify pericytes (Winkler et al., 2010), while in contrast Pdgfrα is a known fibroblast marker (Lendahl et al., 2022). Pdgfrβ CreERT2 resulted in recombination in both LMCs and AdvCs, preventing it from being a discriminating marker for our study, whereas Myh11CreER<sup>T2</sup> and PDGFRαCreER<sup>TM</sup> were specific to their respective cell types, at least based on our FACS RT-PCR and staining. As shown by the scRNAseq data in Figure 5, there was no cell cluster for which Pdgfrβ was specific, in contrast to PDGFRα and Myh11. In Figure 6 we show the expression of another commonly used pericyte marker, NG2 (Cspg4), in our scRNAseq dataset, which was also observed in both LMCs and AdvCs. Lastly, MCAM (Figure 6) can also be a marker for pericytes, though we see expression of this marker only in LMCs and LECs. Notably, almost all of the AdvCs express PDGFRα, rendering PDGFRαCreER<sup>TM</sup> a powerful tool to study this population of cells on the vessel wall, including those that were PDGFRα+Cspg4+ or PDGFRα+Pdgfrβ+.

      We relied on PDGFRαCreER<sup>TM</sup> as that was the only PDGFRα Cre model available at the time. Note that we used PdgfrβCreER<sup>T2</sup> and Ng2Cre in our study but found that both Cre models recombined both LMCs and AdvCs.

      (4) Please include appropriate references for all the labeling markers (PDGFR-alpha, beta, Myh11, etc.) that are used in this manuscript.

      We have added multiple references to the manuscript to support the use of these common cell “specific” markers as of course each marker is limited in some capacity to fully or specifically label a single population of cells (Muhl et al., 2020).

      (5) One of the criteria for the pacemaker cells is depolarization-induced propagated contractions. Authors have used optogenetics-induced depolarization to test this phenomenon. Please include negative controls for these experiments. 

      We have now added negative controls to this experiment, which were non-induced (no tamoxifen) Myh11CreER<sup>T2</sup>-Chr2 popliteal vessels. These data have been added to Figure 8.

      (6) What are the resting membrane potentials of Lymphatic muscle cells? The authors should provide some details about this in the manuscript. 

      We agree with the reviewer and have added membrane potential recordings at different pressures (Figure 13). We filled our recording electrode with the cell-labeling molecule BiocytinAF488 to identify the cells exhibiting action potentials, which were the LMCs. Lymphatic resting membrane potential is dynamic in pressurized vessels; this appears to be a critical difference between our approach and recordings from pinned-out vessels or vessels on wire myographs, likely because of improper stretch or damage to the vessel wall in those preparations. In mesenteric lymphatic vessels isolated from rats, the minimum membrane potential reached during repolarization typically ranges from -45 to -50 mV, while in IALVs from mice it is typically around -40 mV, though IALVs have a notably higher contraction frequency. Critically, these new recordings in IALVs at different pressures show that the diastolic depolarization rate is the critical factor driving the pressure-dependent frequency.
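      To make the link between depolarization rate and frequency concrete, the sketch below estimates a diastolic depolarization rate as the least-squares slope of membrane potential over a diastolic window, then converts it into a time to threshold under a simple linear-ramp assumption. The window bounds, the threshold value, and the function names are hypothetical illustrations rather than the analysis used in the manuscript, and real diastolic depolarizations need not be linear.

```python
from typing import Sequence


def diastolic_depolarization_rate(
    t_s: Sequence[float], vm_mV: Sequence[float], t_start: float, t_end: float
) -> float:
    """Least-squares slope (mV/s) of membrane potential inside [t_start, t_end].

    The window is a hypothetical choice, e.g. from the post-spike minimum
    potential to just before the next upstroke.
    """
    pts = [(t, v) for t, v in zip(t_s, vm_mV) if t_start <= t <= t_end]
    if len(pts) < 2:
        raise ValueError("need at least two samples inside the window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    return sxy / sxx


def time_to_threshold(v_min_mV: float, v_threshold_mV: float, rate_mV_per_s: float) -> float:
    """Time (s) for a linear ramp starting at v_min to reach a (hypothetical) threshold."""
    if rate_mV_per_s <= 0:
        raise ValueError("rate must be positive for the potential to reach threshold")
    return (v_threshold_mV - v_min_mV) / rate_mV_per_s
```

      Under this simplified model, doubling the rate halves the time needed to reach threshold from the same minimum potential, which is one way a pressure-dependent depolarization rate could translate into a pressure-dependent contraction frequency.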

      (7) In the discussion, the authors discuss SR Ca2+ cycling in pacemaking, but the relevant data are not included in this manuscript; instead, a manuscript from JGP (in revision) is cross-referenced.

      As discussed above, we have recently published work in which we studied IALVs from Myh11CreERT2-Ip3r1fl/fl (Ip3r1ismKO) and Myh11CreERT2-Ip3r1fl/fl-Ip3r2fl/fl-Ip3r3fl/fl mice (Zawieja et al., 2023). Deletion of Ip3r1 from LMCs recapitulated the dramatic reduction in frequency we previously published in Myh11CreERT2-Ano1fl/fl mice, as well as the loss of pressure-dependent chronotropy. Furthermore, in that study we also showed that the diastolic calcium transients are nearly completely lost in IALVs from Myh11CreERT2-Ip3r1fl/fl knockout mice. There was no difference in contractile function between IALVs from single Ip3r1 knockout and triple Ip3r1-3 knockout mice, suggesting that Ip3r1 is the isoform required for the diastolic calcium oscillations. Further, in the presence of 1 µM nifedipine there were still no calcium oscillations in Myh11CreERT2-Ip3r1fl/fl LMCs. These findings provide further support for our interpretation that pacemaking is of myogenic origin.

      Andrzejewska, A., B. Lukomska, and M. Janowski. 2019. Concise Review: Mesenchymal Stem Cells: From Roots to Boost. Stem Cells. 37:855-864.

      Buechler, M.B., R.N. Pradhan, A.T. Krishnamurty, C. Cox, A.K. Calviello, A.W. Wang, Y.A. Yang, L. Tam, R. Caothien, M. Roose-Girma, Z. Modrusan, J.R. Arron, R. Bourgon, S. Muller, and S.J. Turley. 2021. Cross-tissue organization of the fibroblast lineage. Nature. 593:575-579.

      Castorena-Gonzalez, J.A., S.D. Zawieja, M. Li, R.S. Srinivasan, A.M. Simon, C. de Wit, R. de la Torre, L.A. Martinez-Lemus, G.W. Hennig, and M.J. Davis. 2018. Mechanisms of Connexin-Related Lymphedema. Circ Res. 123:964-985.

      Clayton, D.R., W.G. Ruiz, M.G. Dalghi, N. Montalbetti, M.D. Carattino, and G. Apodaca. 2022. Studies of ultrastructure, gene expression, and marker analysis reveal that mouse bladder PDGFRA(+) interstitial cells are fibroblasts. Am J Physiol Renal Physiol. 323:F299-F321.

      Forte, E., M. Ramialison, H.T. Nim, M. Mara, J.Y. Li, R. Cohn, S.L. Daigle, S. Boyd, E.G. Stanley, A.G. Elefanty, J.T. Hinson, M.W. Costa, N.A. Rosenthal, and M.B. Furtado. 2022. Adult mouse fibroblasts retain organ-specific transcriptomic identity. Elife. 11.

      Gashev, A.A., M.J. Davis, and D.C. Zawieja. 2002. Inhibition of the active lymph pump by flow in rat mesenteric lymphatics and thoracic duct. J Physiol. 540:1023-1037.

      Lendahl, U., L. Muhl, and C. Betsholtz. 2022. Identification, discrimination and heterogeneity of fibroblasts. Nat Commun. 13:3409.

      Luo, H., X. Xia, L.B. Huang, H. An, M. Cao, G.D. Kim, H.N. Chen, W.H. Zhang, Y. Shu, X. Kong, Z. Ren, P.H. Li, Y. Liu, H. Tang, R. Sun, C. Li, B. Bai, W. Jia, Y. Liu, W. Zhang, L. Yang, Y. Peng, L. Dai, H. Hu, Y. Jiang, Y. Hu, J. Zhu, H. Jiang, Z. Li, C. Caulin, J. Park, and H. Xu. 2022. Pan-cancer single-cell analysis reveals the heterogeneity and plasticity of cancer-associated fibroblasts in the tumor microenvironment. Nat Commun. 13:6619.

      Muhl, L., G. Genove, S. Leptidis, J. Liu, L. He, G. Mocci, Y. Sun, S. Gustafsson, B. Buyandelger, I.V. Chivukula, A. Segerstolpe, E. Raschperger, E.M. Hansson, J.L.M. Bjorkegren, X.R. Peng, M. Vanlandewijck, U. Lendahl, and C. Betsholtz. 2020. Single-cell analysis uncovers fibroblast heterogeneity and criteria for fibroblast and mural cell identification and discrimination. Nat Commun. 11:3953.

      Van Helden, D.F. 1993. Pacemaker potentials in lymphatic smooth muscle of the guinea-pig mesentery. J Physiol. 471:465-479.

      Vorontsova, I., J.T. Lock, and I. Parker. 2022. KRAP is required for diffuse and punctate IP(3)-mediated Ca(2+) liberation and determines the number of functional IP(3)R channels within clusters. Cell Calcium. 107:102638.

      Winkler, E.A., R.D. Bell, and B.V. Zlokovic. 2010. Pericyte-specific expression of PDGF beta receptor in mouse models with normal and deficient PDGF beta receptor signaling. Mol Neurodegener. 5:32.

      Zawieja, S.D., G.A. Pea, S.E. Broyhill, A. Patro, K.H. Bromert, M. Li, C.E. Norton, J.A. Castorena-Gonzalez, E.J. Hancock, C.D. Bertram, and M.J. Davis. 2023. IP3R1 underlies diastolic ANO1 activation and pressure-dependent chronotropy in lymphatic collecting vessels. J Gen Physiol. 155.

    1. eLife Assessment

      This valuable study presents new observations on white matter organisation at the micron scale, using a combination of synchrotron imaging and diffusion MRI across two species. Notably, the authors provide solid evidence for the fasciculation of axons within major fibre bundles into laminar structures, though these structures are not consistently observed across modalities or species. The study will be of general interest to neuroanatomists and those interested in white matter imaging.

    2. Reviewer #1 (Public Review):

      This study presents valuable observations of white matter organisation from diffusion MRI and two types of synchrotron imaging in both monkeys and mice. Cross-modality comparisons are interesting as the different methods are able to probe anatomical structures at different length scales, from single axons in high-resolution synchrotron (ESRF) imaging, to clusters of axons in lower-resolution synchrotron (DESY) data, to axon populations at the mm-scale in diffusion MRI. By acquiring all modalities in monkey and mouse ex vivo samples, the authors can observe principles of fibre organisation, and characterise how fibre characteristics, such as tortuosity and micro-dispersion, vary across select brain regions and in healthy tissue versus a demyelination model.

      One very interesting result is the observation of apparent laminar organisation of fibres in ex vivo monkey white matter samples. DESY data from the corpus callosum shows fibres with two dominant orientations (one L-R, one slightly inclined), clustered in laminar structures within this major fibre bundle. Thanks to the authors providing open data, I was able to look through the raw DESY volume and observe regions with different "textures" (different orientations) in the described laminar arrangement. That this organisation can be observed by eye, as well as by structure tensor, is fairly convincing.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors combine diffusion MRI and high-resolution x-ray synchrotron phase-contrast imaging in monkey and mouse brains to investigate the 3D organization of brain white matter across different scales and species. The work is at the forefront of the anatomical investigation of the human connectome and aligns with several current efforts to bridge the resolution gap between what we can see in vivo at the millimeter scale and the complexity of the human brain at the sub-micron scale. The authors compare the 3D white matter organization across modalities within 2 small regions in one monkey brain (body of the corpus callosum, centrum semiovale) and within one region (splenium of the corpus callosum) in healthy mice and in one murine model of focal demyelination. The study compares measures of tissue anisotropy and fiber orientations across modalities, performs a qualitative comparison of fasciculi trajectories across brain regions and tissue conditions using streamline tractography based on the structure tensor, and attempts to quantify the shape of fasciculi trajectories by measuring the tortuosity index and the maximum deviation for each reconstructed streamline. Results show that measures of anisotropy and fiber orientations largely agree across modalities, especially for larger FOV data. The high-resolution data allow the exploration of fiber trajectories in relation to tissue complexity and pathology. The authors claim the study reveals new common organization principles of white matter fibers across species and scales, whereby axonal fasciculi arrange into sheet-like laminar structures.

      Strengths:

      The aim of the study is of central importance within present efforts to bridge the gap between macroscopic structures observable in vivo in humans using conventional diffusion MRI and the microscopic organization of white matter tissue. Results obtained from this type of study are important to interpret data obtained in vivo, inform the development of novel methodologies, and expand our knowledge of the structural and thus functional organization of brain circuits.

      Multi-scale data acquired across modalities within the same sample are extremely valuable, often hard to acquire, and represent a precious resource for validation of both diffusion MRI tractography and microstructure methods.

      The inclusion of multi-species data adds value to the study, allowing the exploration of common organization principles across species.

      The addition of data from a murine cuprizone model of focal demyelination adds interesting opportunities to study the underlying biological changes that follow demyelination and how these impact tissue anisotropy and fiber trajectories. These data can inform the interpretation and development of diffusion MRI microstructure models.

      [Editors' note: The Reviewing Editor considers that the authors addressed the reviewers' questions adequately. The original reviews are here: https://elifesciences.org/reviewed-preprints/94917/reviews]

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study presents valuable observations of white matter organisation from diffusion MRI and two types of synchrotron imaging in both monkeys and mice. Cross-modality comparisons are interesting as the different methods are able to probe anatomical structures at different length scales, from single axons in high-resolution synchrotron (ESRF) imaging, to clusters of axons in lower-resolution synchrotron (DESY) data, to axon populations at the mm-scale in diffusion MRI. By acquiring all modalities in monkey and mouse ex vivo samples, the authors can observe principles of fibre organisation, and characterise how fibre characteristics, such as tortuosity and micro-dispersion, vary across select brain regions and in healthy tissue versus a demyelination model. The results are solid, though some statements (in the abstract/discussion) do not appear to be fully supported, and statistical tests would help confirm whether tissue characteristics are similar/different between different conditions.

      R1.1: Thank you for the kind feedback. We have included statistical tests in the paper for tissue characteristics where appropriate.

      Because of the very high number of sample points (one per voxel) within the 3D synchrotron volumes, testing for statistical significance is problematic for the structure tensor-based tissue fractional anisotropy (FA) metric. With so many samples, any standard statistical test has enough power to flag even minute differences between the volumes as statistically significant with high confidence. In other words, the null hypothesis (H0) will always be rejected with p ≈ 0, regardless of the practical significance of the difference. Therefore, we have not added statistical analysis for FA results.
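      The argument above can be illustrated with a two-sample z-test for a fixed, tiny standardized difference (Cohen's d = 0.01). This is a minimal sketch assuming equal group sizes and a known common standard deviation; it is not the authors' analysis. The effect size stays the same while the p-value shrinks with sample size until it eventually underflows to exactly 0.0 in double precision, which is why effect sizes are more informative than p-values at voxel-level sample counts.

```python
import math


def two_sided_p_mean_difference(delta: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test for a fixed mean difference.

    Assumes equal group sizes and a known, common standard deviation `sd`.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(delta) / standard_error
    # P(|Z| > z) for a standard normal Z
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Cohen's d = delta / sd = 0.01 in every row; only the sample size changes.
    for n in (100, 10_000, 1_000_000, 100_000_000):
        p = two_sided_p_mean_difference(delta=0.01, sd=1.0, n_per_group=n)
        print(f"n per group = {n:>11,d}   p = {p:.3g}")
```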

      For the tractography-based metrics, the number of sample points (one per streamline) is not as high as that for the structure tensor FA, thus making it more reasonable to test for statistical significance. The statistical analyses performed included tests for equality of distributions (two-sample Kolmogorov-Smirnov tests), equality of medians (two-sided Wilcoxon rank sum tests), and equality of variance (Brown-Forsythe tests). The results are described in relation to Figure 5(B, D) and Figure 8(C-F), and detailed in the Methods section.
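      Of the three tests listed, the Brown-Forsythe test is perhaps the least familiar: it is a one-way ANOVA computed on the absolute deviations of each observation from its group median. The stdlib-only sketch below computes the F statistic; the p-value would then come from an F distribution with (k - 1, N - k) degrees of freedom, for example via `scipy.stats.levene(..., center='median')`, which implements the same test. The function name is illustrative, and this is not presented as the code used in the study.

```python
from statistics import median
from typing import Sequence


def brown_forsythe_statistic(*groups: Sequence[float]) -> float:
    """Brown-Forsythe F statistic: one-way ANOVA on |x - group median|."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    # Absolute deviations of each observation from its own group's median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    n_total = sum(len(zi) for zi in z)
    group_means = [sum(zi) / len(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(zi) * (m - grand_mean) ** 2 for zi, m in zip(z, group_means))
    within = sum((x - m) ** 2 for zi, m in zip(z, group_means) for x in zi)
    if within == 0:
        raise ValueError("zero within-group variation of deviations; F is undefined")
    return (n_total - k) / (k - 1) * between / within
```

      Working on absolute deviations from the median, rather than from the mean as in the original Levene test, makes the statistic more robust to skewed distributions, which suits per-streamline metrics such as tortuosity.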

      One very interesting result is the observation of apparent laminar organisation of fibres in ex vivo monkey white matter samples. DESY data from the corpus callosum shows fibres with two dominant orientations (one L-R, one slightly inclined), clustered in laminar structures within this major fibre bundle. Thanks to the authors providing open data, I was able to look through the raw DESY volume and observe regions with different "textures" (different orientations) in the described laminar arrangement. That this organisation can be observed by eye, as well as by structure tensor, is fairly convincing. As not all readers will download the data themselves, the manuscript could benefit from additional figures/videos to demonstrate (1) the quality of the DESY data and (2) a more 3D visualisation of the laminar structures (where the coronal plane shows convincing columnar structure or stripes). Similarly in Figure 5A, though this nicely depicts two populations with different orientations, it is somewhat difficult to see the laminar structure in the current image.

      ESRF data of the centrum semiovale (CS) contributes evidence for similar laminar structures in a crossing fibre region, where primarily AP fibres are shown to cluster in 3 laminar structures. As above, further visualisations of the ESRF volume in the CS (as shown in Figure 4E) would be of value (e.g. showing consistency across the 4 volumes, 2D images showing stripey/columnar patterns along different axes, etc).

      R1.2: Conveying complex 3D geometry through 2D still images is indeed challenging, and we greatly appreciate the reviewer’s comments and suggestions. To better communicate the understanding of the 3D anatomical environments, we have taken the following actions:

      (1) To enhance insights into the tractography results in Figures 5A and 5D, we have rendered and added animations of the tractography scenes as supplemental material.

      (2) To visually support 3D insights concerning the consistency of the laminar organisation of the callosal fibres, we have replaced the 2D slice views in Figures 3A and 3B with 3D renderings similar to the one in Figure 4E.

      (3) An animation of Figure 4E was created to display the colour-coded structure tensor directions of all four stacked scans. This animation visually supports the complexity of the fibre orientation and the layered structural laminar organisation of the CS sample.

      A key limitation of this result is that, though the DESY data from the CC seems convincing, the same structures were not observed in high-resolution synchrotron (ESRF) data of the same tissue sample in the corpus callosum. This seems surprising, and the manuscript does not provide a convincing explanation for this inconsistency. The authors argue that this is due to the limited FOV of the ESRF data (~200x200x800 microns). However, the observed laminar structures in DESY are ~40 microns thick, and ESRF data from the CST suggests laminar thicknesses in the range of 5-40 microns with a similar FOV. This suggests the ESRF FOV would be sufficient to capture at least a partial description of the laminar organisation. Further, the DESY data from the CC shows columnar variations along the LR axis, which we might expect to be observed along the long axis of the ESRF volume of the same sample. Additional analyses or explanations to reconcile these apparently conflicting observations would be of value. For example, the authors could consider down-sampling the ESRF data in an appropriate manner to make it more similar to the DESY data and running the same analysis, to see if the observed differences are related to resolution (i.e. the thinner laminar structures cluster in ways that make them look like a thicker laminar structure at lower resolution), or cropping the DESY data to the size of the ESRF volume, to test whether the observed differences can be explained by differences in FOV. Laminar structures were not observed in mouse data, though it is unclear if this is due to anatomical differences or related to differences in data quality across species.
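      The two controls the reviewer proposes can be sketched as operations on a 3D volume stored as nested lists indexed `vol[z][y][x]`. Block averaging by an integer factor is only one simple way to coarsen resolution (a more faithful simulation would also model the point-spread function of the lower-resolution setup), and the factor and crop offsets are hypothetical parameters, since the voxel sizes and the relative position of the two samples are not given here. The structure tensor analysis would then be rerun on the output.

```python
from typing import List, Tuple

# A 3D volume stored as nested lists, indexed vol[z][y][x]
Volume = List[List[List[float]]]


def crop(vol: Volume, origin: Tuple[int, int, int], size: Tuple[int, int, int]) -> Volume:
    """Return the sub-volume of shape size=(dz, dy, dx) starting at origin=(z0, y0, x0)."""
    (z0, y0, x0), (dz, dy, dx) = origin, size
    return [[row[x0:x0 + dx] for row in plane[y0:y0 + dy]]
            for plane in vol[z0:z0 + dz]]


def block_downsample(vol: Volume, factor: int) -> Volume:
    """Average non-overlapping factor**3 blocks; voxels that do not fill a whole block are dropped."""
    f = factor
    nz, ny, nx = len(vol) // f, len(vol[0]) // f, len(vol[0][0]) // f
    out: Volume = []
    for k in range(nz):
        plane = []
        for j in range(ny):
            row = []
            for i in range(nx):
                block = [vol[k * f + a][j * f + b][i * f + c]
                         for a in range(f) for b in range(f) for c in range(f)]
                row.append(sum(block) / len(block))
            plane.append(row)
        out.append(plane)
    return out
```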

      R1.3: We have clarified and expanded upon the results regarding the laminar organisation observed in the monkey CC DESY data. As noted in R1.2, we replaced the 2D images in Figures 3A (DESY) and 3B (ESRF) with 3D renderings to better display the spatial outline of the laminar organisation in the volumes. The reviewer is correct that, although the smaller field of view (FOV) of the ESRF data should allow us to at least partially capture parts of the laminar organisation observed in the larger FOV of the DESY data, this is not guaranteed. It depends on how the smaller FOV is positioned relative to the structural organisation, and since we lack co-registration, we do not know this. It should now be visually evident that the ESRF FOV can be placed such that it does not cover the observed laminae, a point which is now also emphasised in the Discussion. 

      Secondly, it is important to emphasise that the voxel colouring using the primary structure tensor direction is just a visualisation technique, which has limitations when it comes to assessing laminar organisation. Mapping 3D directions to RGB colours is inherently difficult and will always have ambiguities. If we had used the standard R-G-B to LR-AP-IS colouring in Figure 3, the laminar organisation would not be evident. Additionally, the laminae will only be visible when there are clear angular differences. There can still be a layered organisation even if we don’t observe it, which is the case for the mouse. The primary direction differences of these layers could be very low (i.e., parallel layers), and consequently not visually evident. This point has been clarified in both the Results and Discussion sections.
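      The ambiguity of direction colour coding can be illustrated with the common absolute-value scheme, in which the magnitudes of the x, y and z components of a unit direction are mapped to red, green and blue. This is a minimal sketch of that generic scheme, not the specific colouring used in Figure 3. Mirror-symmetric directions such as (1, 1, 0) and (1, -1, 0) receive identical colours even though they are 90° apart, while nearly parallel directions differ only slightly in colour, so layers with small inclination angles produce little visual contrast.

```python
import math
from typing import Sequence, Tuple


def unit(v: Sequence[float]) -> Tuple[float, ...]:
    """Normalise a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length direction")
    return tuple(c / norm for c in v)


def direction_to_rgb(v: Sequence[float]) -> Tuple[float, ...]:
    """Absolute-value colour coding: (|x|, |y|, |z|) of the unit vector -> (R, G, B) in [0, 1]."""
    return tuple(abs(c) for c in unit(v))


def axial_angle_deg(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Angle between two undirected orientations (v and -v are the same axis), in [0, 90] degrees."""
    cos_angle = abs(sum(a * b for a, b in zip(unit(v1), unit(v2))))
    return math.degrees(math.acos(min(1.0, cos_angle)))


if __name__ == "__main__":
    a, b = (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)
    # Same colour, yet the orientations are perpendicular
    print(direction_to_rgb(a), direction_to_rgb(b), axial_angle_deg(a, b))
    # A 5-degree tilt changes the colour only slightly
    tilted = (math.cos(math.radians(5)), math.sin(math.radians(5)), 0.0)
    print(direction_to_rgb((1.0, 0.0, 0.0)), direction_to_rgb(tilted))
```

      Mapping v and -v to the same colour is intended, because structure tensor directions are axial; the problem is that other, genuinely different orientations collide as well.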

      Finally, in response to R1.6, we have added analyses regarding the shape of the FOD, specifically estimating the Orientation Dispersion Index (ODI) and Dispersion Anisotropy (DA). This provides further context to the reviewer’s comments about the discrepancies in laminar organisation. We have reflected on the relationship between DA and the visually observed laminar organisation, and this has been integrated into the relevant parts of the Results and Discussion sections.

      The changes to manuscript reflecting the statements above are listed here: 

      The Discussion section (page 21): “In the monkey CC DESY data, which has a field of view (FOV) comparable to a dMRI voxel, a columnar laminar organisation at a macroscopic level was visually revealed from the structure tensor (ST) direction colouring. However, this laminar organisation was not visible in the higher-resolution ESRF data for the same tissue sample. Although the two samples were not co-registered, the size of a single ESRF FOV within the DESY sample is illustrated in Fig. 3A. This demonstrates the possibility of placing the ESRF sample where the observed laminar structure is absent. Consequently, knowledge of the tissue structural organisation and its orientation is important to fully benefit from the stacked FOV of the ESRF sample and when choosing appropriate minimal FOV sizes in future experiments.

      Interestingly, when characterising FODs with measures like ODI and DA as indicators of fibre organisation, rather than relying on visualisation, results from large- and small-FOV data show no discrepancies. This statistical approach discards the spatial context (visually perceived as laminae), highlighting the need to combine both methods.” 

      The Results section (page 8): “The mid-level DA values suggest some anisotropic spread of the directions, reflecting the angled laminar organisation observed in the DESY sample. Interestingly, the DA value for the ESRF sample is almost identical, despite the laminar bands being less visually apparent.”

      The Results section (page 17): “Nevertheless, visualisation of orientations did not reveal any axonal organisation in the mouse CC due to the lack of local angular contrast, unlike the clear laminar structures seen in the monkey sample (Fig. 3A). Any parallel organisation in tissue remains undetectable because our visual contrast relies on angular differences.”

      The Discussion section (page 22): “In the monkey CC (mid-body), we observed laminar organisation indicated by clear spatial angular differences in the ST directions in the sample (Fig. 3A). Quantifications of the FOD shape showed DA indices of 0.55 and 0.59 for the DESY and ESRF samples, respectively. In contrast, the mouse CC (splenium) did not visually reveal a similar angled laminar organisation (Fig. 7C), and the DA indices were lower, at 0.49 and 0.32 for the DESY and ESRF samples, respectively. Two possible explanations exist. First, the within-pathway laminar organisation may not be identical across the entire CC. Consequently, more scans from other CC regions would be required to confirm this. Second, the different species might account for the differences. Larger brains like the monkey might foster a different level of within-pathway axon organisation compared to the smaller mouse. Although we could not visually detect laminar organisation from the colour coding of the ST direction in the mouse, the non-zero DA values suggest some level of organisation. This is supported by our streamline tractography, which indicates a vertical layered organisation (Fig. 8A, B). It further aligns with studies using histological tracer mapping that show a stacked parallel organisation of callosal projections in mice, between cortex regions M1 and S1 (Zhou et al. 2013). Nevertheless, we cannot rely solely on voxel-wise ST directions to fully describe axonal organisation, as this method does not contrast almost parallel fasciculi (inclination angles approaching 0 degrees). Analysing patterns in tractography streamlines would be an interesting future direction for this purpose.”

      The authors further quantify various other characteristics of the white matter, such as micro-dispersion, tortuosity, and maximum displacement. Notably, the microscopic FA calculated via structure tensor is fairly consistent across regions, though not modalities. When fibre orientations are combined across the sample, they are shown to produce similar FODs to dMRI acquired in the same tissue, which is reassuring. As noted in the text, the estimates of tortuosity and max displacement are dependent on the FOV over which they are calculated. Calculating these metrics over the same FOV, or making them otherwise invariant to FOV, could facilitate more meaningful comparisons across samples and/or modalities.

      R1.4: This raises an interesting point about the necessity of normalising the FOV to obtain invariant, tractography-based metrics of tortuosity and maximum deviation across different samples and modalities. In general, achieving this is challenging, and in this study, it is practically not possible. Between species, we encounter significant differences in brain volume ratios, which complicates the establishment of a common reference FOV due to the distinct anatomical organisation of monkey and mouse brains (see our response to R1.8). Within species, we would encounter challenges due to missing contrast—such as issues with staining—and the lack of perfect co-registration.

      The Discussion section (page 28) has been extended to reflect this: “Within the same species, assuming perfect co-registration of samples, it would be possible to perform correlative imaging and analysis. This would allow validation of whether tractography streamlines could be reproduced at different image resolutions within the same normalised FOV. Although this was not possible with the current data and experimental setup, it would be an interesting point to pursue in future work.”
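      The FOV dependence discussed above can be made concrete with simple definitions of the two streamline metrics. In this sketch, which assumes rather than reproduces the manuscript's definitions, the tortuosity index is the arc length of a streamline divided by the straight-line distance between its end points, and the maximum deviation is the largest perpendicular distance from any streamline point to that straight line. Truncating the same streamline, as a smaller FOV would, changes both values.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def tortuosity_index(streamline: Sequence[Point]) -> float:
    """Arc length divided by end-to-end (chord) distance; 1.0 for a straight streamline."""
    arc = sum(math.dist(p, q) for p, q in zip(streamline, streamline[1:]))
    chord = math.dist(streamline[0], streamline[-1])
    if chord == 0.0:
        raise ValueError("end points coincide; tortuosity is undefined")
    return arc / chord


def max_deviation(streamline: Sequence[Point]) -> float:
    """Largest perpendicular distance from any point to the chord between the end points."""
    a, b = streamline[0], streamline[-1]
    ab = [bi - ai for ai, bi in zip(a, b)]
    ab_len = math.hypot(*ab)
    if ab_len == 0.0:
        raise ValueError("end points coincide; the chord is undefined")
    best = 0.0
    for p in streamline:
        ap = [pi - ai for ai, pi in zip(a, p)]
        # |AP x AB| / |AB| is the distance from P to the line through A and B
        cross = (ap[1] * ab[2] - ap[2] * ab[1],
                 ap[2] * ab[0] - ap[0] * ab[2],
                 ap[0] * ab[1] - ap[1] * ab[0])
        best = max(best, math.hypot(*cross) / ab_len)
    return best


if __name__ == "__main__":
    zigzag = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0), (3.0, 1.0, 0.0), (4.0, 0.0, 0.0)]
    print("full FOV:   ", tortuosity_index(zigzag), max_deviation(zigzag))
    print("cropped FOV:", tortuosity_index(zigzag[:2]), max_deviation(zigzag[:2]))
```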

      Though the results seem solid, some statements, particularly in the abstract and discussion, do not seem to be fully supported by the data. For example, the abstract states "Our findings revealed common principles of fibre organisation in the two species; small axonal fasciculi and major bundles formed laminar structures with varying angles, according to the characteristics of major pathways.", though the results show "no strong indication within the mouse CC of the axonal laminar organisation observed in the monkey". Similarly, the introduction states: "By these means, we demonstrated a new organisational principle of white matter that persists across anatomical length scales and species, which governs the arrangement of axons and axonal fasciculi into sheet-like laminar structures." Further comments on the text are provided below.

      R1.5: We understand that readers might conclude that the laminar organisation is identical in monkeys and mice, which is not the case. For example, we show that in the corpus callosum, pathways are parallel in the mouse but not in the monkey. We have clarified that while the principle of layered laminar organisation of pathways is shared between monkeys and mice, species-specific differences do exist.

      We have made the following clarifying changes to the manuscript:

      The Abstract (page 2): “Our findings revealed common principles of fibre organisation that apply despite the varying patterns observed across species” 

      The Introduction (page 4-5): “Through these methods, we demonstrated organisational principles of white matter that persist across anatomical length scales and species. These principles govern the organisation of axonal fasciculi into sheet-like laminar shapes (structures with a predominant planar arrangement). Interestingly, while these principles remain consistent, they result in varied structural organisations in different species.”

      The Discussion (page 21): “despite species differences”.

      One observation not notably discussed in the paper is that the spherical histograms of Figure 3E/H appear to have an anisotropic spread of the white points about 0,0. It would be interesting if the authors could comment on whether this could be interpreted as the FOD having asymmetric dispersion and if so, whether the axis of dispersion relates to the fibre orientations of the laminar structures.

      R1.6: That is a good point, and to address it, we have fitted spherical Bingham distributions to the FODs, allowing us to quantify their shapes. From each Bingham distribution, we derived two well-known indices from the diffusion MRI community: the Orientation Dispersion Index (ODI) and the Dispersion Anisotropy (DA) index. The ODI describes the dispersion of fibres for a single-bundle FOD, whereas DA expresses the shape of the FOD on the unit sphere surface, i.e., the degree of anisotropy. We have integrated the Bingham-based analysis into the Methods, Discussion, and Results sections concerning Figures 3 and 7, but not Figure 4, which contains multiple fibre bundles that we cannot separate on a voxel level. The analysis does not impact the overall message and conclusion but adds interesting context to the discussion around laminar organisation.

      A limitation of the study is that it considers only small ex vivo tissue samples from two locations in a single postmortem monkey brain and slightly larger regions of mouse brain tissue. Consequently, further evidence from additional brain regions and subjects would be required to support more generalised statements about white matter organisation across the brain.

      R1.7: Collecting more samples from various locations in the brain would provide valuable insights into the consistency of white matter organisation across anatomical length scales, as well as the structure-tensor-based anisotropy and tortuosity metrics. However, being awarded beamtime at two different synchrotron facilities to scan the same sample with different imaging setups is practically challenging. At the ESRF, we have gathered additional image volumes from other white matter regions of the monkey brain that support all our findings, which will be published separately. X-ray synchrotron imaging technology is advancing rapidly, with faster acquisition times enabling more image volumes to be stitched together. This extends the FOV and allows for a more robust statistical description of the anatomy. Consequently, future studies with an extended FOV and varying image resolutions could utilise a single synchrotron facility to collect additional samples, further supporting our findings.

      The Discussion section (page 27) has been extended to reflect this: “Increasing the number of samples across both species and examining laminar organisation at various length scales in more regions would strengthen our findings. However, securing beamtime at two different synchrotron facilities to scan the same sample with varying image resolutions is a limiting factor. Beamline development for multi-resolution experimental setups, along with faster acquisition methods, is a rapidly advancing field. For instance, the Hierarchical Phase-Contrast Tomography (HiP-CT) imaging beamline at ID-18 at the ESRF enables multi-resolution imaging within a single session to address this challenge, though it is currently limited to a resolution of 2.5 μm (Walsh et al. 2021).”

      Given the monkey results, the mouse study (section 2.5 onwards) lacks some motivation. In particular, it is unclear why a demyelination model was studied and if/how this would link to the laminar structure observed in the monkey data. Further, it is unclear how comparable tortuosity/max deviation values are across species, considering the differences in data quality and relative resolution, given that the presented results show these values are very modality-dependent.

      R1.8: We have clarified the motivation for including the mouse part of the study in both the Introduction and the Results sections.

      The Introduction section (page 5): “Furthermore, using a mouse model of focal demyelination induced by cuprizone (CPZ) treatment, we investigate the inflammation-related influence on axonal organisation. This is achieved through the same structure tensor-derived micro-anisotropy and tractography streamline metrics.”

      The Results section (page 15): “Finally, we investigated the organisation of fasciculi in both healthy mouse brains and a murine model of focal demyelination induced by five weeks of cuprizone (CPZ) treatment. This allowed for the exploration of the disease-related influence on axonal organisation, particularly under inflammation-like conditions with high glial cell density at the demyelination site (He et al. 2021). The experimental setup for DESY and ESRF is similar to that described for the monkey, with the exception that we did not perform dMRI and synchrotron imaging on the same brains, and only collected MRI data for healthy mouse brains. This approach allowed us to apply the same structure tensor and tractography streamline analysis used previously, but in a healthy versus disease comparison, demonstrating the methodology’s ability to provide insights into pathological conditions.”

      Across species, the comparison of tortuosity and maximum deviation must be approached with caution. On one hand, we observe a comparable influence of the extra-axonal environment in both the monkey and mice, as discussed in the section “Sources to the non-straight trajectories of axon fasciculi.” On the other hand, the anatomical scale and relative image resolution are significant factors, as correctly pointed out. In the mouse, for instance, the measures are influenced by macroscopic effects of the white matter pathways, making a normalised cross-species comparison challenging.

      The limitations section of the Discussion (page 28) has been updated to reflect this: “A limiting consequence of having samples imaged at differing anatomical scales is that certain measures become inherently hard to compare in a normalised way. The tractography-based metrics (tortuosity and maximum deviation) serve as good examples of this resolution and FOV dependence. In the ESRF samples, the anatomical scale was at the level of individual axons, and the streamline metrics primarily reflect micro-scale effects from the extra-axonal environment, such as the influence of cells and blood vessels. In comparison, the larger anatomical scale in the DESY samples represents the level of fasciculi and above, with metrics influenced by macroscopic effects, such as the bending of the CC pathway. Both scales are interesting and can provide valuable insights in their own right, but caution is required when comparing the numbers, especially for cross-species studies where there is a significant difference in brain volume ratios.”

      The paper introduces a new method of "scale-space" parameters for structure tensors. Since, to my understanding, this is the first description of the method, some simple validation of the method would be welcomed. Further, the same scale parameters are not used across monkeys and mice, with a larger kernel used in mice (Table 2) which is surprising given their smaller brain size. Some explanation would be helpful.

      R1.9: We have expanded the description of the scale-space structure tensor approach in the Methods section. Specifically, we have elaborated on the empirical process used to select the scale-space parameters shown in Table 2 and explained why multiple scales were applied only to the monkey samples scanned at ESRF (see Table 2, sample IDs 2 and 3) but not to the other datasets. Additionally, we have added a supplementary figure to assist in illustrating the concept.
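
      The idea of a structure tensor evaluated at chosen scales can be illustrated, loosely, in two dimensions. The sketch below is not the authors' pipeline (which is 3D and uses the scale parameters listed in Table 2); it is a minimal, stdlib-only Python illustration assuming a single noise scale `sigma` (Gaussian smoothing before differentiation) and a single integration scale `rho` (Gaussian averaging of the gradient outer products). The function names, the border clamping and the coherence measure are our own assumptions. Fibres are taken to run along the eigenvector with the smallest eigenvalue, i.e. perpendicular to the dominant gradient direction.

```python
import math


def gaussian_kernel(sigma):
    """1D Gaussian weights truncated at 3*sigma and normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(img, sigma):
    """Separable Gaussian smoothing of a 2D list of rows, clamping at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[j + r] * img[y][min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[min(max(y + j, 0), h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def gradients(img):
    """Central differences: gx along columns (x), gy along rows (y)."""
    h, w = len(img), len(img[0])
    gx = [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0 for x in range(w)]
          for y in range(h)]
    gy = [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0 for x in range(w)]
          for y in range(h)]
    return gx, gy


def structure_tensor(img, sigma, rho):
    """Per-pixel (fibre_angle in [0, pi), coherence in [0, 1]) from a 2D structure tensor.

    sigma: noise scale, rho: integration scale (both in pixels).
    """
    gx, gy = gradients(smooth(img, sigma))
    h, w = len(img), len(img[0])
    jxx = smooth([[gx[y][x] * gx[y][x] for x in range(w)] for y in range(h)], rho)
    jxy = smooth([[gx[y][x] * gy[y][x] for x in range(w)] for y in range(h)], rho)
    jyy = smooth([[gy[y][x] * gy[y][x] for x in range(w)] for y in range(h)], rho)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            a, b, c = jxx[y][x], jxy[y][x], jyy[y][x]
            # Dominant gradient orientation; the fibre axis is perpendicular to it.
            grad_angle = 0.5 * math.atan2(2.0 * b, a - c)
            fibre_angle = (grad_angle + math.pi / 2) % math.pi
            trace = a + c
            # Coherence (l1 - l2) / (l1 + l2): 1 for perfectly oriented structure.
            coherence = math.hypot(a - c, 2.0 * b) / trace if trace > 0 else 0.0
            row.append((fibre_angle, coherence))
        out.append(row)
    return out
```

      For an image of horizontal stripes, `structure_tensor(img, 1.0, 2.0)[12][12]` gives a fibre angle of 0 (along x) with coherence close to 1. A larger `rho` averages orientations over a larger neighbourhood, which is loosely analogous to moving the analysis from individual axons towards fasciculi.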

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors combine diffusion MRI and high-resolution x-ray synchrotron phase-contrast imaging in monkey and mouse brains to investigate the 3D organization of brain white matter across different scales and species. The work is at the forefront of the anatomical investigation of the human connectome and aligns with several current efforts to bridge the resolution gap between what we can see in vivo at the millimeter scale and the complexity of the human brain at the sub-micron scale. The authors compare the 3D white matter organization across modalities within 2 small regions in one monkey brain (body of the corpus callosum, centrum semiovale) and within one region (splenium of the corpus callosum) in healthy mice and in one murine model of focal demyelination. The study compares measures of tissue anisotropy and fiber orientations across modalities, performs a qualitative comparison of fasciculi trajectories across brain regions and tissue conditions using streamline tractography based on the structure tensor, and attempts to quantify the shape of fasciculi trajectories by measuring the tortuosity index and the maximum deviation for each reconstructed streamline. Results show measures of anisotropy and fiber orientations largely agree across modalities, especially for larger FOV data. The high-resolution data allows us to explore the fiber trajectories in relation to tissue complexity and pathology. The authors claim the study reveals new common organization principles of white matter fibers across species and scales, for which axonal fasciculi arrange into sheet-like laminar structures.

      Strengths:

      The aim of the study is of central importance within present efforts to bridge the gap between macroscopic structures observable in vivo in humans using conventional diffusion MRI and the microscopic organization of white matter tissue. Results obtained from this type of study are important to interpret data obtained in vivo, inform the development of novel methodologies, and expand our knowledge of the structural and thus functional organization of brain circuits.

      Multi-scale data acquired across modalities within the same sample constitute extremely valuable data that are often hard to acquire and represent a precious resource for validation of both diffusion MRI tractography and microstructure methods.

      The inclusion of multi-species data adds value to the study, allowing the exploration of common organization principles across species.

      The addition of data from a murine cuprizone model of focal demyelination adds interesting opportunities to study the underlying biological changes that follow demyelination and how these impact tissue anisotropy and fiber trajectories. These data can inform the interpretation and development of diffusion MRI microstructure models.

      Weaknesses:

      The main claim of a newly discovered laminar organization principle that is consistent across scales and species is not supported strongly enough by the data. The main evidence in support of the claim comes from the larger FOV data obtained from the body of the corpus callosum in the monkey brain. A laminar organization principle is partially shown in the centrum semiovale in the monkey brain and it is not shown in mice data. Additionally, the methods lack details to help the correct interpretation of these findings (e.g., how were these fasciculi defined?; how well do they represent different axonal populations?; what is the effect of blood vessels on the structure tensor reconstruction?; how was laminar separation quantified?) and the discussion does not provide a biological background for this organization. The corpus callosum sample suggests axons within a bundle of fibers are organized in a sheet-like fashion, while data from the centrum semiovale suggest fibers belonging to different fiber bundles are organized in a sheet-like arrangement. While I acknowledge the challenges in acquiring such high-resolution data, additional samples from different regions in the same animals and from different animals would help strengthen this claim.

      R2.1 

      -  how were these fasciculi defined?

      In the introduction (page 3), we have clarified our definition of an axon fasciculus: “A fasciculus is a bundle of axons that travel together over short or long distances. Its size and shape can vary depending on its internal organisation and its relationship to neighbouring fasciculi.”

      Additionally, we emphasise in the Results section (page 12) that the centroid streamlines are not guaranteed to be actual fasciculi, but rather representations of them. The paragraph now states: “To ease visualisation and quantification, we used QuickBundles clustering (Garyfallidis et al. 2012) to group neighbouring streamlines with similar trajectories into a centroid streamline. This centroid streamline serves as an approximation of the actual trajectory of a fasciculus.”
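
      A reference implementation of QuickBundles is available in DIPY. Purely to illustrate how neighbouring streamlines are grouped into a centroid, the following is a simplified, hypothetical re-implementation in plain Python. It assumes every streamline has already been resampled to the same number of points and uses the minimum average direct-flip (MDF) distance, so that a streamline traced in the opposite direction still matches its neighbours. The function names and the `threshold` value in the example are our own.

```python
import math


def mdf(s, t):
    """Minimum average direct-flip distance between two equal-length streamlines."""
    direct = sum(math.dist(p, q) for p, q in zip(s, t)) / len(s)
    flipped = sum(math.dist(p, q) for p, q in zip(s, reversed(t))) / len(s)
    return min(direct, flipped)


def centroid_of(cluster):
    """Mean streamline of a cluster, from its running coordinate sums."""
    n = len(cluster["members"])
    return [tuple(v / n for v in point) for point in cluster["sums"]]


def quickbundles(streamlines, threshold):
    """Single-pass greedy clustering; returns a list of (centroid, member_indices)."""
    clusters = []
    for idx, s in enumerate(streamlines):
        best, best_dist = None, threshold
        for cluster in clusters:
            d = mdf(s, centroid_of(cluster))
            if d < best_dist:
                best, best_dist = cluster, d
        if best is None:
            # No centroid is close enough: this streamline seeds a new cluster.
            clusters.append({"sums": [list(p) for p in s], "members": [idx]})
            continue
        # Flip the streamline if needed so its points line up with the centroid.
        c = centroid_of(best)
        direct = sum(math.dist(p, q) for p, q in zip(s, c))
        flipped = sum(math.dist(p, q) for p, q in zip(s[::-1], c))
        aligned = s if direct <= flipped else s[::-1]
        for acc, p in zip(best["sums"], aligned):
            for k in range(len(acc)):
                acc[k] += p[k]
        best["members"].append(idx)
    return [(centroid_of(c), c["members"]) for c in clusters]
```

      The single pass over the streamlines keeps the cost roughly linear in the number of streamlines, which is why this kind of clustering suits dense tractograms; the price is that the result depends on the input order.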

      - what is the effect of blood vessels on the structure tensor reconstruction?

      Fair point; this was not clear from our description. The clarification contains two parts. First, the estimation of the structure tensor occurs in all voxels, and in that sense, the blood vessels respond very similarly to axons. Second, when it comes to sample statistics derived from the structure tensor analysis (FA histograms and the FODs), they will have an influence, albeit a small one, given the low volume percentage of the blood vessels within the FOVs. In the monkey samples, segmenting the blood vessels was achievable with little effort, allowing us to exclude their contribution from FA statistics and FODs. To make this clear, we have added a paragraph to the Methods section (page 34) titled “Structure tensor-based quantifications,” reflecting this clarification. Additionally, we have restructured the entire structure tensor methods description (starting on page 32) as part of the reviewer comments in R1.6 and R1.9.

      - how was laminar separation quantified?

      We have added a clarification in the Results section (page 7): “The laminar thickness was determined by manual measurements on laminae visually identified in the 3D volume”.

      - discussion does not provide a biological background for this organization.

      A good point. Including the biological background is relevant as it supports the laminar organisation of white matter pathways observed in our findings and those of others.

      We have added a section on this background in the Discussion (page 24): “We believe our observed topological rule of white matter laminar organisation can be explained by a biological principle known from studies of nervous tissue development. The first axons to reach their destination, guided by their growth cones, are known as “pioneering” axons. “Follower” axons use the shaft of the pioneering axon for guidance to efficiently reach the target region (Breau and Trembleau 2023). Axons can form a fasciculus by fasciculating or defasciculating along their trajectory through a zippering or unzipping mechanism, controlled by chemical, mechanical, and geometrical parameters. Zippering “glues” the axons together, while unzipping allows them to defasciculate at a low angle (Šmít et al. 2017). Although speculative, the zippering mechanism may be responsible for forming the laminar topology observed across length scales. The defasciculation effect can explain our results in the corpus callosum (CC) of monkeys, with laminar structures at low angles (~35 degrees) also observed by Innocenti et al. (2019) and Caminiti et al. (2009), as well as in other major pathways (Sarubbo et al. 2019). In contrast, a fasciculation mechanism may be observed in the mouse CC (0 degrees). If the geometrical angle between two axons is high, i.e., toward 90 degrees, the zippering mechanism will not occur, and the two axons (fasciculi) will cross (Šmít et al. 2017). This is consistent with our findings and those of others that crossing fasciculi or pathways occur at high angles toward 90 degrees in the fully matured brain (Wedeen et al. 2012). Once myelination begins, the zippering mechanism is lost (Šmít et al. 2017), suggesting that laminar topology is established at the earliest stages of brain maturation.”

      - additional samples from different regions in the same animals and from different animals would help strengthen this claim

      Reviewer #1 also pointed to the inclusion of additional samples, and this is now discussed as part of the study limitations on page 27 (see also R1.7).

      The main goal of the study is to bridge the organization of white matter across anatomical length scales and species. However, given the substantial difference in FOVs between the two imaging modalities used, and the absence of intermediate-resolution data, it remains difficult to effectively understand how these results can be used to inform conventional diffusion MRI. In this sense, the introduction does not do a good enough job of building a strong motivation for the scientific questions the authors are trying to answer with these experiments and for the specific methodology used.

      R2.2: Indeed, this is an essential point now emphasised in the introduction, page 3, which now states: “Despite the limited resolution of dMRI, the water diffusion process can reveal microstructural geometrical features, such as axons and cell bodies, though these features are compounded at the voxel level. Consequently, estimating microstructural characteristics depends on biophysical modelling assumptions, which can often be simplistic due to limited knowledge of the 3D morphology of cells and axons and their intermediate-level topological organisation within a voxel. Thus, complementary high-resolution imaging techniques that directly capture axon morphology and fasciculi organisation in 3D across different length scales within an MRI voxel are essential for understanding anatomy and improving the accuracy of dMRI-based models (Alexander et al. 2019).”

      Additionally, in the introduction, page 4, we have made the following changes to strengthen the link across modalities, such that it now states: “In the x-ray synchrotron data, we applied a scale-space structure tensor analysis, which allowed for the quantification of structure tensor-derived tissue anisotropy and FOD in the same anatomical regime indirectly detected by dMRI.”

      The cuprizone data represent a unique opportunity to explore the effect of demyelination on white matter tissue. However, this specific part of the study is not well motivated in the introduction and seems to represent a missed opportunity for further exploration of the qualitative and quantitative relationship between diffusion MRI and sub-micron tissue information (although unfortunately not within the same brain sample). This is especially true considering the diffusion MRI protocol for mice would allow extrapolation of advanced measures from different tissue compartments.

      R2.3: A similar point was raised by Reviewer 1 (R1.8), and we have clarified the motivation for including the healthy mice and the demyelination samples.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Many thanks to the authors for providing open data. This was very helpful when reviewing the manuscript and is a valuable resource for the community.

      R1.10: We are happy to share our data with the community. Understanding anatomy in 3D is hard to achieve through still images and animations, so the ability to explore it on your own is quite important. The link to the data repository has been added in the Methods section in the following paragraph: “Due to the size of the data, selected processed image volumes, masks and results are available at https://zenodo.org/records/10458911. Other datasets can be shared on request.”

      One confusing element of the paper is that orientations (or axes) do not seem to be consistent across samples/modalities. For example, the green tensors in Figures 3 C and D are tilted up/down in opposite directions and the streamlines in Figure 5A seem opposite (SL) from what we would expect from Figure 2A (SR). Having consistent orientations across modalities and images would help the reader. When colouring tensors (e.g. in Figure 3), the authors could consider a 3D colour scheme (similar to that used by diffusion MRI) rather than colouring by only inclination, as this would provide useful information on whether different laminae have similar orientations, as implied by the tractography in Figure 4.

      R1.11: Thank you for spotting the suboptimal consistency between Figures 2, 3, and 5. Figure 2 has been corrected and updated. The left-right direction in the coronal views was not correctly displayed. Additionally, the glyph directions have been updated in Figures 2 and 3.

      By default, we use the “standard” RGB colour scheme used in dMRI. However, for the monkey CC (essentially Figure 3), this did not effectively illustrate our findings. We therefore used a different directional colour encoding scheme, which captures the angular deviation from the L-R axis, to assist in visualising the inclination angle between the laminae. We have used the same colour scheme for the tensors in Figure 3 to avoid confusion.

      On a general note, the standard colour scheme has uniform “colour contrast” in all directions, but when there is only a single dominant direction in the sample, it can make sense to concentrate the colour contrast in that axis.

      Results: "even higher FA anisotropy in the micro-tensor domain of 0.997, i.e., the micro (μ)FA (20, 21)." I understand these references lead to a definition of μFA that is based on multiple diffusion tensor encodings which is quite different from that suggested by Kaden. It may be preferable to reference Kaden directly (since I understand this is the method used) to avoid confusion.

      R1.12: Correctly spotted, and we now reference the method from Kaden et al. and use the other references elsewhere when relevant.

      "and scanned the mouse brain in a whole." - typo?

      R1.13: Thank you for spotting the typo. The mouse brain was kept in the skull during MRI scanning, which has been clarified in the Methods section.

      The crossing fibre region appears to be sometimes referred to as the centrum semiovale, and other times as the CST. CS seems the better description and keeping this naming consistent would avoid confusion to the reader.

      R1.14: Well spotted, thank you. We have replaced the usage of Corticospinal Tract (CST) with centrum semiovale (CS) where relevant.

      Direct comments on the text:

      Abstract: "Individual axon fasciculi exhibited tortuous paths .... in a manner independent of fibre complexity and demyelination"

      Do statistical comparisons of the various distributions support this? The data shows somewhat increased tortuosity in the CST compared to the CC, and somewhat lower tortuosity in CPZ tissue.

      R1.15: The intention of the text was not to point to the comparison of tortuosity, but rather to highlight the maximum deviation. We observe a high probability density of maximum deviations at approximately 5-10 microns in all samples, which corresponds to the size of structures in the extra-axonal environment, such as blood vessels and cells.

      Additionally, we understand that the original statement might imply an expectation of a statistical analysis demonstrating independence, which is not the case. To clarify, we have reformulated the sentence in the Abstract (page 2) to address these points: “Fasciculi exhibited non-straight paths around obstacles like blood vessels, comparable across the samples of varying fibre complexity and demyelination.”
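
      The exact metric definitions are given in the paper's Methods. Assuming the common conventions (tortuosity as streamline arc length divided by the straight-line distance between its endpoints, and maximum deviation as the largest perpendicular distance of any streamline point from the line through those endpoints), the two metrics can be sketched as follows. The function names and the example coordinates (in microns, describing a path bending around an obstacle) are hypothetical.

```python
import math


def path_length(points):
    """Sum of segment lengths along a polyline of (x, y, z) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def tortuosity(points):
    """Arc length divided by end-to-end (chord) distance; 1.0 for a straight path."""
    return path_length(points) / math.dist(points[0], points[-1])


def max_deviation(points):
    """Largest perpendicular distance of any point from the line through the endpoints."""
    p0, p1 = points[0], points[-1]
    d = [b - a for a, b in zip(p0, p1)]
    norm = math.sqrt(sum(c * c for c in d))
    u = [c / norm for c in d]  # unit vector along the chord
    best = 0.0
    for p in points:
        v = [b - a for a, b in zip(p0, p)]
        proj = sum(vi * ui for vi, ui in zip(v, u))
        perp_sq = sum(vi * vi for vi in v) - proj * proj
        best = max(best, math.sqrt(max(perp_sq, 0.0)))  # clamp rounding noise
    return best


detour = [(0, 0, 0), (10, 0, 0), (20, 8, 0), (30, 0, 0), (40, 0, 0)]
print(round(tortuosity(detour), 3), max_deviation(detour))
```

      For this detour the maximum deviation is 8.0 µm while the tortuosity is only about 1.14, which illustrates why the maximum deviation is the more direct readout of obstacle size.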

      Abstract: "A quantitative analysis of tissue anisotropies and fibre orientation distributions gave consistent results for different anatomical length scales and modalities, while being dependent on the field-of-view."

      To my understanding, the FODs here from different modalities are calculated over different FOVs (in monkeys at least), and FODs are only presented for a single FOV for each modality, meaning it is difficult to separate the effects of modality from FOV. The microscopic anisotropy is also noticeably different across modalities (DESY < ESRF < dMRI).

      R1.16: That is a fair point. Our statement was trying to capture too much condensed content to be correctly interpretable. We have reformulated the sentence to state: “Quantifications of fibre orientation distributions were consistent across anatomical length scales and modalities, whereas tissue anisotropy had a more complex relationship, both dependent on the field-of-view”.

      While it is true that we only present the ST-derived quantifications – FOD and FA statistics – for a single FOV per modality and sample, the results shown for the ESRF monkey samples (Figures 3 and 4) are a merge of four individually processed volumes. The quantifications of each individual sub-FOV have now been added as a supplementary figure (Figure S3) to highlight the consistency of the methodology and the effect of shifting the FOV position. In the case of the mouse, we have two volumes from different mice, which also display similar FOD and FA statistics.

      Abstract: "Our study emphasises the need to balance field-of-view and voxel size when characterising white matter features across anatomical length scales."

      This point does not seem very well explored in the paper, rather it is an observation of the limitations of the different imaging modalities. For example, there aren't analyses to compare metrics from high-resolution data at different FOVs (i.e. by taking neighbourhoods of different sizes), nor are metrics compared from data at different resolutions and the same FOV.

      R1.17: The question is related to R1.16, R1.4, and R1.8, and we have addressed this point in our responses to those comments.

      Figure 7 - Taking into account the eigenvalues can be helpful when interpreting the secondary and tertiary eigenvectors of tensors (V2 and V3). It would be interesting to know whether the eigenvalues L2 ~= L3 are approximately equal (suggesting isotropic diffusion about V1, where the definition of V2 versus V3 isn't very meaningful), or if L2 is noticeably larger than L3 (suggesting anisotropic diffusion about V1, potentially similar to the anisotropic dispersion discussed above).

      R1.18: It would be interesting to explore the eigenvalues of the structure tensor in more detail, as has been done for the diffusion tensor. However, we believe this belongs to future work, as such additional detailed methodological analysis would complicate the already complex story. As mentioned in response to R1.10, most processed data has been made publicly available, and the rest can be requested (due to the storage size of the data sets) to perform such additional analysis.
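
      The check suggested by the reviewer is inexpensive once eigenvalues are available. As a hedged, stdlib-only sketch: `eigenvalues_sym3` uses the standard closed-form (trigonometric) solution for a symmetric 3x3 matrix, and `relative_gap` is an ad hoc measure introduced here, (λ2 − λ3)/λ2, which is near 0 when the tensor is cylindrically symmetric about V1 and grows as the cross-section becomes anisotropic. Both names are hypothetical. The ordering assumes a diffusion-tensor convention (λ1 ≥ λ2 ≥ λ3); for a structure tensor, where the fibre axis corresponds to the smallest eigenvalue, the two largest eigenvalues would be compared instead.

```python
import math


def eigenvalues_sym3(a):
    """Eigenvalues of a symmetric 3x3 matrix, sorted in descending order."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Already diagonal.
        return tuple(sorted((a[0][0], a[1][1], a[2][2]), reverse=True))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = (a[0][0] - q) ** 2 + (a[1][1] - q) ** 2 + (a[2][2] - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    # B = (A - qI) / p; det(B) / 2 lies in [-1, 1] up to rounding, so clamp it.
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))
    phi = math.acos(r) / 3.0
    l1 = q + 2.0 * p * math.cos(phi)
    l3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    l2 = 3.0 * q - l1 - l3  # the trace is preserved
    return (l1, l2, l3)


def relative_gap(eigs):
    """(l2 - l3) / l2 for descending eigenvalues; close to 0 when l2 and l3 match."""
    _, l2, l3 = eigs
    return (l2 - l3) / l2 if l2 > 0 else 0.0
```

      Note that the closed-form solver loses some precision when two eigenvalues coincide, so a small tolerance is needed when deciding whether λ2 and λ3 are "approximately equal".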

      Discussion: "Importantly, our findings revealed common principles of fibre organisation in both monkeys and mice; small axonal fasciculi and major bundles formed sheet-like laminar structures," See above regarding the lack of evidence for laminar structures in mouse data.

      R1.19: We have reformulated the text for clarification as part of R1.3. Additionally, we added FOD quantifications to support why we do not observe an apparent laminar organisation in the mouse CC; please see our response to R1.6.

      Discussion: "Interestingly, the dispersion magnitude is indicative of fasciculi that skirt around obstacles in the white matter such as cells and blood vessels, and the results are largely independent of both white matter complexity (straight vs crossing fibre region) and pathology." Again, do statistical tests of the various distributions support this?

      R1.20: As part of R1.1, we have added statistical tests of significance for the quantifications of how max deviation changes when bending around objects. Indeed, the distributions are not statistically the same, and we do not wish to convey that sentiment, but they are comparable in the object sizes that they detect. As done in the abstract, we have reformulated the sentence to avoid misunderstanding and have replaced “largely independent” with “observed across.”

      Discussion: "Tax et al. have demonstrated the calculation of a sheet probability index from diffusion MRI data, which suggested the presence of sheet-like features in the CC"

      My understanding was that this was observed in crossing fibre regions, such as where fibres projecting with the CC cross the CST, but not the main body of the CC itself. Tax defines sheet structure as "composed of two tracts that cross each other on the same surface in certain regions along their trajectories." Is this a different phenomenon to the laminar structures observed here (where we observe fibres within a single tract being locally organised into laminar structures)?

      R1.21: Thank you for pointing our attention to this. We have corrected the section in the Discussion (page 23), so it now states: “Additionally, Tax et al. have demonstrated the calculation of a grid-crossing sheet probability index from diffusion MRI data, which suggested the presence of sheet-like features in a crossing fibre region (Tax et al. 2016), which is in line with our findings in the synchrotron data. Note that the method by Tax et al. only detects sheet-like structures crossing on a grid and does not reveal laminar structures with lower inclination angles, as we observed in the monkey CC.”

      Discussion: "We found that FODs were consistent across image resolutions and modalities, but only given that the FOV is the same." See above.

      R1.22: As part of our response to R1.6, we quantified the FODs using the ODI and DA indices, which should help support our statement. Nevertheless, we have toned down the statement and reformulated the text as follows: “We found that FODs were comparable across image resolutions and modalities. The observed discrepancies can be attributed to the fact that the FOVs are not exactly matched.”

      Discussion: "microscopic FA were highly correlated across modalities."

      The data shows FA is considerably lower in DESY than in ESRF; within modality FA is quite consistent irrespective of tissue region; and differences between the CC and CG shown in ESRF data in mice are not repeated in DESY. It is unclear from the current data if this would lead to a high correlation across modalities. Some evidence would be helpful.

      R1.23: This is a fair point; we have not performed a correlation analysis. However, the pattern we observe for the synchrotron samples is as follows: when the anatomical length scale increases (becomes more macroscopic), the FA distribution shifts to lower values. This reflects the scale of information captured with the ST analysis (see also R1.9). Therefore, the most informative comparison of FA statistics is when the resolution and anatomical length scale are approximately the same. The sentence in question has been reformulated as follows: “Estimates of structure tensor derived microscopic FA show a clear pattern across modalities.”

      Discussion: "If so, the (inclination angle) information might serve to form rules for low-resolution diffusion MRI based tractography about how best to project through bottleneck regions, which is currently a source of false-positives trajectories (6)."

      This is an interesting idea but it is unclear to me how this inclination information would help track through bottlenecks where, by definition, fibres are passing through with the same orientation. Some further explanation would be helpful.

      R1.24: We have elaborated on the section in the Discussion (page 23), explaining how this can be used to improve tractography tracing through complex regions: “The reason is that standard tractography methods do not "remember" or follow anatomical organisation rules as they trace through complex regions. Our findings on pathway lamination and inclination angles—low for parallel-like trajectories and high for crossing-like trajectories—can help incorporate trajectory memory into these methods, reducing the risk of false trajectories”.

      Reviewer #2 (Recommendations For The Authors):

      Below I report comments that if addressed I believe would improve the clarity and readability of the manuscript.

      -  Figures 1 and 2 would be more meaningful if combined into one figure. This would allow for a direct visual comparison of the two modalities. If space is needed, I believe the second row of Figure 1 (coronal views of CC) does not add much information. It is often hard to navigate the different orientations of the tissue in the images; thus any effort in trying to help the reader visually clarify would improve readability.

      R2.4: We considered the reviewer’s suggestion to merge Figures 2 and 3. However, this made both the figures and the main text more complex, so we chose to retain the original figure layout. In addition, Figure 3 uses a non-standard directional colormap, and keeping the colormap consistent within each figure is a feature we wish to preserve. In response to R1.11, the figures have been updated to have more consistent orientations for the monkey samples.

      In Figure 2, the second row, showing a coronal view of the CC, is essential for comparison with human data in Figure S1. It highlights where we observed the columnar laminar organisation and their inclination angle, as also detected by DTI.

      -  Figure 4 shows synchrotron data revealing an anterior-posterior component within the centrum semiovale that is not necessarily seen in the dMRI data. Could the authors comment on this?

      R2.5: Thank you for pointing this out. We have now addressed this in the Results section (page 10), where we describe the observation in detail: “Interestingly, visual inspection of the colour-coded structure tensor directions in Fig. 4E shows the existence of voxels whose primary direction is along the A-P axis. However, this represents a small enough portion of the volume that it does not appear as a distinct peak on the FOD.”

      -  The authors claim they observed several purple axons crossing orthogonally in Figure 5c. However, that is not necessarily clear in the figure.

      R2.6: We appreciate the feedback. We have now coloured the streamlines of the crossing fasciculi in Figure 5C in red.

      -  Figure 5 would benefit from adding the color encoding scheme for Figure 5d, as sometimes this is not necessarily consistent.

      R2.7: We appreciate the feedback. We have added an indication of the standard directional colour coding to Figure 5D.

      -  Figure 5d shows interesting data from the complex region. However, it is hard to visualize and it looks like there are not many streamlines traveling entirely I-S? Maybe a different orientation of the sample would help visualization.

      R2.8: A similar point was raised by Reviewer 1 (see R1.2). We have added an animation of the scene to assist in the interpretation of the 3D organisation within this complex sample.

      -  The concept of axon fasciculi is not necessarily immediately clear. Adding an explanation for what the authors refer to when using this term would improve clarity.

      R2.9: In the introduction, we now state our conceptual definition of an axon fasciculus as a number of axons that follow each other (see also R2.1).

      -  The methods do not provide details on how structure tensor FA is measured.

      R2.10: Thank you for pointing this out. We have restructured and expanded the structure tensor description in the Methods section (see also R1.9 and R2.1), which now includes the definition of FA.
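
      For reference, the conventional definition of FA in terms of the three eigenvalues λ1, λ2, λ3 of a tensor is reproduced below; we assume, without having checked the revised Methods, that the structure tensor FA follows this standard form.

```latex
$$
\mathrm{FA} = \sqrt{\frac{3}{2}}\,
\frac{\sqrt{(\lambda_1-\bar{\lambda})^2 + (\lambda_2-\bar{\lambda})^2 + (\lambda_3-\bar{\lambda})^2}}
     {\sqrt{\lambda_1^2 + \lambda_2^2 + \lambda_3^2}},
\qquad
\bar{\lambda} = \frac{\lambda_1 + \lambda_2 + \lambda_3}{3}
$$
```

      FA is 0 when all three eigenvalues are equal and reaches 1 when only one eigenvalue is non-zero.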

      -  Why didn't the authors select the same cc region for both mice and monkeys? It seems this would have increased the strength of the comparison.

      R2.11: We agree. The reason lies in the chronology of experiments and the fact that we cannot control where demyelination takes place. We have added a clarifying description in the Methods section (page 31): “Note that several separate beamline experiments were conducted to collect the volumes listed in Table 1. In the first two experiments, samples from the monkey brain were scanned at ESRF and DESY, respectively. The samples from the mouse brain were imaged in two subsequent experiments. Consequently, the location of the identified demyelinating lesion in the cuprizone mice, which cannot be precisely controlled, did not match the location of the CC biopsies in the monkey.”

      -  While it is mentioned in the results, the methods do not explain how vessel segmentations or cell segmentation in mice was performed and for which datasets it was performed.

      R2.12: For the small ROI shown in Figure 6, the labelling was a manual process using the software ITK-SNAP, which has now been clarified in the corresponding figure caption. The generation of ROI masks and blood vessel segmentations involved a combination of intensity thresholding, morphological operations, and manual labelling in ITK-SNAP. This has been clarified in the restructured and expanded description of structure tensor analysis in the Methods section (starting on page 32).

      -  From the methods it is hard to understand (1) how many mice were used; (2) why dMRI was done on a different sample; (3) whether the same splenium region was selected for both healthy and CPZ animals; (4) how the registration across samples was performed.

      R2.13: We appreciate the feedback and have inserted clarifying statements in the relevant parts of the Methods section. (1) The total number of mice included was three: one normal, one cuprizone, and one normal for MRI scanning. (2) The quality of the collected dMRI on the mouse was too poor to use, and it could not be redone as the brain had already been sliced and prepared for synchrotron experiments. (3) The same splenium section was selected for both healthy and cuprizone mice. (4) A paragraph on image registration has been added.

      -  Diffusion MRI method sections would benefit from additional details on the protocols used.

      R2.14: Thank you for pointing this out. We have added more details about the diffusion MRI protocols, including the b-value, gradient strength, and other relevant parameters.

    1. eLife Assessment

      This valuable study extends the previous interesting work of this group to address the potentially different control of movement and posture. Through experiments in which stroke participants used a robotic manipulandum, the authors provide solid evidence supporting a lack of a relation between the resting force postural bias they measure (closely related to the flexor synergy in stroke) and kinematic deficits during movement. Based on these results, the authors propose a conceptual framework that differentially weights the two main descending pathways (corticospinal tract and reticulospinal tract) for neurologically intact and stroke patients. Discussing the potential impact of differences on muscle/spinal circuit state and their responses between holding a posture and movement, as well as the assumptions of their statistical comparisons, would further improve the paper.

    2. Reviewer #1 (Public review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data, but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones. These limitations should be discussed.

    3. Reviewer #2 (Public review):

      Summary:

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Comments on revisions:

      The authors should be commended for being very responsive to comments and providing several further requested analyses, which have improved the paper. However, there are still some outstanding issues that make it difficult to fully support the provided interpretation.

      The authors say within the response, "We would also like to stress that these perturbations were not designed so that responses are directly compared to each other ***(though of course there is an *indirect* comparison in the sense that we show influence of biases in one type of perturbation but not the other)***." They then state in the first paragraph of the discussion that "Remarkably, these resting postural force biases did not seem to have a detectable effect upon any component of active reaching but only emerged during the control of holding still after the movement ended. The results suggest a dissociation between the control of movement and posture." The main issue here is relying on indirect comparisons (i.e., significant in one situation but not the other), instead of relying on direct comparisons. Using a well-known example, just because one group / condition might display a significant linear relationship (i.e., slope_1 > 0) and another group / condition does not (slope_2 = 0), does not necessarily mean that the two groups / conditions are statistically different from one another [see Figure 1 in Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8, e48175.].

      The authors have provided a reasonable rationale for why they chose different perturbation waveforms for the movement and posture conditions. Yet it still holds that these different waveforms would likely yield very different muscular responses, making the results difficult to interpret, and this remains a limitation. From the paper it is unknown how these different perturbations would differentially influence a variety of classic neuromuscular responses, including short-range stiffness and stretch reflexes, which would be at play here.

      Many of the results can be interpreted in light of classic neuromuscular physiology. In Experiment 1, differences in resting postural bias in supported versus unsupported conditions can readily be explained, since there is greater muscle activity in the unsupported condition that leads to greater muscle stiffness to resist mechanical perturbations (Rack, P. M., & Westbury, D. R. (1974). The short-range stiffness of active mammalian muscle and its effect on mechanical properties. The Journal of physiology, 240(2), 331-350.). Likewise, muscle stiffness would scale with changes in muscle contraction with synergies. Importantly for Experiment 2, muscle stiffness is reduced during movement (Rack and Westbury, 1974), which may explain why resting postural biases do not seem to be impacting movement. Likewise, muscle spindle activity has been shown to scale with extrafusal muscle fiber activity and forces acting through the tendon (Blum, K. P., Campbell, K. S., Horslen, B. C., Nardelli, P., Housley, S. N., Cope, T. C., & Ting, L. H. (2020). Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. eLife, 9, e55177.). The concern here is that the authors have not sufficiently considered muscle neurophysiology, how that might relate to their findings, and how that might impact their interpretation. Given the differences in perturbations and muscle states at different phases, the concern is that it is not possible to disentangle whether the results are due to classic neurophysiology, the hypothesis they propose, or both. Can the authors please comment?

      The authors should provide a limitations paragraph. They should 1) address how they used different perturbation force profiles, 2) address how the muscles were in different states, which would change neuromuscular responses between trial phase / condition, 3) discuss the lack of direct statistical comparisons that support their hypothesis, and 4) provide a couple of paragraphs on classic neurophysiology, such as muscle stiffness and stretch reflexes, and how these various factors could influence the findings (i.e., whether they can disentangle whether the reported results are due to classic neurophysiology, the hypothesis they propose, or both).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones.

      Response 1.1. Thank you for your comments, which we address in detail below and in our response to Recommendations to the authors (see pp. 15-19 of this letter). We would first like to clarify the motivation behind our use of a stroke population to understand the interactions between the control of reaching and holding. We agree that this idea can be laid out in a more compelling way.

      The fact that stroke patients usually display issues with their control of both reaching and holding allows for within-individual comparisons of those two modes of control. Further, the magnitude of abnormalities is relatively large, making it easier to measure, compare and investigate effects. And, importantly, these two modes of control can be differentially affected after stroke (also pointed out by Reviewer 2, point 4 in Comments to the Authors). Finally, this kind of work – examining interactions between positive signs of stroke (such as abnormal posture or synergy) vs. negative signs (such as loss of motor control) – needs to be done in humans, as positive signs are relatively absent even in primates (Tower, 1940).

      We have changed our abstract (changes shown below in red), and our intro (expanding the second paragraph, lines 75-76), to lay out our motivation more clearly.

      From the abstract:

      “The paretic arm after stroke exhibits different abnormalities during rest vs. movement, providing an opportunity to ask whether control of these behaviors is independently affected in stroke. “

      On the other hand, the relation between force bias and the well-recognized flexor synergy seems rather self-evident, and I don't see that these results add much to that story.

      Response 1.2. While it seems natural that these biases would be the resting expression of abnormal flexor synergies (given their directionality towards the body, as shown in Figures 2-3, and the other similarities we demonstrate in Figure 8), we do not believe it is self-evident. These biases are measured at rest, with the patient passively moved and held still, whereas abnormal synergies emerge when the patient actively tries to move. The lack of relationship we find between these resting force biases and active movement underlines that the relation between force bias and flexor synergy should not be taken as self-evident, making it worthwhile to examine it (as we motivate in lines 589-596 and show in Figure 8).

      The paradox here is that, in spite of a relationship between force bias and flexor synergy (itself manifesting during attempted movement), there seems to be no relationship between force bias and direct measures of active movement (Figures 5,6). This is the paradox that inspired our conceptual model (Figure 9) and inspires us to further investigate the factors under which these two systems are intermingled or kept separate. We thus find it to be a helpful element in the story.

      I am also struck by what seems to be a contradiction between the conclusions of the current and former studies: "These findings in stroke suggest that moving and holding still are functionally separable modes of control" and "the commands that hold the arm and finger at a target location depend on the mathematical integration of the commands that moved the limb to that location." The former study is mentioned here only in passing, in a single phrase in the discussion, with no consideration of the relation between the two studies. This is odd and should be addressed. 

      Response 1.3. While these two sets of findings are not contradictory, we understand how they can appear as such without providing context. We now discuss the relationship between our present study and the previous one more directly (lines 66-70 and 663-669 of the revised manuscript).

      The previous study examined how the control of movement informs the control of holding after the movement was over; the current study examines whether abnormalities in holding, measured at rest after the arm has been passively moved to the rest position, affect movement. There are thus two important distinctions:

      First, directionality of potential effects: here we examine the effect of (abnormalities in) holding control upon movement, but the 2020 study (Albert et al., 2020) examines the effects of movement upon holding control. Stroke patient data in the 2020 study showed that, under CST damage, while the reach controller is disrupted, the hold controller can continue to integrate the malformed reach commands faithfully. In line with this, we proposed a model where the postural controller system sits downstream of the moving controller (Figure 7G in the 2020 paper). We thus did not claim, in 2020, that integration of movement commands is the only way to determine posture control, as we stated explicitly back then, e.g. (emphasis ours):

      “Equations (1) and (2) describe how the integration of move activity may relate to changes in hold commands, but does not specify the hold command at the target.”

      In short, finding no effect of holding abnormalities upon movement (present finding) does not mean there is no potential effect of movement upon holding (2020 finding). This is something we had alluded to in the Discussion but not clarified, which we do now (see edits at the end of our response to this point).

      Second, active vs. passive movement: here, we measure holding control at rest (Experiment 1). The 2020 study shows that endpoint forces reflect the integration of learned dynamics exerted during active movement that led to the endpoint position. However, in Experiment 1, there is no active reaching to integrate, as the robot passively moves the arm to the held position. Thus, resting postural forces measured in Experiment 1 could not reflect the integration of reach commands that led to each rest position.  

      Thus, the two sets of findings are not contradictory. Taking our current and 2020 findings together suggests that active holding control reflects both the integration of movement control that led to assuming the held position, plus the force biases measured at rest.

      Hence our decision to describe these two systems as functionally separable: while these systems can interact, the effects of post-stroke malfunctions in each can be independent depending on the function and conditions at hand. This does not make this a limited finding: being able to dissociate post-stroke impairment based on each of these two modes of control may inform rehabilitation, and also importantly, understanding the conditions in which these two modes of control become separable can substantially advance our understanding of both how different stroke signs interact with each other and how motor control is assembled in the healthy motor system. Figure 9 illustrates our conceptual model behind this and may serve as a blueprint to further dissect these circuits in the future.

      We discuss these issues briefly in lines 663-669 in our Discussion section, reproduced below for convenience:

      “It should be noted, however, that having distinct neural circuits for reaching and holding does not rule out interactions between them. For example, we recently demonstrated how arm holding control reflects the integration of motor commands driving the preceding active movement that led to the hold position, in both healthy participants and patients with hemiparesis (Albert et al., 2020). However, in that paper, we did not claim that this integration is the only source of holding control. Indeed, in Experiment 1 of the current study, we used passive movement to bring the arm to each probed position, which means that the postural biases could not be the result of integration of motor commands.” 

      And, we have adjusted our Introduction to provide pertinent context regarding our 2020 work (first paragraph, lines 66-70 of the updated manuscript).

      A minor wording concern I had is that the term "holding still" is frequently hard to parse. A couple of examples: "These findings in stroke suggest that moving and holding still are functionally separable modes of control." This example is easily read, "moving and holding [continue to be] functionally separable". Another: "...active reaching and holding still in the same workspace, " could be "...active reaching and holding [are] still in the same workspace." Simply "holding", "posture" or "posture maintenance" would all be better options.

      Response 1.4. Thank you for your suggestion. Following your comment, we have abbreviated this term to simply “holding”, both in the title and throughout the text.

      Reviewer #2 (Public Review):

      Summary: 

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Strengths: 

      The strength of the paper is that they clearly demonstrate with the posture task (i.e., active holding against a load) that the resting postural forces influence subsequent control (i.e., the path to stabilize, time to stabilize, max. deviation) following a sudden perturbation (i.e., sudden removal of the load). Further, they can explain their findings with a conceptual model, which is depicted in Figure 9. 

      Weaknesses: 

      Current weaknesses and potential concerns relate to i) not displaying or reporting the results of healthy controls and non-paretic arm in Experiment 2 and ii) large differences in force perturbation waveforms between movement (sudden onset) and posture (sudden release), which could potentially influence the results and/or interpretation. 

      Response 2.0. Thank you for your assessment, and for pointing out ways to improve our paper. We address the weaknesses and potential concerns in detail below.

      Larger concerns

      (1) Additional analyses to further support the interpretation. In Experiment 1 the authors present the results for the paretic arm, non-paretic arm, and controls. However, in Experiment 2 for several key analyses, they only report summary statistics for the paretic arm (Figure 5D-I; Figure 6D-E; Figure 7F). It is understood that the controls have much smaller resting postural force biases, but they are still present (Figure 3B). It would strengthen the position of the paper to show that controls and the non-paretic arm are not influenced by resting postural force biases during movement and particularly during posture, while acknowledging the caveat that the resting positional forces are smaller in these groups. It is recommended that the authors report and display the results shown in Figure 5D-I; Figure 6D-E; Figure 7F for the controls and non-paretic arm. If these results are all null, the authors could alternatively place these results in an additional supplementary. 

      Response 2.1a. Thank you for your recommendations. We agree both on the value of these analyses and the caveat associated with them: these resting postural force biases are substantially smaller for the non-paretic and control data (for example, the magnitude of resting biases in the supported condition is 2.8±0.4N for the paretic data, but only 1.8±0.4N and 1.3±0.2N for the non-paretic and control data, respectively; the difference is even greater in the unsupported condition, though this is not the one being compared to Experiment 2).

      We now conduct a comprehensive series of supplementary analyses, including the examination of non-paretic and control data for all three components of Experiment 2 (unperturbed reaches; pulse perturbations; and active holding control). These are mentioned in the Results (lines 422-424, 512-513, and 574-574 of the revised manuscript) and illustrated in the supplementary materials: Supplementary Figures S5-1, S6-1, and S7-1 contain the main analyses (comparisons of instances with the most extreme resting biases for each individual) for the unperturbed reach analysis, pulse perturbation analysis, and active holding control analysis, respectively.

      We find that non-paretic and control data do not display effects of resting biases upon unperturbed reaching control (Figure S5-1) or control against a pulse perturbation early during movement (Figure S6-1) – as is the case with the paretic data. Non-paretic and control data do not display evidence of influence of their resting force biases upon active holding control either (Figure S7-1), unlike the paretic data. For the non-paretic data, however, these influences are nominally in the same direction as in the paretic data. Given that resting biases are substantially weaker for the non-paretic case, it is possible a similar relationship exists but requires increased statistical power to discern. Moreover, it is possible that the effect of resting biases is non-linear, with small biases effectively kept under check so that their impact upon active holding control is even less than a linearly scaled version of the impact of the stronger, paretic-side biases. This can be the subject of future work.

      Please also note that, following your recommendation (Recommendations to the Authors, point 2.1), we have conducted secondary analyses which estimate sensitivity to resting bias using all datapoints, validating our main analyses; these analyses were also performed for control and non-paretic data, with similar results (Response 2.A.1).

      Further, the results could be boosted by reporting/displaying additional analyses. In Figure 6D the authors performed a correlation analysis. Can they also display the same analysis for initial deviation and endpoint deviation for the data shown in Figure 5D-F & 5G-I, as well as for 7F for the path to stabilization, time to stabilization, and max deviation? This will also create consistency in the analyses performed for each dependent variable across the paper.

      Response 2.1b. Here, we set out to test whether resting biases affect movement. It is best to do this using a within-individual comparison design, rather than using across-individual correlations: while correlation analyses can in general be informative, they obscure within-individual effects, which are the main comparisons of interest in our study. Consider a participant with strong resting bias towards one direction, tested on opposing perturbations; averaging these responses for each individual would mostly cancel out any effects of resting biases. Even if we were to align responses to the direction of the perturbation before averaging, the power of correlation analyses may be diluted by inter-individual differences in other factors, such as overall stiffness.

      Thus, our analysis design was instead focused on examining the differential effects of resting posture biases within each individual’s data. We compared the most extreme opposing/aligned or clockwise/counter-clockwise instances within each individual, specifically to assess these differential effects. In our revised version, we have further reinforced these analyses to include all data rather than the most extreme instances (see response 2.A.1.a to the Reviewer’s recommendation to the authors) where we performed correlations of within-individual resting posture vs. the corresponding dependent variables and compared the resulting slopes. 

      The across-individual correlation analyses add little to that for the reasons we outlined above. At the same time, it is possible they can be helpful in e.g. illustrating across-individual variability. We thus now include across-individual correlation analyses for all dependent variables, but, given their limited value, only in the supplementary material. This also means that, for consistency, we moved the correlation analysis in Figure 6 to the corresponding supplementary figure as well (Figure S6-3).

      In addition, following the Reviewer’s comment about consistency in the analyses performed for each dependent variable across the paper, we added within-individual comparisons for settling time following the pulse perturbations (Figure 6D, right).

      (2) Inconsistency in perturbations that would differentially impact muscle and limb states during movement and posture. It is well known that differences in muscle state (activation / preloaded, muscle fiber length and velocity) and limb state (position and velocity) impact sensorimotor control (Pruszynski, J. A., & Scott, S. H. (2012). Experimental brain research, 218, 341-359.). Of course, it is appreciated that it is not possible to completely control all states when comparing movement and posture (i.e., muscle and limb velocity). However, using different perturbations differentially impacts muscle and limb states. Within this paper, the authors used very different force waveforms for movement perturbations (i.e., 12 N peak, bell-shaped, 0.7ms duration -> sudden force onset to push the limb; Figure 6A) and posture perturbations (i.e., 6N, 2s ramp up -> 3s hold -> sudden force release that resulted in limb movement; Figure 4) that would differentially impact muscle (and limb) states. Preloaded muscle (as in the posture perturbation) has a very different response compared to muscle that has little preload (as in the movement perturbations, where muscles that would resist a sudden lateral perturbation would likely be less activated since they are not contributing to the forward movement). Would the results hold if the same perturbation had been used for both posture and movement (e.g., 12 N pulse for both experiments)? It is recommended that the authors comment and discuss in the paper why they chose different perturbations and how that might impact the results. 

      Response 2.2a. We agree that it can be impossible to completely control all states when comparing movement and posture. We would also like to stress that these perturbations were not designed so that responses are directly compared to each other (though of course there is an indirect comparison in the sense that we show influence of biases in one type of perturbation but not the other). Instead, Experiment 2 tried to implement a probe optimized for each motor control modality (moving vs. holding). However, the Reviewer has a point that the potential impact of differences between the perturbations is important to discuss in the paper.

      The Reviewer points out two potentially interesting differences between the two perturbations. First, the magnitude (6N for the posture perturbation vs. 12N for the pulse perturbation); second, the presence of background load in the posture perturbation, in contrast to the pulse perturbation.

      For the movement perturbation, we used a 12-N, 70ms pulse. This perturbation and scaled versions have been tested before in both control and patient populations (Smith et al., 2000; Fine and Thoroughman, 2006). For the holding perturbation, we used a background load to ensure that active holding control is engaged, and the duration of the probe (holding for about 5s) made using a stronger perturbation impractical: maintaining a background load at, say, 12N for that long could lead to increased fatigue.

      The question raised by the Reviewer, whether the findings would be the same if the same 12-N pulse were used to probe both moving and holding control, is interesting to investigate. We would expect the same qualitative findings (i.e. there would still be a connection between resting posture and active holding control when the latter were probed with a 12N pulse). Recent work provides more specific insight into what to expect. Our posture perturbation task is similar to the Unload Task in (Lowrey et al., 2019), whereby a background torque is released, whereas our pulse perturbation is more similar to their Load Task, whereby a torque is imposed against no background load (though it is a step perturbation rather than a pulse). Lowrey et al., 2019 find that their Unload task is harder than the Load task, with 2x the fraction of patient trials classified as failed (with failure defined as task performance being outside of the 95% confidence interval for controls), though there are still clear effects for the Load task. 

      This suggests that, all other things being equal, the effects would likely be weaker in magnitude if a pulse-like perturbation were used to probe posture control. At the same time, however, the Load and Unload tasks in Lowrey et al., 2019 used perturbations of the same magnitude; it is thus also likely that this reduction in effect would be mitigated, or even reversed, by the fact that we would be using a 12N instead of a 6N perturbation.

      A relevant consequence of the Lowrey et al., 2019 findings is that the Unload paradigm is superior in its ability to detect impairment in static, posture perturbations, and thus provides a better signal to detect potential relationships with resting posture biases. This is not surprising, as a background load further engages the control of active holding, which is what we were trying to probe in the first place.

      But then why not use the same paradigm (preloading and release) for movement? There are two main reasons. First, requiring a background load throughout the experiment is infeasible due to fatigue. Second, for the holding perturbation, we wanted to ensure that the postural control system is meaningfully engaged when the perturbation hits, hence we picked the background load. Were we to do the same during moving (i.e. impose a lateral background load on the movement), we could be engaging posture control on top of movement control. This preloading would reduce the degree to which the pulse probe isolates movement control, and would, by design, lead to intrusion of the posture control system into the movement task. This relates to what the Reviewer proposes in the comment below: preloading may result in postural biases, i.e. engage posture control; see below, where we argue that this interpretation is within the scope of our conceptual model rather than a counter to it.

      We now explain the rationale behind our perturbation design in the Methods section (lines 211-220).

      Relatedly, an alternative interpretation of the results is that preloading muscle for stroke patients, whether by supporting the weight of one's arm (experiment 1) or statically resisting a load prior to force release (experiment 2), leads to a greater postural force bias that can subsequently influence control. It is recommended that the authors comment on this. 

      Response 2.2b. We find this interpretation valid, but we do not see how it meaningfully differs from the framework we propose. We already state that the RST may be tailored for both posture/holding control and the production of large forces (which would include muscle preloading):

      “Thus, the accumulated evidence suggests that the RST could control posture and large force production in the upper limb.” (lines 698-699 in the current version)

      “the RST, in contrast, is weighted more towards slower postural control and generation of large isometric forces” (lines 724-726 in the current version)

      And, we discuss other conditions where the RST is involved in large force production, such as power grip, and how these interact with the role of the RST in posture/holding control (lines 758-768 in the current version).

      To better explain our model, we now provide the two examples mentioned by the reviewer along with our description of the proposed role for the RST (lines 726-727):

      “…the RST, in contrast, is weighted more towards slower postural control and generation of large isometric forces (such as vertical forces for arm support, or horizontal forces for holding the arm still against a background load like in our posture/release perturbation trials).”

      We note, however, that we find resting posture abnormalities even in the presence of arm support, suggesting the involvement of the RST in holding control even when the forces involved (and the need to preload the muscle) are small.

      Reviewer #3 (Public Review): 

      The authors attempt to dissociate differences in resting vs active vs perturbed movement biases in people with motor deficits resulting from stroke. The analysis of movement utilizes techniques that are similar to previous motor control studies in both humans and non-human primates, to assess impairments related to sensorimotor injuries. In this regard, the authors provide additional support to the extensive literature describing movement abnormalities in patients with hemiparesis both at rest and during active movement. The authors describe their intention to separate out the contribution of holding still at a position vs active movement as a demonstration that these two aspects of motor control are controlled by two separate control regimes.

      Strengths: 

      (1) The authors utilize a device that is the same or similar to devices previously used to investigate motor control of movement in normal and impaired conditions in humans and non-human primates. This allows comparisons to existing motor control studies. 

      (2) Experiment 1 demonstrates resting flexion biases both in supported and unsupported forelimb conditions. These biases show a correlated relationship with FM-UE scores, suggesting that the degree of motor impairment and the degree of resting bias are related.

      (3) The stroke patient participant population had a wide range of both levels of impairment and time since stroke, including both sub-acute and chronic cases allowing the results to be compared across impairment levels.

      The authors describe several results from their study: 1. Postural biases were systematically toward the body (flexion) and increased with distance from the body (when the arm was more extended) and were stronger when the arm was unsupported. 2. These postural biases were correlated with FM-UE score. 3. They found no evidence of postural biases impacting movement, even when that movement was perturbed. 4. When holding a position at the end of a movement, if the position was perturbed opposite to the direction of bias, movement back to the target was improved compared to a perturbation in the direction of bias. Taken together, the authors suggest that there are at least two separate motor controllers for tasks at rest versus with motion. Further, the authors propose that these results indicate that there is an imbalance between cortical control of movement (through the corticospinal tracts) and postural control (through the reticulospinal tract).

      Response 3.1. Thank you for pointing out some of the strengths of our work and summarizing our findings. A minor clarification we would like to make, related to (3), is that, while our study did enroll two patients towards the end of the subacute stage (2-3 months), the rest of the population were at the chronic stage, at one year and beyond. We thus find it very unlikely that time after stroke was the primary driver of differences in impairment in the population we studied.

      There are several weaknesses related to the interpretation of the results:

      In Experiment 1, the participants are instructed to keep their limbs in a passive position after being moved. The authors show that, in the impaired limb, these resting biases are significantly higher when the limb is unsupported and increase when the arm is moved to a more extended position.

      When supported by the air sled, the arm is in a purely passive position, not requiring the same antigravity response so will have less RST but also less CST involvement. While the unsupported task invokes more involvement of the reticulospinal tract (RST), it likely also has significantly higher CST involvement due to the increased difficulty and novelty of the task.

      If there were an imbalance in CST regulating RST as proposed by the authors, the bias should be higher in the supported condition as there should be relatively less CST activation/involvement/ modulation leading to less moderating input onto the RST and introducing postural biases. In the unsupported condition, there is likely more CST involvement, potentially leading to an increased modulatory effect on RST. If the proportion of CST involvement significantly outweighs the RST activation in the unsupported task, then it isn't obvious that there is a clear differentiation of motor control. As the degree of resting force bias and FM-UE score are correlated, an argument could be made that they are both measuring the impairment of the CST unrelated to any RST output. If it is purely the balance of CST integrity compared to RST, then the degree of bias should have been the same in both conditions. In this idea of controller vs modulator, it is unclear when this switch occurs or how to weigh individual contributions of CST vs. extrapyramidal tracts. Further, it isn't clear why less modulation on the RST would lead only to abnormal flexion.

      Response 3.2. Our model posits two mechanisms by which CST impairment would lead to increased RST involvement. The first – which is the one discussed by the Reviewer here - is a direct one, whereby weaker modulation of the RST by the CST leads to increased RST involvement. The second is an indirect one, whereby the incapacity of CST to drive sufficient motor output to deal with tasks eventually leads to increased RST drive.

      The reviewer suggests it is likely that the unsupported task demands increased activation through both the CST and the RST. If that were the case, however, it would exaggerate the effects of CST/RST imbalance after stroke compared to healthy motor control: if task conditions (lack of support) required higher CST involvement, then CST damage would have an even larger effect. In turn, this would lead to even higher RST involvement and further diminish the ability of the CST to moderate the RST. Thus, RST-driven biases would be higher in the unsupported condition.

      And, given that the CST itself is damaged and has to deal with an even-increased RST activation, we would not expect that the proportion of CST involvement would outweigh RST activation, but the opposite. In fact, a series of relatively recent findings suggest just this. For example,

      • Zaaimi et al., 2012 showed that unilateral CST lesions in monkeys lead to significant increases in the excitability of the contralesional RST (Zaaimi et al., 2012). Interestingly, this effect was present in flexors but not extensors, potentially explaining why less modulation and/or overactivation of the RST would primarily lead to abnormal flexion.

      • McPherson et al. (further discussed in point 2.A.23, by Reviewer 2 – Recommendations to the Authors) showed that, after stroke, contralesional activity (which would include the ipsilateral RST) increases relative to ipsilesional activity (which would include the contralateral CST) (McPherson et al., 2018). The same study also provides evidence that FM-UE may primarily reflect RST-driven impairment. The ipsilateral (RST)/contralateral (CST) balance, expressed as a laterality index, correlated with FM-UE, with lower FM-UE for indices indicating higher RST involvement. (Interestingly, the slope of this relationship was steeper when the laterality of brain activation patterns was examined under tasks with less arm support, mirroring the steeper FM-UE vs resting bias slope when arm support is absent, as shown in our Figure 8.)

      • Wilkins et al., 2020 (Wilkins et al., 2020) found that providing less support (i.e. requiring increased shoulder abduction) increases ipsilateral activation (representing RST) relative to contralateral activation (representing CST).

      This resting bias could be explained by an imbalance in the activation of flexors vs extensors which follows the results that this bias is larger as the arm is extended further, and/or in a disconnect in sensory integration that is overcome during active movement. Neither would necessitate separate motor control for holding vs active movement. 

      Response 3.3. We do not think that either of these points necessarily argue against our model. First, the resting biases we observe are clearly pointed towards increased flexion, and can thus be seen as the outcome of an imbalance in the activation of flexors vs. extensors at rest. This imbalance between flexors/extensors can also be explained by the CST/RST imbalance posited by our conceptual model: in their study of CST lesions in the monkey, Zaaimi et al., 2012 found increased RST activation for flexors but not extensors, suggesting that RST over-involvement may specifically lead to flexor abnormalities (Zaaimi et al., 2012). Second, overcoming a disconnect in sensory integration may be one way the motor system switches between separate controllers; how this switch happens is not examined by our conceptual model.

      In Experiment 2, the participants are actively moving to and holding at targets for all trials while being supported by the air sled. Even with the support, the paretic participants all showed start- and endpoint force biases around the movement despite not showing systematic deviations in force direction during active movement start or stop. There could be several factors that limit systematic deviations in force direction. The most obvious is that the measured biases are significantly higher when the limb is unsupported and by testing with a supported limb the authors are artificially limiting any effect of the bias.

      Response 3.4. We do expect, in line with what the reviewer suggests, that any potential effects would be stronger in the unsupported condition. We chose to test active motor control with arm support because running Experiment 2 without arm support would pose challenges, particularly for our most impaired patients, given the duration of Experiment 2 (~2 hours, about 1 hour with each arm) and the fatigue that would be expected to ensue.

      However, a key characteristic of our comparisons is that we are comparing Experiment 2 active control data under arm support, against Experiment 1 resting bias data also under arm support. While Experiment 1 measured biases without arm support as well, these are not used for this comparison. And, while resting biases are weaker with arm support, they are still clear and significant; yet they do not lead to detectable changes in active movement.

      At the same time, we do not rule out that, if we were to repeat Experiment 2 without arm support, we could find some systematic deviation in the direction of resting bias in movement control. Our conceptual model, in fact, suggests that this may be the case, as we described in lines 618-620 of our original manuscript. The idea here is that, when arm support is not provided, the increased strength requirements lead to increased drive through the RST, to the point that posture control (and its abnormalities) spills into movement control (Figure 9). We now better clarify this position in our Discussion (lines 744-750):

      “The interesting implication of this conceptual model is that synergies are in fact postural abnormalities that spill over into active movement when the CST can no longer modulate the increased RST activation that occurs when weight support is removed (i.e. resting biases may influence active reaching in absence of weight support). Supporting this idea, a study found increased ipsilateral activity (which primarily represents activation via the descending ipsilateral RST (Zaaimi et al., 2012)) when the paretic arm had reduced support compared to full support (McPherson et al., 2018).”

      It is also possible that significant adaptation or plasticity with the CST or rubrospinal tracts could give rise to motor output that already accounts for any intrinsic resting bias.  

      Response 3.5. This kind of adaptation, regardless of the tracts potentially involved, is an issue we examined in our experiment. As we discuss in our Results (lines 458-460 in the updated manuscript), with most of our patient population in the chronic stage, it is possible that their motor system adapted to those biases to the point that movement planning took them into account, thereby limiting their effect. This motivated us to examine responses to unpredictable perturbations during movement (Figure 6), where we still find no obvious effect of resting biases upon reaching control. We thus believe that our findings are not explained by this kind of adaptation, though we agree it would be of great interest for future work to compare resting biases and reaching control in acute vs. chronic stroke populations to examine the degree to which stroke patients adapt to these biases as they recover.

      In any case, the results from the reaching phase of Experiment 2 do not definitively show that directional biases are not present during active reaching, just that the authors were unable to detect them with their design. The authors do acknowledge the limitations in this design (a 2D constrained task) in explaining motor impairment in 3D unconstrained tasks. 

      Response 3.6. It is, of course, an inherent limitation of a negative finding that it cannot be proven. What we show here is that there is no hint of intrusion of resting posture abnormalities upon active movement, in spite of these resting posture abnormalities being substantial and clearly demonstrated even under arm support. To allow for the maximum bandwidth to detect any such effects, we specifically chose to compare the most extreme instances (resting bias-wise) for each individual, and yet we did not find any relationship between biases and active reaching.

      This suggests that, even if these biases could be in some form present during active movement, their effect would be minimal and thus limited in meaningfully explaining post-stroke impairment in active movement under arm support.

      Note that, as we already discuss, our conceptual model (Figure 9) suggests that the degree to which directional biases would be present in active reaching may be influenced by arm support (or the specific movements examined – hence our limitation in not examining 3D movement). Thus we do not claim that this independence is absolute. Examples include the last line of the passage quoted right above, and the summary statement of our Discussion quoted below (lines 639-641):

      “…which raises the possibility that the observed dissociation of movement and posture control for planar weight-supported movements may break down for unsupported 3D arm movements.”

      Finally, we now more explicitly acknowledge that abnormal resting biases may influence active movement in the absence of arm support (see Response 3.4).

      It would have been useful, in Experiment 2, to use FM-UE scores (and time from injury) as a factor to determine the relationship between movement and rest biases. Using a GLMM would have allowed a similar comparison to Experiment 1 of how impairment level is related to static perturbation responses. While not a surrogate for imaging tractography data showing a degree of CST involvement in stroke, FM-UE may serve as an appropriate proxy so that this perturbation at hold responses may be put into context relative to impairment.

      Response 3.7. Here the Reviewer suggests we use FM-UE scores as a proxy for CST integrity. We do not think this analysis would be particularly helpful in our case for a number of reasons:

      First, while FM-UE is a general measure of post-stroke impairment, it was designed to track, among other things, the emergence and resolution of abnormal synergies, a sign assumed to result from abnormally high RST outflow (McPherson et al., 2018; McPherson and Dewald, 2022). In line with this, the FM-UE scales with EMG-based measures of synergy abnormality (Bourbonnais et al., 1989). Impairments in dexterity, a sign associated with damage to the CST (Lawrence and Kuypers, 1968; Porter and Lemon, 1995; Duque et al., 2003), dissociate from synergy abnormalities when compared under arm support as we do here (Levin, 1996; Hadjiosif et al., 2022). This means that FM-UE would be a stronger proxy for RST activity and thus not a direct proxy for CST integrity, particularly when one wants to dissociate RST-specific vs. CST-specific abnormalities. In fact, as we discuss in Response 3.2 above, a number of studies support this idea: for example, McPherson et al., 2018 show that relative RST activation (the balance between ipsilateral activity, primarily reflecting the RST, and contralateral activity, primarily reflecting the CST) scales with FM-UE (McPherson et al., 2018).

      Second, this kind of analysis would obscure within-individual effects, since FM-UE scores are, of course, assigned to each individual. This is the same issue as with across-individual correlation analyses in general (see Response 2.1b): because a strong resting force bias would have opposite effects on opposing perturbations, averaging across subjects would occlude these effects.

      Third, while FM-UE is a good measure of synergy abnormality, weakness alone could also give an abnormal FM-UE (Avni et al., 2024).

      The Reviewer also suggests we use time from injury for this analysis. Time from injury can indeed potentially be an important factor. However, this analysis would not be appropriate for our dataset, since the effective variation in recovery stage within our population is limited: our sample is essentially chronic (only two patients were examined within the subacute stage, at 2 and 3 months after stroke, with everybody else examined more than a year after stroke), with the “positive” elements of their phenotype (and FM-UE itself) essentially plateaued (Twitchell, 1951; Cortes et al., 2017). We thus would not expect to see any meaningful effects of time from injury within our population. It would be an excellent question for future work to investigate both resting biases and their relationship to reaching in acute/subacute patients, and examine whether the trajectory of resting biases (both emergence and abatement due to recovery) follows the one for abnormal synergies.

      It is not clear that even in the static perturbation trials that the hold (and subsequent move from perturbation) is being driven by reticulospinal projections. Given a task where ~20% of the trials are going to be perturbed, there is likely a significant amount of anticipatory or preparatory signaling from the CST. How does this balance with any proposed contribution that the RST may have with increased grip?

      Response 3.8. We included our response to this as part of Response 3.2. In brief, while we cannot rule out that these tasks may recruit increased CST signaling, this would tend to increase, rather than reduce, the effects of post-stroke impairment: the requirement for increased signaling from a CST that is damaged would magnify the effects of this damage, in turn leading to increased recruitment of other tracts, such as the RST.

      In general, the weakness of the interpretation of the results with respect to the CST/RST framework is that it is necessary to ascribe relative contributions of different tracts to different phases of movement and hold using limited or indirect measures. Barring any quantification of this data during these tasks, different investigators are likely to assess these contributions in different ways and proportions limiting the framework's utility.

      Response 3.9. We believe that our Responses 3.2-3.6 put our findings in fair perspective, and the edits undertaken based on the Reviewer’s comments have clarified our position as to how the dissociation between holding and moving control may break down. We do agree, however, that our framework would be strengthened by the use of direct measures of CST/RST connectivity in future research. We present our conceptual model as a comprehensive explanation of our findings and how they blend with current hypotheses regarding the role of these two tracts in motor control after stroke. As such, it provides a blueprint for future research that more directly measures or modulates CST and RST involvement, using tools such as tractography or non-invasive brain stimulation.

      Recommendations for the authors:   

      Reviewer #1 (Recommendations For The Authors):

      L226 “…of this issue, we repeated the analysis of Figure 7F (a) by excluding these four patients…”.  Should this be three, based on the previous sentence? 

      Response 1.A.1. Thank you for pointing out this typo, which is now corrected. The analysis in question (Figure S1 in the original submission, now renumbered as Figure S7-4) excluded the three patients mentioned in the previous sentence.

      L254 “…the hand was held in a more distal position. The postural force biases were strongest when…”  Could this be "extended" rather than distal? See my later comment about the inadequate description of targets.

      Response 1.A.2. The reviewer is correct that the arm will tend to be more extended at the distal targets. However, since these positions were defined in extrinsic coordinates, we think the terms distal/proximal are also appropriate. In either case, we now clarify these definitions in the text (see Response 1.A.3 below).

      L263 “…contained both distal and proximal targets, and, importantly, they were also the movement…”.  Distal/proximal targets were never described as part of the task. 

      Response 1.A.3. We improved our description by (i) changing the wording above to “represented positions both distal and proximal to the body,” (ii) doing the same in our Methods (line 175), and (iii) indicating distal/proximal targets in Figure 3A (bottom right of panel A).

      L378 “…the pulse perturbation. We hypothesized that, should resting postural forces play a role, they…”  L379 “…would tend to reduce the effect of the pulse if they were in the opposite direction, and…”  Not really obvious why. A reduction in the displacement caused by a force pulse might be caused by different stiffness or viscosity, but not by a linear, time-invariant force bias. This situation is different from that of "moving the arm through a high-postural bias area vs. a low-postural bias area" where it would encounter time- (actually spatially) varying forces and varying amounts of displacement. Clarify the logic if this is a critical point.

      Response 1.A.4. We thank the Reviewer for highlighting this point of potential confusion. We now clarify that these postural bias forces are neuromuscular in origin (Kanade-Mehta et al., 2023), and likely result from an expression of abnormal synergy, at least under static conditions. In this case, we hypothesized that force pulses acting against the gradient of the postural bias field would act to stretch the already active muscles, which would lead to a further increase in postural resistance due to inherent length-tension properties of active muscle. By contrast, force pulses acting along the gradient of the postural bias field would act to shorten the same active muscles, which would lead to a reduction in postural resistance. The data did not support this in the case of force pulses imposed during movement. We note, however, that similar effects would affect responses to static perturbations as well, wherein we do find an effect of resting biases. We now better explain this reasoning (lines 479-482).

      L466-468 “resting postural force). In short, our perturbations revealed that resting flexor biases switched on after movement was over, providing evidence for separate control between moving and holding still.”

      I do not think the authors have presented clear evidence that forces "switch on", implying the switch to a different controller which they posit. This could as easily be a nonlinear or time-varying property of a single controller (admittedly, the latter possibility overlaps broadly with their idea of distinct, interacting controllers). An example that the authors are certainly aware of is that of muscle "thixotropy", a purely peripheral mechanism due to the dynamics of crossbridge cycling that causes resting muscle to be stiffer than moving muscle, changing with a time constant of ~1-2 seconds. Neither this particular example nor changing levels of contraction (more likely during the unpredictable force perturbations) would be in the direction to explain the main observation here; this point is perhaps worth making, together with the stretch reflex comments.

      Response 1.A.5. Thank you for this perspective. Indeed, it might be that “switching on” represents a shift along a nonlinear property of the same controller: in the extreme, if this nonlinearity is a step (on/off) function, this single controller would be functionally identical to two separate controllers. We thus cannot tell if these controllers are distinct in the strict sense. What we argue here is that, regardless of the underlying controller architecture (two distinct controllers or two distinct modes of the same controller), the control of reaching vs. holding can be functionally separable even after stroke. In line with this idea, we used a more nuanced phrasing (e.g. “separable functional modes for moving vs. holding”) throughout our manuscript, and we have now edited out a mention of “separate controllers” to be consistent with this.

      Moreover, thank you for pointing out the example of thixotropy, showing how peripheral mechanisms could interact with central control. As you point out, this effect would not explain the main observation here: in fact, if stiffness were substantially higher during rest or holding (instead of moving), that would reduce the impact of the static perturbation, making it harder to detect any effects of resting biases compared to the moving perturbation case.

      L480 “…during movement (Sukal et al., 2007). Yet, Experiment 2 found no relationship between resting…” L481 “…postural force biases and active movement control. To further investigate this apparent…” The methods of the two studies seem fairly similar, but this question warrants a more careful comparison. How did the size of the two workspaces compare? What about the magnitude of the exerted forces? The movement condition in this study was done with the limb entirely supported. Under that condition, the Sukal study also found fairly small effects on the range of motion.

      Response 1.A.6. Sukal et al., 2007 did not directly measure exerted forces, but instead compared the active range of motion under different loading conditions. They used the extent of reach area to quantify the effect of abnormal synergies, with a more extended active range of motion signifying reduced effect of abnormal synergies. As the Reviewer points out, Sukal et al. found fairly small effects of synergies upon the range of motion when arm support was provided (the reach area for the paretic side was found to be about 85% of the nonparetic side under full arm support, though they were statistically significantly different, Figure 5 of their paper). They found increasing effect of synergies as arm support was reduced: on average, the reach area when participants had to fully support the arm was less than 50% the reach area when full arm support was given (comparing the 0% vs. 100% active support conditions [i.e. 100% vs. 0% external support] in their Figure 5). As we discuss in our paper, this effect of arm support upon synergy mirrors the one we found for resting postures.

      To compare our workspace with the one in Sukal et al., we overlaid our workspace (the array of positions for which the posture biases were measured, for a typical participant from Experiment 1) on the one they used as shown in their Figure 4. Note that their figure only shows an example participant, and thus our ability to compare is limited by the fact that each participant can vary widely in terms of their impairment, and assumptions had to be made to prepare this overlay (e.g. that (0,0) represents the position of the right acromion point). 

      For this example, and our assumptions, our workspace was smaller, with the main points of interest (red dots, the movement start/end points used for Experiment 2) within the Sukal et al. workspace. That our workspace is smaller is not surprising, given that the area in Sukal et al. represents the limit of what can be reached, and thus motor control *has* to be examined in a subset of that area.

      Author response image 1.

      Comparing the two study methodologies, however, suggests an advantage of measuring resting biases in terms of sensitivity and granularity: first, resting biases can be clearly detected even under arm support (something we point out in our Discussion, lines 715-717); second, they can measure abnormalities at any point in the workspace, rather than providing only a binary inside/outside classification relative to the reach area. The resting bias approach may thus be a more potent tool to probe the shared bias/synergy mechanisms we propose here.

      Figure 2 

      Needs color code. 

      The red dots could be bigger.

      Response 1.A.7. We have increased the size of the red dots and added a color code to explain the levels illustrated by the contours. We also expanded our caption to better explain this illustration.

      Figure 3

      Labeling is confusing. Drop the colored words (from both A and B), and stick to the color legend. Consider using open and filled symbols (and bars) to represent arm support or lack thereof. The different colored ovals are very hard to distinguish.

      Response 1.A.8. We find these recommendations improve the readability of Figure 3 and we have thus adopted them - see updated Figure 3.

      Figure 4

      Not terribly necessary.  

      Response 1.A.9. While this figure is indeed redundant given our descriptions in the text, we kept it because we believe it helps clarify the different stages of movement we examine.

      Figure 5 

      Tiny blue and green arrows are impossible to distinguish. 

      Although the general idea is clear, E and H are not terribly intuitive.  Add distance scale bars for D-I. 

      Response 1.A.10. For improved contrast, we now use red and blue (also in line with comment below regarding Figure 7), and switched to brighter colors in general. To make E and H more intuitive and easier to follow, we expanded the on-panel legend. Thank you for pointing out that distance scale bars are missing; we have now added them (panels E, F, H, and I).

      Figure 6 

      Panel E inset is too small. 

      Response 1.A.11. We have now moved the inset to the right and enlarged it.

      Figure 7 

      Green and blue colors are not good. 

      Response 1.A.12. For improved contrast, we now use red and blue.

      Figure 8 

      Delete or move to supplement? 

      Response 1.A.13. We respectfully disagree. While the relationships in these data are also captured by the ANOVA, we believe these scatter plots offer a better overview of the relationships between force biases and FM-UE across different conditions.

      Really minor

      L113 “…participants' lower arm was supported using a custom-made air-sled (Figure 1C). Above the  participant's…” 

      Response 1.A.14. We placed the apostrophe after the s so as to refer to participants in general (plural).

      L117 ”…subject-produced forces on the handle were recorder using a 6-axis force transducer.”  recorded 

      Response 1.A.14. Thank you for pointing out this error which we have now corrected.

      L136 “…2013), Experiment 1 assessed resting postural forces by passively moving participants to…” The experiment did not move the participant.

      Response 1.A.15. We have now fixed this issue: “by having the robot passively move…”

      L248 “…experiment blocks: two with each arm, with or without arm weight support (provided by an air experimental…”

      Response 1.A.16. We have now corrected this.

      L364 “…responses to mid-movement perturbations. In 1/3 of randomly selected reaching movements…”  Obviously, you mean 1/3 of all movements: "One-third of the reaching movements were chosen randomly"  

      Response 1.A.17. We now clarify: “In 1/3 of reaching movements in Experiment 2, chosen randomly”. Also please note our response to Reviewer 2, point 10: we now report the exact number of trials for which each kind of perturbation was present.

      L609 “Damage to the CST after stroke reduces its moderating influence upon the RST (Figure 9,…”  "its" refers to the subject, "Damage", not "CST".

      Response 1.A.18. We have changed this to “Post-stroke damage to the CST reduces the moderating influence the CST has upon the RST”.

      Reviewer #2 (Recommendations For The Authors):

      (1) Throughout, the authors cleverly selected the most opposed and most aligned resting postural force biases to perform a within-subject analysis. However, this approach excludes a lot of data. The authors could perform an additional within-subject analysis. For each participant they could correlate lateral resting posture force bias to each dependent variable, utilizing all the trials of a participant. 

      Response 2.A.1a. Thank you for appreciating our analysis design and for suggesting additional analyses. We focused our within-subject analysis design on the most extreme instances, as we believe that this approach would offer the best opportunity to detect any potential effects of resting biases. We reasoned that, since resting biases tend to be relatively small for most locations in the workspace, taking all biases into account would inject a disproportionate amount of noise into our analysis, which would in turn diminish our ability to detect any potential relationships. This could be because small biases lead to small effects, but also because small biases may themselves be more likely to reflect measurement noise in the first place. Note that our study argues for the separability of active reaching from resting abnormalities based on the lack of relationships between the two. While one cannot definitively prove a negative, it is important to take the approach that maximizes the ability to detect such a relationship if there were one. We believe taking the most extreme instances fulfills that role.

      However, as the Reviewer points out, this approach also excludes a substantial amount of data. We agree that our findings could be further strengthened by exploring additional within-subject analyses that utilize all trials. Thus, following the reviewer’s suggestion, we estimated the sensitivity of each dependent variable to lateral resting posture force bias. Specifically, we estimated the slope of this relationship for each individual (separately for paretic and non-paretic data) using linear regression, and assessed whether the average slope is significant for each group (paretic data, non-paretic data, and control data).
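      A minimal sketch of this kind of across-individual slope analysis is given below (Python, standard library only). It fits an ordinary least-squares slope of an outcome variable on lateral resting bias for each participant, then forms a one-sample t statistic testing whether the mean slope differs from zero. The helper names and toy data are hypothetical, and the p-value step (which could be done, for example, with `scipy.stats.ttest_1samp` on the slopes) is omitted; this illustrates the general approach under stated assumptions and is not the analysis code used for the paper.

```python
import math
from statistics import mean, stdev


def participant_slope(bias, outcome):
    """Ordinary least-squares slope of an outcome variable on lateral resting bias,
    fitted to all trials of one participant (hypothetical helper)."""
    mx, my = mean(bias), mean(outcome)
    sxx = sum((x - mx) ** 2 for x in bias)
    sxy = sum((x - mx) * (y - my) for x, y in zip(bias, outcome))
    return sxy / sxx


def group_slope_t(slopes):
    """One-sample t statistic for H0: mean slope = 0 (df = len(slopes) - 1)."""
    return mean(slopes) / (stdev(slopes) / math.sqrt(len(slopes)))


# Toy data: three hypothetical participants, each a (bias, outcome) pair of trial lists.
participants = [
    ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]),
    ([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]),
    ([1.0, 2.0, 3.0], [4.0, 7.0, 10.0]),
]
slopes = [participant_slope(b, y) for b, y in participants]  # one slope per participant
t_stat = group_slope_t(slopes)  # tested separately per group (paretic, non-paretic, control)
```

Fitting slopes within each individual first means that between-participant differences in overall bias level do not masquerade as a relationship; only the consistency of the within-person slope across participants is tested.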

      This secondary analysis replicated our main findings: lack of relationship between posture biases and active reaching control (both for unperturbed and perturbed movement), and a significant relationship between posture biases and active holding control. In addition, in line with main point 2.1 by the reviewer, we performed the same analyses for non-paretic and control data. While no definitive conclusions can be drawn for these cases (as expected, given that the resting force biases are smaller, as also pointed out by the Reviewer in 2.1), these data are worthy of discussion, with potentially interesting insights (for example, there are hints that the connection between resting biases and active holding control is present in the non-paretic arm as well, which may be explored in future research).

      We have included these analyses in the supplementary materials, and we point to them in the main text. Specifically:

      First, in line with our main analyses in Figure 5, we find no effect (the average slope is not significant) for start and endpoint biases upon the corresponding reaching angles. This is now mentioned in lines 425-434 of the Results, and illustrated in Figure S5-2. There was a lack of effect for the non-paretic and control data as well.

      Second, in line with our main analyses in Figure 6, we find no effect of start biases upon responses to the pulse (Figure S6-2, mentioned in lines 513-517 of the Results). As above, there was no effect in the non-paretic or control data either.

      And, finally, in line with our main analysis in Figure 7, we find an effect of resting biases upon performance for the static perturbation (Figure S7-2, mentioned in lines 578-586 of the Results). Interestingly, there is a suggestion that resting biases may affect static perturbation responses in the non-paretic data as well, based on the relationship between posture bias and maximum deviation, but not for the other two metrics. Given the lack of consistency of resting bias effects across the three dependent variables examined, however, our current data cannot give a definite answer as to whether the connection between resting biases and active holding control is also present in the non-paretic side. Our hypothesis is that, since resting abnormalities and their effects are the pathological over-manifestations of mechanisms inherent in the motor system in general, such a relationship would exist. Answering this question, however, would require an experiment design better tailored to detect relationships in the non-paretic arm, where resting biases are weaker.

      We thank the Reviewer for their suggestions and believe that these additional analyses provide a more complete picture of the data, and their consistency with our main results reinforces the message of the paper.

      Then, they can report the percentage of participants that display significant correlations separately for the paretic, nonparetic, and control arms. 

      Response 2.A.1b. We note that, even in cases where the average slope (across individuals) is significant, the individual slopes themselves are usually not significant, likely due to the large amount of noise for datapoints corresponding to weak resting biases. To further examine this, we performed additional analyses whereby we examined slopes by (a) pooling all participant data together (centered separately for each individual), and (b) taking a further step to normalize each participant’s data not only by centering but also by adjusting for each individual’s variability along each axis (i.e. assessing the slope between z-scores of resting bias vs. z-scores of each dependent variable). These two analyses confirmed our finding that resting biases interacted with active motor control, with significant slopes between resting biases and outcome variables. (a) Pooling all data together: path to stabilization: p = 0.032; time to stabilization: p = 1.4×10⁻⁵; maximum deviation: p = 0.021. (b) Pooling and normalizing: path to stabilization: p = 0.0013; time to stabilization: p = 8.6×10⁻⁶; maximum deviation: p = 0.00056. The latter analysis showed an even stronger connection between resting bias and active holding control, probably because it better accounts for differences in the range of resting biases across participants. For simplicity, however, we only provide the across-individual slope comparisons in the paper.
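      The difference between the two pooling variants can be sketched as follows (Python, standard library only; helper names and data are hypothetical, and this is not the analysis code used for the paper). Centering removes each participant's offset before pooling, while z-scoring additionally rescales by each participant's own SD, so participants with very different ranges of resting bias contribute comparably to the pooled slope.

```python
from statistics import mean, stdev


def center(values):
    """Subtract one participant's own mean (removes between-participant offsets)."""
    m = mean(values)
    return [v - m for v in values]


def zscore(values):
    """Center and divide by one participant's own SD (also removes range differences)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pooled_slope(participants, transform):
    """Transform each participant's bias and outcome separately, pool all trials,
    then return the least-squares slope of outcome on bias (hypothetical helper)."""
    xs, ys = [], []
    for bias, outcome in participants:
        xs += transform(bias)
        ys += transform(outcome)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Two hypothetical participants sharing a within-person slope of 2,
# but with different offsets and very different bias ranges.
data = [
    ([0.0, 1.0, 2.0], [10.0, 12.0, 14.0]),
    ([0.0, 5.0, 10.0], [0.0, 10.0, 20.0]),
]
slope_centered = pooled_slope(data, center)  # recovers the shared slope of 2
slope_z = pooled_slope(data, zscore)         # standardized slope of 1 for perfectly linear data
```

In the centered version the participant with the wider bias range dominates the fit; z-scoring equalizes that weighting, which is one plausible reason a normalized analysis could be more sensitive.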

      (2) An important aspect of all the analyses is that they rely heavily on estimates of the resting postural force bias. How stable are these resting postural force biases at the individual level? The authors could assess this by reporting within-subject variance for both the magnitude and direction of the resting postural force bias.

      Response 2.A.2. Thank you for your suggestion. We now assess the individual-level variance in error across measurements for patients’ paretic data using an ANOVA: the variance that remains after all other factors (same probe location; same arm support condition; same participant) are taken into account. We found that individual level measurement variance explained a mere 9.0% of total variance for resting bias magnitude. (We note that the same figure was 20.2% for the non-paretic data, in line with the weaker average biases which would be more susceptible to noise). We now note this in the Methods, as part of the new subsection “Stability of resting posture bias measurements in Experiment 1” (lines 266-273).
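      One way such a "remaining variance" fraction could be computed is sketched below, assuming a full cell-means model in which each cell collects the repeated measurements that share the same participant, probe location, and support condition. The cell keys and values are hypothetical, and the ANOVA actually used may have been specified differently (for example, with additive factors rather than all interactions).

```python
from statistics import mean


def residual_variance_fraction(cells):
    """Within-cell sum of squares divided by total sum of squares.

    cells maps (participant, location, support) -> list of repeated measurements
    of resting-bias magnitude (hypothetical layout, full cell-means model assumed).
    """
    all_values = [v for cell in cells.values() for v in cell]
    grand = mean(all_values)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_within = sum((v - mean(cell)) ** 2 for cell in cells.values() for v in cell)
    return ss_within / ss_total


# Hypothetical bias magnitudes (N) for two cells with two repeats each.
cells = {
    ("P01", "loc1", "support"): [1.0, 3.0],
    ("P01", "loc1", "no_support"): [5.0, 7.0],
}
frac = residual_variance_fraction(cells)  # 4 / 20 = 0.2 for these toy values
```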

      (3) Does resting postural force bias influence hand movement immediately following force release from the postural perturbation? This could be assessed before any volitional responses by examining the velocity of the hand during the first 50 ms following the postural perturbation.

      Response 2.A.3. The influence appears fairly rapid, emerging within the first 100 ms, as shown to the right. Here we plot hand deviation in the direction of the perturbation for the most-opposed (red) vs. most-aligned (blue) instances to examine when these curves become different. The bottom plots show the difference between these two, whereas shading indicates SEM (note that these curves are referenced to the average deviation in the last 0.5 s before force release). The rightmost plots zoom in to make it easier to see how responses to the most opposed vs. most aligned instances diverge.

      To detect the earliest post-perturbation timepoint for which this effect was significant, we performed paired t-tests at each timestep, and found that the two responses were systematically statistically different 95ms after perturbation onset onwards. For reference, the same method detected a response at 25ms for the most aligned instances and 40ms for the most opposed instances.
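      The onset-detection logic described above can be sketched as follows (Python, standard library only). It computes a paired t statistic across participants at every timestep and reports the earliest time from which every remaining sample exceeds the critical value. The critical value is supplied by the caller (4.30 corresponds to df = 2 at a two-sided α of 0.05 and could also be obtained from `scipy.stats.t.ppf`); the traces and helper names are hypothetical, and reading "systematically different onwards" as the trailing run of significant samples is an assumption of this sketch.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for per-participant values a[i] vs b[i]."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def sustained_onset(opposed, aligned, times, t_crit):
    """Earliest time from which |t| > t_crit holds at every remaining sample.

    opposed, aligned: one deviation trace per participant, each len(times) long.
    Returns None if the final sample is not significant.
    """
    significant = [
        abs(paired_t([tr[k] for tr in opposed], [tr[k] for tr in aligned])) > t_crit
        for k in range(len(times))
    ]
    onset = None
    for k in reversed(range(len(times))):  # walk back through the trailing run
        if not significant[k]:
            break
        onset = times[k]
    return onset


# Hypothetical traces (s, arbitrary deviation units) for three participants.
times = [0.00, 0.05, 0.10, 0.15]
opposed = [[0.1, 1.0, 1.0, 2.0], [-0.1, -1.0, 1.1, 2.1], [0.0, 0.5, 0.9, 1.9]]
aligned = [[0.0] * 4 for _ in range(3)]
onset = sustained_onset(opposed, aligned, times, t_crit=4.30)  # df = 2, two-sided 0.05
```

Requiring significance at all later samples, rather than at the first significant sample, guards against isolated early crossings caused by noise.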

      We have now added Supplementary Figure S7-4 with short commentary in the Supplementary Materials.

      (4) Abstract, lines 7-9. At a glance (and when reading the manuscript linearly) this sentence is unclear. If the paretic arm is compromised across rest and movement, how does that afford the opportunity to address the relationship between reaching, stopping, and stabilizing when all could be impacted? It might be useful to specify that these factors may be impacted differently relative to one another with stroke, providing an opportunity to better understand the differences between movement and postural control.

      Response 2.A.4. Thank you for pointing out this issue (also related to Reviewer 1’s point – Response 1.1). We have changed this to more clearly reflect our reasoning and highlight that the issue is that stroke can differentially impact reaching vs. holding, copied below:

      “The paretic arm after stroke exhibits different abnormalities during rest vs. movement, providing an opportunity to ask whether control of these behaviors is independently affected in stroke.”

      (5) Line 27. It is perhaps more appropriate to say conceptual model than simply 'model'.  

      Response 2.A.5. Thank you for your suggestion, which we have adopted throughout the manuscript.

      (6) Line 122-125. Figure 1A caption. The authors should specify that resting posture force biases occur when the limb or hand is physically constrained in a specific position. 

      Response 2.A.6. Thank you for pointing this out – we have clarified the caption:

      “If one were to physically constrain the hand in a position away from the resting posture, the torques involved in each component of the abnormal resting posture translate to a force on the hand (blue arrow);”

      (7) Line 147. Why was the order not randomized or counterbalanced? 

      Response 2.A.7. We prioritized paretic data, as the primary analyses and comparisons in our paper involved resting posture biases and active movement with the paretic arm. We note that our primary analyses, which rely on paretic-paretic comparisons, would not be affected by paretic vs. non-paretic ordering effects. However, ordering effects could potentially affect comparisons between paretic and non-paretic data. We now note the reasoning behind the absence of counterbalancing, and mention the potential limitation in interpreting paretic to non-paretic comparisons in lines 124-129 of the Methods.

      (8) Line 172. 12N is the peak force of the pulse?

      Response 2.A.8. The reviewer is correct; we have clarified our description (line 463 in the updated manuscript):

      “a 70 ms bell-shaped force pulse which was 12N at its peak”

      (9) Line 175. What is a clockwise pulse? Was the force vector rotating in direction over time so that it was always acting orthogonally to the movement, or did it always act leftwards or rightwards?

      Response 2.A.9. The force vector was not rotating in direction over time. Here, we used clockwise/counterclockwise to indicate rightwards/leftwards with respect to the ideal movement direction – the line from start position to target (which is what we understand the Reviewer means by “always act rightwards or leftwards”). We have clarified the text to indicate this (lines 193-195):

      …was applied by the robot lateral to the ideal movement direction (i.e. the direction formed between the center of the start position and the center of the target) after participants reached 2cm away from the starting position (Smith and Shadmehr, 2005; Fine and Thoroughman, 2006).

      (10) Lines 177-182. It might be useful to explicitly mention the frequency of each of the perturbations, just for ease of the reader. 

      Response 2.A.10. We have added this information to our Methods (lines 206-210):

      Thus, in summary, each 96-movement block consisted of 64 unperturbed movements and 32 movements perturbed with a force pulse (16 clockwise, and 16 counter-clockwise). For 20 out of the 96 movements in each block, the hold period was extended to test the hold perturbation (4 trials for each of the 5 target locations, each one of the 4 trials testing one perturbation direction as shown in Figure 7C).

      (11) Line 191. Lines 188-190. It would be useful to see a sample of several of these force traces over time (0-5s) that were used to make the average for a position. That would give insight into the stability of the forces of a participant for one of the postures. These traces could be shown in Figure 2.

      Response 2.A.11. Thank you for your suggestion. We have added these panels to Figure 1 (as Figure 2 was already large). Each panel illustrates the three measurements taken at similar positions (closest to midline, distal from the body) and the same condition (paretic arm, with arm support given) for one participant (same participants as in Figure 2). Solid lines indicate the force on the x-axis (positive values indicate forces towards the left), whereas dashed lines indicate the force on the y-axis (positive values indicate forces towards the body). The shaded area indicates the part averaged in order to estimate the resting bias, illustrating how resting biases were relatively stable by the 2 s mark. Note that these examples include one trial (blue traces in the third panel) which was rejected following visual inspection as described in Materials and Methods – Data Exclusion Criteria (“trials where forces appeared unstable and/or there was movement during the robot hold period”). We find this helpful as it illustrates (and motivates) one component of our methodology.

      (12) Line 196. Figure 1D (not 1E).  

      Response 2.A.12. Thank you for catching this error, which we have now corrected.

      (13) Line 215: The authors mentioned similar results. Were there any different results that impacted interpretation? Some evidence of this, similar to and in addition to Supplementary 1, would be helpful. 

      Response 2.A.13. We repeated our analyses without these exclusion criteria, with no impact to the interpretation. We now include versions of the main outcome panels from Figures 5, 6, and 7 in the supplementary materials calculated without this outlier exclusion (Figures S5-E, S6-E, and S7-E, respectively). 

      (14) Line 231: Perhaps better to explicitly state that the furthest three positions are being used as the distal targets for the ANOVA.

      Response 2.A.14. Thank you for your suggestion. We now explicitly clarify this in line 276:

      “distal targets [furthest three positions] vs. proximal targets [closest two positions]”

      (15) Figure 3B, lines 265. Clearly, these are different, but the authors should report statistics. 

      Response 2.A.15. We now report these numbers (lines 339-346 of the revised manuscript, which also include statistics related to bias direction as described in 2.A.17 below).

      (16) Figure 2 should have a heat map scale.  

      Response 2.A.16. We have now added this (also Response 1.A.7), including an explanation of what the heat map represents in the caption.

      (17) Figure 3C: It would be useful to quantify and plot the direction of the resting force bias vector. 

      Response 2.A.17. Thank you for your suggestion. We have expanded Figure 3 to include the average direction of the resting force bias vector (note the readjustment of colors following Reviewer 1’s comment: striped bars indicate No Support data, and full bars indicate Support data, with the colors being the same). The direction of the force bias vector, however, may not be very informative in cases where the magnitude is small (and the signal-to-noise ratio is small), whereas averaging the direction of the force bias vector across different positions for one participant may average out systematic variations in this direction across different locations. Nevertheless, the average direction appears generally towards the body (around -90°, or 6 o’clock) even in the non-paretic and control data (though the noise – as suggested by the size of the errorbars – is much higher in the latter cases, especially when the arm is supported). This is a (weak) suggestion that these resting biases may be present, though much subdued, in the nonparetic limb and healthy individuals; further work will be needed to elucidate this.

      (18) Line 428. It is not significantly longer compared to controls. Can the authors slightly revise this sentence?

      Response 2.A.18. We have revised this sentence (lines 529-532):

      Patients showed impaired capacity to resist and recover from this perturbation (the abrupt release of the imposed force). The time to stabilization for the paretic side (0.94±0.05s) was longer compared to the non-paretic side (0.79±0.03s, p = 0.024) and controls (0.78±0.06s, though this was statistically marginal, p = 0.061) as shown in Figure 7E, left.

      (19) Line 541. It is unclear how these data support the idea of three distinct controllers. Can the authors please clarify? 

      Response 2.A.19. Here, we compare our findings to previous ideas about distinct controllers and discuss a potential fusion of these ideas with ours. Specifically, we find that holding is distinct from both initial reaching and coming to a stop. Previous work argues that initial reaching and coming to a stop are themselves distinct (Ghez et al., 2007; Jayasinghe et al., 2022). Combining these two sets of arguments, we arrive at the possibility of three distinct controllers.

      (20) It would be useful if the authors provided a definition of synergy, as well as distinguishing between muscle and movement synergies. 

      Response 2.A.20. We now provide this in lines 591-594:

      Here, “synergies” refer to abnormal co-activation patterns across joints that manifest as the patient tries to move – for example, the elbow involuntarily flexing as the patient tries to abduct their shoulder (Twitchell, 1951; Brunnstrom, 1966). 

      (21) Line 592-593. The wording of this sentence could be improved. 

      Response 2.A.21. We have switched this sentence to active voice for more clarity:

      Thus, while full weight support reduces both resting flexor biases and movement-related flexor synergies, this reduction seems more complete for synergies than for resting biases.

      (22) Figure 9. In the left column, it should read normal synergies and normal resting posture.  

      Response 2.A.22. We intentionally used the same terminology, as the idea behind our conceptual model is that these patterns, which manifest as well-recognized abnormal synergies and abnormal resting postures in stroke, may be present in the healthy motor system as well, but kept in check by the CST moderating the RST. At the same time, we recognize that, by definition, synergies and posture in controls are the “normal” reference point against which “abnormal” synergies and posture are defined after stroke. To clarify this issue, we decided to forgo the term “abnormal” in the figure, and instead refer to “synergistic movement” and “synergistic resting posture”.

      (23) Figure 9. With stroke, is RST upregulated, a decreased influence of CST, or both? All seem plausible.

      Response 2.A.23a. We believe both can be happening. From previous work (e.g. McPherson et al., 2018) it seems safe to say that RST upregulation occurs, whereas one would also expect a decreased CST influence because of stroke-related damage to the CST. The relative weight of these influences would be interesting to elucidate in future work.

      I have not read the paper, but did McPherson et al., 2018 test these different hypotheses?  

      Response 2.A.23b. The main point of McPherson et al., 2018 is that increased synergy expression is due to increased RST involvement, rather than reduced CST influence. However, McPherson et al. do not show separate increases/reductions in RST/CST activity; they show that contralesional activity relative to ipsilesional activity is increased (using a laterality index). While it does seem that RST is upregulated in this case, this does not exclude the possibility that CST influence is reduced as well.

      We also noticed that the citation itself, while mentioned in the text, was missing from the bibliography. This is now fixed.

      For Figure 9, McPherson is cited as they provide evidence for the idea that RST involvement increases when arm support is decreased. This evidence is both direct (e.g. in their Figure 3 where they show that “Stroke participants exhibited increased activity in the contralesional (R) hemisphere as SABD loading increased” [i.e. arm support was reduced]) and indirect: they connect synergies to RST involvement, and also show increased synergies with reduced arm support (also shown multiple times previously). Both these arguments suggest that arm support reduces RST involvement. We have clarified the relevant sentence:

      The interesting implication of this conceptual model is that synergies are in fact postural abnormalities that spill over into active movement when the CST can no longer modulate the increased RST activation that occurs when weight support is removed. Supporting this idea, McPherson et al. found increased ipsilateral activity (which primarily represents activation via the descending RST (Zaaimi et al., 2012)) when the paretic arm had reduced support compared to full support (McPherson et al., 2018).

      Reviewer #3 (Recommendations For The Authors):

      For Experiment 2, it is not immediately clear how the within-subject values are being pooled and compared across the different conditions. For instance, in the static perturbation trials, there are four blocks with 20 perturbation trials per block per arm (80 total per arm) with each location and direction once per block. For each participant, the comparison is between the location/direction that was most opposed (although this doesn't look accurately represented in Fig 7F). Therefore, the within-subject comparison is 4 trials per participant? Were these values averaged or pooled? It is a little odd that the SD for all the within-subjects trials are identical or nearly identical across conditions especially when looking at the example patient data in 7B and 7F.  

      Response 3.A.1. For static perturbation trials, the within-subject comparison involves 8 trials per participant: 4 trials corresponding to the perturbation direction/position combination with resting bias most opposed to the perturbation, and 4 trials corresponding to the perturbation direction/position combination with resting bias most aligned with the perturbation. These values were averaged for each individual. We have expanded our methods to make this part of our data analysis clear (lines 284-296) for all types of comparisons (unperturbed movement, pulse perturbation, static perturbations – now referred to as “release perturbation”).

      The across-subject SDs for the average resting forces for each one of these two conditions, shown in Figure 7F are indeed identical. This is due to how these two instances (most aligned vs. most resistive) were selected: because the perturbation directions come in pairs that exactly oppose each other (Figure 7B), if one were to select the position with the most opposing resting bias, that would mean that the combination with same position and the oppositely-directed perturbation would be the one with the most assistive resting bias. Hence the resting biases selected for the most opposing/assistive instances would be equal in magnitude and opposite to each other for each participant, as illustrated in Figure 7F, whereby the most-opposed bias for each individual is exactly opposite to the corresponding most-aligned bias for the same individual. We have added a brief commentary about this on the caption (lines 551-554), reproduced below:

      Note how the most-opposed resting bias for each patient is equal and opposite to their most-aligned resting bias. This is because the same resting bias, when projected along the directions of two oppositely-directed perturbations (illustrated in C), opposes one with the same magnitude with which it aligns with the other.
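      This equal-and-opposite property can be checked with a short sketch (Python, standard library only). It projects hypothetical 2-D resting biases onto perturbation directions that come in opposite pairs (the 0°, 90°, 180°, 270° set is an assumption for illustration) and picks the most-opposed and most-aligned combinations. Because the projection onto a direction d + 180° is the negative of the projection onto d, the two extremes always have equal magnitude and opposite sign.

```python
import math


def projection(bias, direction_deg):
    """Signed component of a 2-D resting force bias (Fx, Fy) along a perturbation
    direction: positive = aligned with (assists) it, negative = opposes it."""
    theta = math.radians(direction_deg)
    return bias[0] * math.cos(theta) + bias[1] * math.sin(theta)


def extreme_instances(biases, directions_deg):
    """Most-opposed and most-aligned projections over all position/direction
    combinations (hypothetical helper)."""
    proj = [projection(b, d) for b in biases for d in directions_deg]
    return min(proj), max(proj)


# Hypothetical resting biases (N) at three positions; directions form opposite pairs.
biases = [(1.5, -2.0), (0.3, -3.1), (-0.8, -1.2)]
most_opposed, most_aligned = extreme_instances(biases, [0, 90, 180, 270])
# Both extremes come from the same position (0.3, -3.1): about -3.1 and +3.1.
```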

      Importantly, following suggestions by Reviewer 2 (see point 2.A.1), we now provide supplementary analyses that use the entirety of the relevant data, rather than the most extreme instances, which provide evidence supporting our main findings (Figures S5-2, S6-2, and S7-2).

      The printed colors in Figure 3 are very muddled and hard to read/interpret, especially in panel A. 

      Response 3.A.2. Thank you for pointing out this issue, also raised by Reviewer 1. We have adjusted the colors to be more distinct from each other and look clear both in print and on-screen, making use of dashed lines and stripes rather than different shades.

      I think it would improve readability and interpretation if Figure 8 and the results related to FM-UE were contained within the description of results for Experiment 1.

      Response 3.A.3. Thank you for this suggestion. This is actually a debate we had among ourselves earlier, and we can see merits to either ordering. It is certainly arguable that moving Figure 8 and the FM-UE results within the rest of Experiment 1 may improve readability somewhat. However, we believe that presenting these results at the end better serves to illustrate the apparent paradox between the lack of direct connection between resting biases and active movement on one hand, and the relationship between resting biases and abnormal synergies on the other. We believe that this better sets the stage to present our conceptual model, which explains this paradox based on the role arm support plays in modulating the expression of both resting biases and abnormal synergies.

      Additional changes/corrections not outlined above

      Figure 1D displayed a right arm, but showed a target array (red dots) for a left arm paradigm. We now flip the target array shown for consistency.

      We corrected Figure 6C, which accidentally used an earlier definition of settling time based on lateral stabilization throughout the entire movement, rather than focusing on the period immediately following the pulse. The intended definition of settling time (as we had described in the Methods, lines 204-206 of the original submission) focuses on lateral corrections specific to the pulse (rather than corrections when the participant approaches the endpoint) and better matches the definition of settling time for the release (static) perturbation trials. Note that this change did not affect the (lack of) relationship between settling time and resting force bias, both across individuals (correlation plots now in Figure S6-1) and within individuals (now shown in the right part of panel 6D). In panel C, an error in the scaling for the maximum lateral deviation in the pulse direction (right side of the panel) is also now corrected.

      In addition, we made minor edits throughout the text to improve readability.

      References

      Albert ST, Hadjiosif AM, Jang J, Zimnik AJ, Soteropoulos DS, Baker SN, Churchland MM, Krakauer JW, Shadmehr R (2020) Postural control of arm and fingers through integration of movement commands. Elife 9:e52507.

      Avni I, Arac A, Binyamin-Netser R, Kramer S, Krakauer JW, Shmuelof L (2024) The Kinematics of 3D Arm Movements in Sub-Acute Stroke: Impaired Inter-Joint Coordination is Attributable to Both Weakness and Flexor Synergy Intrusion. Neurorehabil Neural Repair 38:646–658.

      Bourbonnais D, Vanden Noven S, Carey KM, Rymer WZ (1989) Abnormal spatial patterns of elbow muscle activation in hemiparetic human subjects. Brain 112:85–102.

      Brunnstrom S (1966) Motor testing procedures in hemiplegia: based on sequential recovery stages. Phys Ther 46:357–375.

      Cortes JC, Goldsmith J, Harran MD, Xu J, Kim N, Schambra HM, Luft AR, Celnik P, Krakauer JW, Kitago T (2017) A Short and Distinct Time Window for Recovery of Arm Motor Control Early After Stroke Revealed With a Global Measure of Trajectory Kinematics. Neurorehabil Neural Repair 31:552–560.

      Duque J, Thonnard J, Vandermeeren Y, Sébire G, Cosnard G, Olivier E (2003) Correlation between impaired dexterity and corticospinal tract dysgenesis in congenital hemiplegia. Brain 126:732–747.

      Fine MS, Thoroughman KA (2006) Motor Adaptation to Single Force Pulses: Sensitive to Direction but Insensitive to Within-Movement Pulse Placement and Magnitude. J Neurophysiol 96:710–720.

      Ghez C, Scheidt R, Heijink H (2007) Different Learned Coordinate Frames for Planning Trajectories and Final Positions in Reaching. J Neurophysiol 98:3614–3626.

      Hadjiosif AM, Branscheidt M, Anaya MA, Runnalls KD, Keller J, Bastian AJ, Celnik PA, Krakauer JW (2022) Dissociation between abnormal motor synergies and impaired reaching dexterity after stroke. J Neurophysiol 127:856–868.

      Jayasinghe SA, Scheidt RA, Sainburg RL (2022) Neural Control of Stopping and Stabilizing the Arm. Front Integr Neurosci 16.

      Kanade-Mehta P, Bengtson M, Stoeckmann T, McGuire J, Ghez C, Scheidt RA (2023) Spatial mapping of posture-dependent resistance to passive displacement of the hypertonic arm post-stroke. J NeuroEngineering Rehabil 20:163.

      Lawrence DG, Kuypers HG (1968) The functional organization of the motor system in the monkey: II. The effects of lesions of the descending brain-stem pathways. Brain 91:15–36.

      Levin MF (1996) Interjoint coordination during pointing movements is disrupted in spastic hemiparesis. Brain 119:281–293.

      Lowrey CR, Bourke TC, Bagg SD, Dukelow SP, Scott SH (2019) A postural unloading task to assess fast corrective responses in the upper limb following stroke. J NeuroEngineering Rehabil 16:1–17.

      McPherson JG, Chen A, Ellis MD, Yao J, Heckman C, Dewald JP (2018) Progressive recruitment of contralesional cortico-reticulospinal pathways drives motor impairment post stroke. J Physiol 596:1211–1225.

      McPherson LM, Dewald JP (2022) Abnormal synergies and associated reactions post-hemiparetic stroke reflect muscle activation patterns of brainstem motor pathways. Front Neurol 13:934670.

      Porter R, Lemon R (1995) Corticospinal function and voluntary movement. Oxford University Press.

      Smith MA, Brandt J, Shadmehr R (2000) Motor disorder in Huntington’s disease begins as a dysfunction in error feedback control. Nature 403:544.

      Smith MA, Shadmehr R (2005) Intact ability to learn internal models of arm dynamics in Huntington’s disease but not cerebellar degeneration. J Neurophysiol 93:2809–2821.

      Tower SS (1940) Pyramidal lesion in the monkey. Brain 63:36–90.

      Twitchell TE (1951) The restoration of motor function following hemiplegia in man. Brain 74:443–480.

      Wilkins KB, Yao J, Owen M, Karbasforoushan H, Carmona C, Dewald JP (2020) Limited capacity for ipsilateral secondary motor areas to support hand function post-stroke. J Physiol 598:2153–2167.

      Zaaimi B, Edgley SA, Soteropoulos DS, Baker SN (2012) Changes in descending motor pathway connectivity after corticospinal tract lesion in macaque monkey. Brain 135:2277–2289.

    1. eLife Assessment

      This study provides important insights into the regulation of a retained intron in the mRNA coding for OGT, a process known to be regulated by the O-GlcNAc cycling system, and highlights the functional role of the splicing regulator SFSWAP. The evidence supporting the claims of the authors is convincing: the authors performed an elegant state-of-the-art CRISPR knockout strategy and sophisticated bioinformatic analysis to identify SFSWAP as a negative regulator of alternative splicing. The work will be of interest to researchers in the fields of splicing and glycobiology.

    2. Reviewer #1 (Public review):

      Summary:

      Govindan and Conrad use a genome-wide CRISPR screen to identify genes regulating retention of intron 4 in OGT, leveraging an intron retention reporter system previously described (PMID: 35895270). Their OGT intron 4 reporter reliably responds to O-GlcNAc levels, mirroring the endogenous splicing event. Through a genome-wide CRISPR knockout library, they uncover a range of splicing-related genes, including multiple core spliceosome components, acting as negative regulators of OGT intron 4 retention. They choose to follow up on SFSWAP, a largely understudied splicing regulator shown to undergo rapid phosphorylation in response to O-GlcNAc level changes (PMID: 32329777). RNA-sequencing reveals that SFSWAP depletion not only promotes OGT intron 4 splicing but also broadly induces exon inclusion and intron splicing, affecting decoy exon usage. While this study offers interesting insights into intron retention and O-GlcNAc signaling regulation, the RNA sequencing experiments lack the essential controls needed to provide full confidence in the authors' conclusions.

      Strengths:

      (1) This study presents an elegant genetic screening approach to identify regulators of intron retention, uncovering core spliceosome genes as unexpected positive regulators of intron retention.

      (2) The work proposes a novel functional role for SFSWAP in splicing regulation, suggesting that it acts as a negative regulator of splicing and cassette exon inclusion, which contrasts with expected SR-related protein functions.

      (3) The authors suggest an intriguing model where SFSWAP, along with other spliceosome proteins, promotes intron retention by associating with decoy exons.

      Weaknesses:

      (1) The conclusions on SFSWAP impact on alternative splicing are based on cells treated with two pooled siRNAs for five days. This extended incubation time without independent siRNA treatments raises concerns about off-target effects and indirect effects from secondary gene expression changes, potentially limiting confidence in direct SFSWAP-dependent splicing regulation. Rescue experiments and shorter siRNA-treatment incubation times could address these issues.

      (2) The mechanistic role of SFSWAP in splicing would benefit from further exploration. Key questions remain, such as whether SFSWAP directly binds RNA, specifically the introns and exons (including the decoy exons) it appears to regulate. Furthermore, given that SFSWAP phosphorylation is influenced by changes in O-GlcNAc signaling, it would be interesting to investigate this relationship further. While generating specific phosphomutants may not yield definitive insights due to redundancy, and may also be beyond the scope of the study, the authors could examine whether distinct SFSWAP domains, such as the SR and SURP domains, which likely overlap with phosphorylation sites, are necessary for regulating OGT intron 4 splicing.

      (3) Data presentation could be improved (specific suggestions are included in the recommendations section). Furthermore, Excel tables with gene expression and splicing analysis results should be provided as supplementary datasheets. Finally, a more detailed explanation of statistical analyses is necessary in certain sections.

    3. Reviewer #2 (Public review):

      Summary:

      The paper describes an effort to identify the factors responsible for intron retention and alternative exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly.

      Strengths:

      (1) Exhaustive analysis of potential splicing factors in an unbiased screen.

      (2) Extensive genome-wide bioinformatic analysis.

      (3) Thoughtful discussion and literature survey.

      Weaknesses:

      (1) No firm evidence linking SFSWAP to an O-GlcNAc-specific mechanism.

      (2) Resulting model leaves many unanswered questions.

    4. Reviewer #3 (Public review):

      Summary:

      The major novel finding in this study is that SFSWAP, a splicing factor containing an RS domain but no canonical RNA binding domain, functions as a negative regulator of splicing. More specifically, it promotes retention of specific introns in a wide variety of transcripts including transcripts from the OGT gene previously studied by the Conrad lab. The balance between OGT intron retention and OGT complete splicing is an important regulator of O-GlcNAc expression levels in cells.

      Strengths:

      An elegant CRISPR knockout screen employed a GFP reporter, in which GFP is efficiently expressed only when the OGT retained intron is removed (so that the transcript will be exported from the nucleus to allow for translation of GFP). Factors whose CRISPR knockdown causes decreased intron retention therefore increase GFP, and can be identified by sequencing RNA of GFP-sorted cells. SFSWAP was thus convincingly identified as a negative regulator of OGT retained intron splicing. More focused studies of OGT intron retention indicate that it may function by regulating a decoy exon previously identified in the intron, and that this may extend to other transcripts with decoy exons.

      Weaknesses:

      The mechanism by which SFSWAP represses retained introns is unclear, although some data suggest it can operate (in OGT) at the level of a recently reported decoy exon within that intron. Interesting and appropriate speculation about possible mechanisms is provided and will likely be the subject of future studies.

      Overall, the study is well done and carefully described, but some figures and some experiments should be described in more detail.

    1. eLife Assessment

      This study provides useful findings about the effects of heterozygosity for Trio variants linked to neurodevelopmental and psychiatric disorders in mice. However, the strength of the evidence is limited and incomplete mainly because the experimental flow is difficult to follow, raising concerns about the conclusions' robustness. Clearer connections between variables, such as sex, age, behavior, brain regions, and synaptic measures, and more methodological detail on breeding strategies, test timelines, electrophysiology, and analysis, are needed to support their claims.

    2. Reviewer #1 (Public review):

      Summary:

      This study explores how heterozygosity for specific neurodevelopmental disorder-associated Trio variants affects mouse behavior, brain structure, and synaptic function, revealing distinct impacts on motor, social, and cognitive behaviors linked to clinical phenotypes. Findings demonstrate that Trio variants yield unique changes in synaptic plasticity and glutamate release, highlighting Trio's critical role in presynaptic function and the importance of examining variant heterozygosity in vivo.

      Strengths:

      This study generated multiple mouse lines to model each Trio variant, reflecting point mutations observed in human patients with developmental disorders. The authors employed various approaches to evaluate the resulting behavioral, neuronal morphology, synaptic function, and proteomic phenotypes.

      Weaknesses:

      While the authors present extensive results, the flow of experiments is challenging to follow, raising concerns about the strength of the experimental conclusions. Additionally, the connection between sex, age, behavioral data, brain regions, synaptic transmission, and plasticity lacks clarity, making it difficult to understand the rationale behind each experiment. Clearer explanations of the purpose and connections between experiments are recommended. Furthermore, the methodology requires more detail, particularly regarding mouse breeding strategies, timelines for behavioral tests, electrophysiology conditions, and data analysis procedures.

    3. Reviewer #2 (Public review):

      Summary:

      The authors generated three mouse lines harboring ASD, Schizophrenia, and Bipolar-associated variants in the TRIO gene. Anatomical, behavioral, physiological, and biochemical assays were deployed to compare and contrast the impact of these mutations in these animals. In this undertaking, the authors sought to identify and characterize the cellular and molecular mechanisms responsible for ASD, Schizophrenia, and Bipolar disorder development.

      Strengths:

      The establishment of TRIO dysfunction in the development of ASD, Schizophrenia, and Bipolar disorder is very recent and of great interest. Disorder-specific variants have been identified in the TRIO gene, and this study is the first to compare and contrast the impact of these variants in vivo in preclinical models. The impact of these mutations was carefully examined using an impressive host of methods. The authors achieved their goal of identifying behavioral, physiological, and molecular alterations that are disorder/variant specific. The impact of this work is extremely high given the growing appreciation of TRIO dysfunction in a large number of brain-related disorders. This work is very interesting in that it begins to identify the unique and subtle ways brain function is altered in ASD, Schizophrenia, and Bipolar disorder.

      Weaknesses:

      (1) Most assays were performed in older animals and perhaps only capture alterations that result from homeostatic changes resulting from prodromal pathology that may look very different.

      (2) Identification of upregulated (potentially compensating) genes in response to these disorder-specific Trio variants is extremely interesting. However, a functional demonstration of compensation is not provided.

      (3) There are instances where data is not shown in the manuscript. See "data not shown". All data collected should be provided even if significant differences are not observed.

      I consider weaknesses 1 and 2 minor. While they would be very interesting to explore, these experiments might be more appropriate for a follow-up study. I would recommend that the missing data in 3 be provided in the supplemental material.

    4. Author response:

      eLife Assessment

      This study provides useful findings about the effects of heterozygosity for Trio variants linked to neurodevelopmental and psychiatric disorders in mice. However, the strength of the evidence is limited and incomplete mainly because the experimental flow is difficult to follow, raising concerns about the conclusions' robustness. Clearer connections between variables, such as sex, age, behavior, brain regions, and synaptic measures, and more methodological detail on breeding strategies, test timelines, electrophysiology, and analysis, are needed to support their claims.

      We appreciate the opportunity to address the constructive feedback provided by eLife and the reviewers. Below, we respond to the overall assessment and individual reviewers' comments, clarifying our experimental approach, addressing concerns, and providing additional details where necessary.

      We thank the editors for highlighting the significance of our findings regarding the effects of Trio variant heterozygosity in mice. We acknowledge the feedback concerning the experimental flow and agree that clarity is paramount. To address these concerns:

      (1) Connections between variables: We will revise the manuscript to explicitly outline and expand our explanations of the relationships between sex, age, behavior, brain regions, and synaptic measures, making the rationale for each experiment and its relevance to the overall conclusions clearer.

      (2) Methodological details: The Methods section of our paper was kept short, with additional details provided in the Supplemental Methods section. We will merge both into a single extended section to improve clarity. We will also expand on our breeding strategies, test timelines, electrophysiological protocols, and data analysis methods in the revised Methods section. These additions aim to enhance the transparency and reproducibility of our study and to ensure full support of our conclusions.

      (3) Experimental flow: We will revise and extend our Results, Methods, and Discussion sections to clarify the rationale and experimental design and to guide readers through the experimental sequence.

      We are confident these revisions address the concerns raised and enhance the robustness and coherence of our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study explores how heterozygosity for specific neurodevelopmental disorder-associated Trio variants affects mouse behavior, brain structure, and synaptic function, revealing distinct impacts on motor, social, and cognitive behaviors linked to clinical phenotypes. Findings demonstrate that Trio variants yield unique changes in synaptic plasticity and glutamate release, highlighting Trio's critical role in presynaptic function and the importance of examining variant heterozygosity in vivo.

      Strengths:

      This study generated multiple mouse lines to model each Trio variant, reflecting point mutations observed in human patients with developmental disorders. The authors employed various approaches to evaluate the resulting behavioral, neuronal morphology, synaptic function, and proteomic phenotypes.

      Weaknesses:

      While the authors present extensive results, the flow of experiments is challenging to follow, raising concerns about the strength of the experimental conclusions. Additionally, the connection between sex, age, behavioral data, brain regions, synaptic transmission, and plasticity lacks clarity, making it difficult to understand the rationale behind each experiment. Clearer explanations of the purpose and connections between experiments are recommended. Furthermore, the methodology requires more detail, particularly regarding mouse breeding strategies, timelines for behavioral tests, electrophysiology conditions, and data analysis procedures.

      We appreciate the reviewer’s recognition of the novelty and comprehensiveness of our approach, particularly the generation of multiple mouse lines and our efforts to model Trio variant effects in vivo.

      Weaknesses

      (1) Experimental flow and rationale and connection between variables: We will expand on the connections between behavioral data, neuronal morphology, synaptic function, and proteomics in the Results and Discussion sections to clarify how each experiment informs the reasoning and the conclusions and to highlight the relationships between sex, age, behavior, and synaptic measures.

      (2) Methodological details: The Methods section of our paper was kept short to meet the word limits of the submitted version, with additional details provided in the Supplemental Methods section. We will merge our Methods and Supplemental Methods sections and expand on our breeding strategies, test timelines, electrophysiological protocols, and data analysis methods in the revised Methods section. These additions aim to enhance the transparency and reproducibility of our study and to ensure full support of our conclusions.

      Reviewer #2 (Public review):

      Summary:

      The authors generated three mouse lines harboring ASD, Schizophrenia, and Bipolar-associated variants in the TRIO gene. Anatomical, behavioral, physiological, and biochemical assays were deployed to compare and contrast the impact of these mutations in these animals. In this undertaking, the authors sought to identify and characterize the cellular and molecular mechanisms responsible for ASD, Schizophrenia, and Bipolar disorder development.

      Strengths:

      The establishment of TRIO dysfunction in the development of ASD, Schizophrenia, and Bipolar disorder is very recent and of great interest. Disorder-specific variants have been identified in the TRIO gene, and this study is the first to compare and contrast the impact of these variants in vivo in preclinical models. The impact of these mutations was carefully examined using an impressive host of methods. The authors achieved their goal of identifying behavioral, physiological, and molecular alterations that are disorder/variant specific. The impact of this work is extremely high given the growing appreciation of TRIO dysfunction in a large number of brain-related disorders. This work is very interesting in that it begins to identify the unique and subtle ways brain function is altered in ASD, Schizophrenia, and Bipolar disorder.

      Weaknesses:

      (1) Most assays were performed in older animals and perhaps only capture alterations that result from homeostatic changes resulting from prodromal pathology that may look very different.

      (2) Identification of upregulated (potentially compensating) genes in response to these disorder-specific Trio variants is extremely interesting. However, a functional demonstration of compensation is not provided.

      (3) There are instances where data is not shown in the manuscript. See "data not shown". All data collected should be provided even if significant differences are not observed.

      I consider weaknesses 1 and 2 minor. While they would be very interesting to explore, these experiments might be more appropriate for a follow-up study. I would recommend that the missing data in 3 be provided in the supplemental material.

      We are grateful for the reviewer’s recognition of our study’s significance and methodological rigor. The acknowledgment of Trio dysfunction as a novel and impactful area of research is deeply appreciated.

      Weaknesses: 

      Age of animals: We agree that focusing on older animals may limit insights into early-stage pathophysiology. However, given that the goal of this study was to examine the functional impacts of Trio heterozygosity at an adolescent stage and to reveal the ultimate impact of these alleles on synaptic function, we believe the choice of animal age aligns with our objectives. We agree that future studies of earlier developmental stages will be beneficial and will complement these findings.

      Functional compensation: In this study, we tested functional compensation through rescue experiments in +/K1431M brain slices using a Rac1-specific inhibitor, NSC, which prevents its activation by Trio or Tiam1. Our findings strongly suggest that increased Rac1 activity, attributed to the proposed compensation, drives the deficiency in neurotransmitter release. Furthermore, this deficiency can be normalized by direct Rac1 inhibition.

      Data not shown: We will incorporate all data previously described as "data not shown" into the Supplemental Materials, even when results are nonsignificant. We agree that this ensures full transparency and facilitates a more comprehensive evaluation of our findings.

    1. eLife Assessment

      This study investigates the fundamental role of polyunsaturated fatty acids (PUFAs) in membrane biology, using a unique model to perform a thorough genetic screen that highlights that PUFA synthesis defects cannot be compensated for by mutations in other pathways. While the data are solid and generally support the claims, additional experimental validation or more detailed descriptions of their results would strengthen the broader conclusions. This study will appeal to researchers in membrane biology, lipid metabolism, and C. elegans genetics.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the roles of polyunsaturated fatty acids (PUFAs) in animal physiology and membrane function. A C. elegans strain carrying the fat-2(wa17) mutation possesses a very limited ability to synthesize PUFAs, and there is no dietary input because the E. coli diet consumed by lab-grown C. elegans does not contain any PUFAs. The fat-2 mutant strain was characterized to confirm that the worms grow slowly, have rigid membranes, and have a constitutive mitochondrial stress response. The authors showed that chemical treatments or mutations known to increase membrane fluidity did not rescue growth defects. A thorough genetic screen was performed to identify genetic changes that compensate for the lack of PUFAs. The newly isolated suppressor mutations that compensated for FAT-2 growth defects included intragenic suppressors in the fat-2 gene, as well as constitutive mutations in the hypoxia-sensing pathway components EGL-9 and HIF-1, and loss-of-function mutations in ftn-2, a gene encoding the iron storage protein ferritin. Taken together, these mutations support a model in which increased intracellular iron, an essential cofactor for fatty acid desaturases, allows the minimally functional FAT-2(wa17) enzyme to be more active, resulting in increased desaturation and increased PUFA synthesis.

      Strengths:

      (1) This study provides new information further characterizing fat-2 mutants. The authors measured increased membrane rigidity compared to wild-type worms; however, this rigidity could not be rescued by other fluidity treatments such as detergent or mutations. Rescue was only achieved with polyunsaturated fatty acid supplementation.

      (2) A very thorough genetic suppressor screen was performed. In addition to some internal fat-2 compensatory mutations, the only pathways identified as capable of compensating for deficient PUFA synthesis were the hypoxia pathway and the iron storage protein ferritin. Suppressor mutations included an egl-9 mutation that constitutively activates HIF-1, and dominant gain-of-function mutations in hif-1. The increased HIF-1 activity conferred by these specific egl-9 and hif-1 mutations leads to decreased expression of ftn-2. Indeed, loss of ftn-2 leads to higher intracellular iron. The increased iron apparently makes the FAT-2 fatty acid desaturase enzyme more active, allowing for the production of more PUFAs.

      (3) The mutations isolated in the suppressor screen show that the only mutations able to compensate for the lack of PUFAs were ones that increased PUFA synthesis by the defective FAT-2 desaturase, thus demonstrating an essential need for PUFAs that cannot be overcome by changes in other pathways. This is a very novel study, taking advantage of the genetic analysis of C. elegans, and it confirms the observations in humans that certain essential PUFAs are required for growth and development.

      (4) Overall, the paper is well written, and the experiments were carried out carefully and thoroughly. The conclusions are well supported by the results.

      Weaknesses:

      Overall, there are not many weaknesses. The main one I noticed concerns the lipidomic analysis shown in Figs 3C, 7C, S1 and S3. While these data are an essential part of the analysis and provide strong evidence for the conclusions of the study, it is unfortunate that the methods used did not enable the distinction between two 18:1 isomers. These two isomers of 18:1 are important in C. elegans biology, because one is a substrate for FAT-2 (18:1n-9, oleic acid) and the other is not (18:1n-7, cis-vaccenic acid). Although rarer in mammals, cis-vaccenic acid is the most abundant fatty acid in C. elegans and is likely the most important structural MUFA. The measurement of these two isomers is not essential for the conclusions of the study, but the manuscript should include a comment about the abundance of oleic vs vaccenic acid in C. elegans (the authors can find this information, even for the fat-2 mutant, in other publications on C. elegans fatty acid composition). Otherwise, readers who are not familiar with C. elegans might assume that the reported 18:1 is mainly oleic acid, as is common in mammals.

      Other suggestions to authors to improve the paper:

      (1) The title could be less specific; it might be confusing to readers to include the allele name in the title.

      (2) There are two errors in the pathway depicted in Figure 1A. The 16:0 to 16:1 desaturation can be performed by FAT-5, FAT-6, and FAT-7. The 18:0 to 18:1 desaturation can only be performed by FAT-6 and FAT-7.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a genetic screen in C. elegans to investigate the physiological roles of polyunsaturated fatty acids (PUFAs). They screen for mutations that rescue fat-2 mutants, which have strong reductions in PUFAs. As a result, either mutations in fat-2 itself, or mutations in genes involved in the HIF-1 pathway, were found to rescue fat-2 mutants.

      Strengths:

      As C. elegans can produce PUFAs de novo as essential lipids, the genetic model is well suited to study the fundamental roles of PUFAs, and the results are very interesting. The genetic screen finds mutations in convergent pathways, suggesting that it has reached near-saturation. The link between the HIF-1 pathway and lipid unsaturation is very interesting. As many of the mutations found to rescue fat-2 mutants are gain-of-function, it is unlikely that similar discoveries could have been made with other approaches like genome-wide CRISPR screens, making the current study distinctive.

      Weaknesses:

      The authors make very important statements, but some are not sufficiently supported by data. On page 5, they conclude that membrane rigidity is a minor cause of fat-2 mutant defects, which is a relevant observation regarding why PUFAs are important. However, they use treatments that have rescued fluidity in another mutant (paqr-2), but do not test whether they have the same fluidizing effects in fat-2 mutants.

      The screening results seem to converge into the HIF-1 pathway, which is hypothetically correct according to the literature. However, the authors do not validate this hypothesis, which is a critical limitation, especially because many of the mutations they obtained seem to be of gain-of-function. Therefore, without experimental testing, it cannot be concluded that the mutations have the expected effect on the HIF-1 pathway.

      In some of the mutants, the rescues in lipid compositions seem to be weak, and it is arguable whether phenotypic rescues are really via a restoration in lipid compositions.

      The hypothesis linking iron homeostasis and the rescue of fat-2 mutants is interesting, but the data of rescue by iron repletion seem to be against it. The results might be due to the inefficiency in iron repletion, as the authors suggest, but this has not been formally addressed.

      Therefore, the authors propose multiple very interesting and important hypotheses, but experimental validations remain limited.

    4. Author response:

      We thank the editors at eLife and the reviewers for the care with which our manuscript has been reviewed and for the constructive feedback that we have received. Both reviewers viewed the manuscript positively and in particular praised the merits of the forward genetic screen that led to the discovery of a new link between the HIF-1 pathway and fatty acid desaturation.

      We agree with all points by Reviewer #1. We will modify our manuscript to clarify that two types of C18:1 fatty acids are present in our lipidomics, and that the majority is likely vaccenic acid that is not a FAT-2 substrate. The title will be modified and Fig. 1A corrected.

      All points raised by Reviewer #2 are also valid and we will try to address most of them experimentally, though not always as suggested. In particular, we plan to use FRAP to verify that membrane-fluidizing treatments are effective in the fat-2 mutant. We also plan to use qPCR to test whether the novel egl-9(lof) and hif-1(gof) alleles lead to the expected downregulation of ftn-2. We note that the pathway connecting EGL-9, HIF-1 and FTN-2 is well supported by published work and that the alleles isolated in our screen are consistent with it, with the addition that FAT-2 is likely a regulated outcome of FTN-2 inhibition/mutation. We also plan to monitor FAT-2 protein levels using Western blots and thus provide more clarity about the mechanism of action of the novel fat-2(wa17) suppressors. The manuscript will be modified to tone down interpretations not directly supported by experiments.

    1. eLife Assessment

      This paper provides a valuable contribution to our understanding of how adenosine acts as a signal of nutrient insufficiency and extends this idea to suggest that adenosine is released by metabolically active cells in proportion to the activity of methylation events. Convincing data support this idea. The authors use metabolic tracing approaches to identify the biochemical pathways that contribute to the regulation of adenosine levels and the S-adenosylmethionine cycle in Drosophila larval hemocytes in response to wasp egg infection.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, Nedbalova et al. investigate the biochemical pathway that acts in circulating immune cells to generate adenosine, a systemic signal that directs nutrients toward the immune response, and S-adenosylmethionine (SAM), a methyl donor for lipid, DNA, RNA, and protein synthetic reactions. They find that SAM is largely generated through the uptake of extracellular methionine, but that recycling of adenosine to form ATP contributes a small but important quantity of SAM in immune cells during the immune response. The authors propose that adenosine serves as a sensor of cell activity and nutrient supply, with adenosine secretion dominating in response to increased cellular activity. Their findings of impaired immune action but rescued larval developmental delay when the enzyme Ahcy is knocked down in hemocytes are interpreted as due to effects on methylation processes in hemocytes and reduced production of adenosine to regulate systemic metabolism and development, respectively. Overall this is a strong paper that uses sophisticated metabolic techniques to map the biochemical regulation of an important systemic mediator, highlighting the importance of maintaining appropriate metabolite levels in driving immune cell biology.

      Strengths:

      The authors deploy metabolic tracing - no easy feat in Drosophila hemocytes - to assess flux into pools of the SAM cycle. This is complemented by mass spectrometry analysis of total levels of SAM cycle metabolites to provide a clear picture of this metabolic pathway in resting and activated immune cells.

      The experiments show that the recycling of adenosine to ATP, and ultimately SAM, contributes meaningfully to the ability of immune cells to control infection with wasp eggs.

      This is a well-written paper, with very nice figures showing metabolic pathways under investigation. In particular, the italicized annotations, for example, "must be kept low", in Figure 1 illustrate a key point in metabolism - that cells must control levels of various intermediates to keep metabolic pathways moving in a beneficial direction.

      Experiments are conducted and controlled well, reagents are tested, and findings are robust and support most of the authors' claims.

      Weaknesses:

      The authors posit that adenosine acts as a sensor of cellular activity, with increased release indicating active cellular metabolism and insufficient nutrient supply. It is unclear how generalizable they think this may be across different cell types or organs.

      The authors extrapolate the findings in Figure 3 of decreased extracellular adenosine in ex vivo cultures of hemocytes with knockdown of Ahcy (panel B) to the in vivo findings of a rescue of larval developmental delay in wasp egg-infected larvae with hemocyte-specific Ahcy RNAi (panel C). This conclusion (discussed in lines 545-547) should be somewhat tempered, as a number of additional metabolic abnormalities characterize Ahcy-knockdown hemocytes, and the in vivo situation may not mimic the ex vivo situation. If adenosine (or inosine) measurements were possible in hemolymph, this would help bolster this idea. However, adenosine at least has a very short half-life.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors wish to explore the metabolic support mechanisms enabling lamellocyte encapsulation, a critical antiparasitic immune response of insects. They show that S-adenosylmethionine metabolism is specifically important in this process through a combination of measurements of metabolite levels and genetic manipulations of this metabolic process.

      Strengths:

      The metabolite measurements and the functional analyses are generally very strong and clearly show that the metabolic process under study is important in lamellocyte immune function.

      Weaknesses:

      The gene expression data are a potential weakness. Not enough is explained about how the RNAseq experiments in Figures 2 and 4 were done, and the representation of the data is unclear. The paper would also be strengthened by the inclusion of some measure of encapsulation effectiveness: the authors show that manipulation of the S-adenosylmethionine pathway in lamellocytes affects the ability of the host to survive infection, but they do not show direct effects on the ability of the host to encapsulate wasp eggs.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this study provide evidence that Drosophila immune cells upregulate the SAM transmethylation pathway and adenosine recycling upon wasp infection. Blocking this pathway compromises lamellocyte formation, the infection-induced developmental delay, and host survival, suggesting its physiological relevance.

      Strengths:

      Snapshot quantification of the metabolite pool does not reveal whether the metabolic pathway is active. The authors use ex vivo isotope labelling to precisely monitor SAM and adenosine metabolism. During infection, methionine metabolism and adenosine recycling are upregulated, which is necessary to support the immune reaction. By combining this with genetic experiments, they successfully show that the pathway is activated in immune cells.

      Weaknesses:

      The authors knocked down Ahcy to demonstrate the importance of the SAM methylation pathway. However, Ahcy-RNAi produces a massive accumulation of SAH, in addition to blocking adenosine production. To further validate the phenotypic causality, it is necessary to manipulate other enzymes in the pathway, such as Sam-S, Cbs, SamDC, etc. The authors also do not demonstrate how infection stimulates the metabolic pathway, given that expression of the metabolic enzymes is not upregulated by the infection stimulus.

    5. Author response:

      We would like to thank the editors and reviewers for reviewing our work, for finding it valuable and supported by convincing data, which we greatly appreciate, and also for identifying the weaknesses of the manuscript. We plan to address these weaknesses in the revised version, briefly as follows:

      (1) In the Discussion, we will elaborate more on a possible generalization of our results, while being mindful of the limited space in this experimental paper; we therefore intend to address this topic more comprehensively in a subsequent perspective article.

      (2) In the Discussion, we will more clearly address the limitations of our work, in particular the difference between the measurement of extracellular adenosine production ex vivo and the actual production in vivo, where the measurement is indeed very challenging, and also the limitations of manipulating the SAM pathway only at the Ahcy level.

      (3) We will describe the supplementary RNAseq data in detail and expand them. The RNAseq data have already been described in detail in our previous paper (doi.org/10.1371/journal.pbio.3002299), but we agree with the reviewers that we should describe the necessary details again here.

      (4) We will fill in the missing data on encapsulation efficiency; we agree that it was unfortunate to omit them.

      (5) We will add methyltransferase expression data and better describe the changes in expression of some SAM pathway genes; these expression changes, especially those of the methyltransferases, also support stimulation of this pathway. Although the goal of this work was to test by 13C-labeling whether SAM pathway activity is upregulated, not to analyze how this activity is regulated, we certainly agree that an explanation of possible regulation, especially in the context of the enzyme expression data we show, should be included in our work.

    1. eLife Assessment

      This valuable study investigates the neural basis of causal inference of illness, suggesting that it relies on semantic networks specific to living things in the absence of a generalized representation of causal inference across domains. However, the evidence remains incomplete due to some unjustified design and analysis choices. Moreover, the authors do not fully exploit the potential of multivariate fMRI analyses to rigorously test their main hypothesis.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand the neural basis of implicit causal inference, specifically how people infer causes of illness. They use fMRI to explore whether these inferences rely on content-specific semantic networks or broader, domain-general neurocognitive mechanisms. The study explores two key hypotheses: first, that causal inferences about illness rely on semantic networks specific to living things, such as the 'animacy network,' given that illnesses affect only animate beings; and second, that there might be a common brain network supporting causal inferences across various domains, including illness, mental states, and mechanical failures. By examining these hypotheses, the authors aim to determine whether causal inferences are supported by specialized or generalized neural systems.

      The authors observed that inferring illness causes selectively engaged a portion of the precuneus (PC) associated with the semantic representation of animate entities, such as people and animals. They found no cortical areas that responded to causal inferences across different domains, including illness and mechanical failures. Based on these findings, the authors concluded that implicit causal inferences are supported by content-specific semantic networks, rather than a domain-general neural system, indicating that the neural basis of causal inference is closely tied to the semantic representation of the specific content involved.

      Strengths:

      (1) The inclusion of the four conditions in the design is well thought out, allowing for the examination of the unique contribution of causal inference of illness compared to either a different type of causal inference (mechanical) or non-causal conditions. This design also has the potential to identify regions involved in a shared representation of inference across general domains.

      (2) The presence of the three localizers for language, logic, and mentalizing, along with the selection of specific regions of interest (ROIs), such as the precuneus and anterior ventral occipitotemporal cortex (antVOTC), is a strong feature that supports a hypothesis-driven approach (although see below for a critical point related to the ROI selection).

      (3) The univariate analysis pipeline is solid and well-developed.

      (4) The statistical analyses are a particularly strong aspect of the paper.

      Weaknesses:

      Based on the current analyses, it is not yet possible to rule out the hypothesis that inferring illness causes relies on neurocognitive mechanisms that support causal inferences irrespective of their content, either in the precuneus or elsewhere in the brain.

      (1) The authors, particularly in the multivariate analyses, do not thoroughly examine the similarity between the two conditions (illness-causal and mechanical-causal), as they are more focused on highlighting the differences between them. For instance, in the searchlight MVPA analysis, an interesting decoding analysis is conducted to identify brain regions that represent illness-causal and mechanical-causal conditions differently, yielding results consistent with the univariate analyses. However, to test for the presence of a shared network, the authors only perform the Causal vs. Non-causal analysis. This analysis is not very informative because it includes all conditions mixed together and does not clarify whether both the illness-causal and mechanical-causal conditions contribute to these results.

      (2) To address this limitation, a useful additional step would be to use as ROIs the different regions that emerged in the Causal vs. Non-causal decoding analysis and to conduct four separate decoding analyses within these specific clusters:

      (a) Illness-Causal vs. Non-causal - Illness First;
      (b) Illness-Causal vs. Non-causal - Mechanical First;
      (c) Mechanical-Causal vs. Non-causal - Illness First;
      (d) Mechanical-Causal vs. Non-causal - Mechanical First.

      This approach would allow the authors to determine whether any of these ROIs can decode both the illness-causal and mechanical-causal conditions against at least one non-causal condition.

      (3) Another possible analysis to investigate the existence of a shared network would be to run the searchlight analysis for the mechanical-causal condition versus the two non-causal conditions, as was done for the illness-causal versus non-causal conditions, and then examine the conjunction between the two. Specifically, the goal would be to identify ROIs that show significant decoding accuracy in both analyses.

      (4) Along the same lines, for the ROI MVPA analysis, it would be useful not only to include the illness-causal vs. mechanical-causal decoding but also to examine the illness-causal vs. non-causal conditions and the mechanical-causal vs. non-causal conditions. Additionally, it would be beneficial to report these data not just in a table (where only the mean accuracy is shown) but also using dot plots, allowing the readers to see not only the mean values but also the accuracy for each individual subject.

      (5) The selection of Regions of Interest (ROIs) is not entirely straightforward:

      In the introduction, the authors mention that recent literature identifies the precuneus (PC) as a region that responds preferentially to images and words related to living things across various tasks. While this may be accurate, we can all agree that other regions within the ventral occipital-temporal cortex also exhibit such preferences, particularly areas like the fusiform face area, the occipital face area, and the extrastriate body area. I believe that at least some parts of this network (e.g., the fusiform gyrus) should be included as ROIs in this study. This inclusion would make sense, especially because a complementary portion of the ventral stream known to prefer non-living items (i.e., anterior medial VOTC) has been selected as a control ROI to process information about the mechanical-causal condition. Given the main hypothesis of the study - that causal inferences about illness might depend on content-specific semantic representations in the 'animacy network' - it would be worthwhile to investigate these ROIs alongside the precuneus, as they may also yield interesting results.

      (6) Visual representation of results:

      In all the figures related to ROI analyses, only mean group values are reported (e.g., Figure 1A, Figure 3, Figure 4A, Supplementary Figure 6, Figure 7, Figure 8). To better capture the complexity of fMRI data and provide readers with a more comprehensive view of the results, it would be beneficial to include a dot plot for a specific time point in each graph. This could be a fixed time point (e.g., a certain number of seconds after stimulus presentation) or the time point showing the maximum difference between the conditions of interest. Adding this would allow for a clearer understanding of how the effect is distributed across the full sample, such as whether it is consistently present in every subject or if there is greater variability across individuals.

      (7) Task selection:

      (a) To improve the clarity of the paper, it would be helpful to explain the rationale behind the choice of the selected task, specifically addressing: (i) why an implicit inference task was chosen instead of an explicit inference task, and (ii) why the "magic detection" task was used, as it might shift participants' attention more towards coherence, surprise, or unexpected elements rather than the inference process itself.

      (b) Additionally, the choice to include a large number of catch trials is unusual, especially since they are modeled as regressors of non-interest in the GLM. It would be beneficial to provide an explanation for this decision.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors hypothesize that "causal inferences about illness depend on content-specific semantic representations in the animacy network". They test this hypothesis in an fMRI task, by comparing brain activity elicited by participants' exposure to written situations suggesting a plausible cause of illness with brain activity in linguistically equivalent situations suggesting a plausible cause of mechanical failure or damage and non-causal situations. These contrasts identify the precuneus (PC) as the main "culprit" in a whole-brain univariate analysis. The question then arises of whether the content-specificity has to do with inferences about animates in general, or if there are some distinctions between reasoning about people's bodies versus mental states. To answer this question, the authors localize the mentalizing network and study the relation between brain activity elicited by Illness-Causal > Mech-Causal and Mentalizing > Physical stories. They conclude that inferring the causes of illness partially differentiates from reasoning about people's states of mind. The authors finally test the alternative yet non-mutually exclusive hypothesis that both types of causal inferences (illness and mechanical) depend on shared neural machinery. Good candidates are language and logic, which justifies the use of a language/logic localizer. No evidence of commonalities across causal inferences versus non-causal situations is found.

      Strengths:

      (1) This study introduces a useful paradigm and well-designed set of stimuli to test for implicit causal inferences.

      (2) Another important methodological advance is the addition of physical stories to the original mentalizing protocol.

      (3) With these tools, or a variant of these tools, this study has the potential to pave the way for further investigation of naïve biology and causal inference.

      Weaknesses:

      (1) This study is missing a big-picture question. It is not clear whether the authors investigate the neural correlates of causal reasoning or of naïve biology. If the former, the choice of an orthogonal task, making causal reasoning implicit, is questionable. If the latter, the choice of mechanical and physical controls can be seen as reductive and problematic.

      (2) The rationale for focusing mostly on the precuneus is not clear and this choice could almost be seen as a post-hoc hypothesis.

      (3) The choice of an orthogonal 'magic detection' task has three problematic consequences in this study:

      (a) It differs in nature from the 'mentalizing' task that consists of evaluating a character's beliefs explicitly from the corresponding story, which complicates the study of the relation between both tasks. While the authors do not compare both tasks directly, it is unclear to what extent this intrinsic difference between implicit versus explicit judgments of people's body versus mental states could influence the results.

      (b) The extent to which the failure to find shared neural machinery between both types of inferences (illness and mechanical) can be attributed to the implicit character of the task is not clear.

      (c) The introduction of a category of non-interest that contains only 36 trials, compared to 38 trials for each of the four categories of interest, creates a design imbalance.

      (4) Another imbalance is present in the design of this study: the number of trials per category is not the same in each run of the main task. This imbalance does not seem to be accounted for in the 1st-level GLM and makes the subsequent use of MVPA somewhat problematic.

      (5) The main claim of the authors, encapsulated by the title of the present manuscript, is not tested directly. While the authors included in their protocol independent localizers for mentalizing, language, and logic, they did not include an independent localizer for "animacy". As such, they cannot provide a within-subject evaluation of their claim, which is entirely based on the presence of a partial overlap in PC (which is also involved in a wide range of tasks) with previous results on animacy.

    4. Reviewer #3 (Public review):

      Summary:

      This study employed an implicit task, showing vignettes to participants while the BOLD signal was acquired. The aim was to capture automatic causal inferences that emerge during language processing and comprehension. In particular, the authors compared causal inferences about illness with two control conditions, causal inferences about mechanical failures and non-causal phrases related to illnesses. All phrases that were employed described contexts with people, to avoid an animate/inanimate confound in the results. The authors had a specific hypothesis concerning the role of the precuneus (PC) in being sensitive to causal inferences about illnesses.

      These findings indicate that implicit causal inferences are facilitated by semantic networks specialized for encoding causal knowledge.

      Strengths:

      The major strength of the study is the clever design of the stimuli (which are nicely matched for a number of features) which can tease apart the role of the type of causal inference (illness-causal or mechanical-causal) and the use of two localizers (logic/language and mentalizing) to investigate the hypothesis that the language and/or logical reasoning networks preferentially respond to causal inference regardless of the content domain being tested (illnesses or mechanical).

      Weaknesses:

      I have identified the following main weaknesses:

      (1) The precuneus (PC) and temporoparietal junction (TPJ) show very similar patterns of results, yet the manuscript (including the abstract) focuses mostly on the PC. To what extent does the fact that PC and TPJ show similar trends affect the inferences we can derive from the results of the paper? I wonder whether additional analyses (e.g., connectivity) would help provide information about this network.

      (2) Results are mainly supported by a univariate ROI approach, and the MVPA ROI approach is performed on a subregion of one of the ROIs (left precuneus). The results could therefore have a limited impact on our understanding of brain functioning.

      (3) In all figures, there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. For example, percentage signal change (PSC) plots do not report measures of dispersion, nor are there BOLD maps showing the overlap of the response across participants. Only in Figure 2 do we see the data of 6 selected participants out of 20.

      (4) Sometimes acronyms are defined in the text after they appear for the first time.

    5. Author response:

      We thank the editors and reviewers for their comments on our manuscript. We found the comments of the reviewers helpful and plan to add new text, analyses, and figures to answer some of the outstanding questions.

      In response to the reviewers’ comments, we will clarify the goal of the paper in the introduction: to test the hypothesis that causal knowledge (i.e., an intuitive theory of biology) is embedded in domain-preferring semantic networks (i.e., the semantic animacy network). This work links research on intuitive theories in developmental psychology with cognitive neuroscience.

      As we will emphasize in the revised manuscript, the primary goal of the current paper is to test the claim that semantic networks encode causal knowledge, rather than to rule out the contribution of domain-general reasoning mechanisms to causal inference.

      In response to the reviewers’ suggestions, we will add multivariate and univariate whole-cortex analyses that provide further tests for domain-general causality responses. In particular, we will include new figures showing univariate responses to the mechanical inference condition over the non-causal control conditions as well as decoding between these conditions. The reviewers have also asked us to provide individual subject dispersion data. We appreciate this suggestion, and new figures will be added to display this information.

      We will also perform additional analyses in the precuneus (PC) to look for shared responses to illness and mechanical inferences. In accordance with our hypotheses, we have shown that the PC responds preferentially to illness inferences. To address the reviewers’ concerns about the selectivity of the PC to illness inferences, we will compare responses to i) illness inferences compared to the noncausal conditions and ii) mechanical inferences compared to the noncausal conditions in the PC to investigate the extent to which a shared response to causal inference across domains emerges in this region.

      Critically, we find that the cortical areas that distinguish between causal and non-causal conditions in a ‘domain general manner’ (i.e., for both illness and mechanical inferences) are driven by higher responses to the non-causal condition. Moreover, these responses in prefrontal cortex and elsewhere overlap an RT predictor of neural activity, suggesting that they may reflect difficulty effects.

      These results suggest that in the current task, signatures of causal inference are primarily found in domain-preferring semantic networks, rather than in domain-general fronto-parietal reasoning systems. We will provide additional discussion of the argument that the current results do not speak against the role of domain general systems across all types of causal reasoning. Instead, they suggest that the types of implicit causal inferences measured in the current study depend primarily on domain-preferring semantic networks.

      The reviewers have asked us to analyze responses to causal inferences about illness in the fusiform face area (FFA). We will perform this analysis. However, we note that univariate and multivariate whole-cortex analyses that are already included in the paper did not identify lateral ventral occipito-temporal cortex as a key region involved in causal inferences about illness. Further, we do not have FFA localizer data in the current participants; therefore, the results cannot be interpreted to reflect activity in functionally defined FFA.

      Two reviewers asked us to justify our choice of an implicit magic-detection task, which we will now do more clearly in the manuscript. This task was selected to ensure that participants were attending to the meaning of the vignettes. The goal of the current study was to investigate implicit causal inferences that routinely occur in language comprehension, e.g., when someone is reading a book. Past work has shown that explicitly judging the causality of causal and non-causal stimuli results in differences in response times across conditions (e.g., Kuperberg et al., 2006). In the current study, such judgments would also have introduced a confound between the behavioral decision and the condition of interest: the use of an explicit causal judgment task makes it impossible to know whether any observed neural differences between causal and non-causal conditions are simply due to differences in the selection of task responses. The selection of an orthogonal magic-detection task limits these confounds from complicating our interpretation of the neural data.

      One of the reviewers asked us to justify the number of catch trials that we decided to include in our paradigm. Approximately 20% of the vignettes were “magical” vignettes (the same proportion as each of the 4 experimental conditions) to encourage participants to remain attentive throughout the task. Since these catch trials are excluded from analysis, their proportion is unlikely to influence the results of the study. We will clarify this in the manuscript.

      A question was raised about the balance of trial numbers across conditions and across runs. To address this, we will include individual comparisons of each causal condition (n=36) with each non-causal condition (n=36; i.e., equal trial counts) where they are not already shown. With regard to runs, each condition is shown either 6 or 7 times per run (maximum difference of 1 trial between conditions), and the number of trials per condition is equal across the whole experiment: each condition is shown 7 times in two of the runs and 6 times in four of the runs. This minor design imbalance is typical of fMRI experiments and is unlikely to impact the results. We will clarify this in the manuscript.

      We believe that our planned revisions will strengthen the paper and highlight its contributions to our understanding of the neural basis of implicit causal inference.

    1. eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs would strongly benefit from additional evidence emerging from multiple independent approaches. The evolutionary interpretations associated with the concept of "predatory genome" are premature based on the strength of evidence.

    2. Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer (HGT) is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell-free chromatin particles (cfChPs) that are massively released by dying cells are incorporated into the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers; they are made of open chromatin, are expressed, and are capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting of releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChIP-seq (to validate chromatin state).

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This raises the question of the relevance of this study for living organisms.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

    3. Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChIP-seq (to validate chromatin state).

      We agree with the reviewer’s suggestions. We propose to address this by performing RNA-seq on an orthogonal platform. This will allow us to answer multiple questions, namely: validating the expression of human DNA in mouse cells, gaining detailed insight into the genes and pathways driven by human cfChPs, and identifying chimeric human-mouse transcripts.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This raises the question of the relevance of this study for living organisms.

      We agree with the reviewer’s suggestion. We propose to show horizontal transfer of cfChPs using four different cell lines representing four different species.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~250 passages the cfChP-treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version. It is likely that cell death resulting from large-scale HGT creates a vicious cycle of further cell death induced by cfChPs, thereby helping to explain the massive daily turnover of cells in the body (10⁹ to 10¹² cells per day).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We think this is a matter of semantics. We have used the term “function” since cfChPs that enter the cell are biologically active; they transcribe, translate, synthesize proteins, and proliferate. We therefore feel that the term “function” is not inappropriate.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We take the reviewer’s point. We will replace the term “predatory genome” with a more neutral and factual term “supernumerary genome” in the title and throughout the manuscript in the revised version.

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      We propose to revise the Discussion section to take into account the issues raised by the reviewer and to highlight the potential role of cfChPs in evolution as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      Our responses to this paragraph are given in the two above sections.

    1. eLife Assessment

      Valencia et al. combine elegant in vitro biochemical experiments with functional assays in cardiomyocytes to determine which properties of the FHOD3 formin are essential for sarcomere assembly. Using separation-of-function mutants, they show that FHOD3's elongation activity, rather than its nucleation, capping, or bundling activities, is key to its sarcomeric function. This is an important finding; most of the data presented in the manuscript are convincing, with the exception of two experiments (presence of FHOD3 at the barbed end of actin filaments in the TIRF elongation assays and characterization of the GS-FH1 mutant phenotype) that would merit a few additional controls.

    2. Reviewer #1 (Public review):

      Summary:

      Formins are complex proteins with multiple effects on actin filament assembly, including nucleation, capping with processive elongation, and bundling. Determining which of these activities is important for a given biological process and normal cellular function is a major challenge.

      Here, the authors study the formin FHOD3L, which is essential for normal sarcomere assembly in muscle cells. They identify point mutants of FHOD3L in which formin nucleation and elongation/bundling activities are functionally separated. Expression of these mutants in neonatal rat ventricular myocytes shows that the control of actin filament elongation by formin is the major activity required for the normal assembly of functional sarcomeres.

      Strengths:

      The strength of this work is to combine sensitive biochemical assays with excellent work in neonatal rat ventricular myocytes. This combination of approaches is highly effective for analyzing the function of proteins with multiple activities in vitro.

      Weaknesses:

      FHOD3L does not seem to be the easiest formin to study because of its relatively weak nucleation activity and the short duration of capping events. This difficulty demands rigorous biochemical analysis and careful interpretation of the data, which should be improved in this work.

    3. Reviewer #2 (Public review):

      This article elucidates the biochemical and cellular mechanisms by which the FHOD-family of formins, particularly FHOD3, contributes to sarcomere formation and contractility in cardiomyocytes. Formins are mainly known to nucleate and elongate actin filaments, with certain family members also exhibiting capping, severing, and bundling activities. Although FHOD3 has been well-established as essential for sarcomere assembly in cardiomyocytes, its precise biochemical functions and contributions to actin dynamics remain poorly understood.

      In this study, the authors combine in vitro biochemical assays with cellular experiments to dissect FHOD3's roles in actin assembly and sarcomere formation. They demonstrate that FHOD3 nucleates actin filaments and acts as a transient elongator, pausing elongation after an initial burst of filament growth. Using separation-of-function mutants, they show that FHOD3's elongation activity - rather than its nucleation, capping, or bundling capabilities - is key for its sarcomeric function.

      The experiments have been conducted rigorously and well-analyzed, and the paper is clearly written. The data presented support the authors' conclusions. I appreciate the detailed description and rationale behind the FHOD3 constructs used in this study.

      However, I was somewhat surprised and a bit disappointed that while the authors conducted single-color TIRF experiments to observe the effects of FHOD3 on single filaments, they did not use fluorescently labeled FHOD3 to directly visualize its behavior. Incorporating such experiments would significantly strengthen their conclusions regarding FHOD3's bursts of elongation interspersed with capping activity. While I understand this might require a few additional weeks of experiments, these data would add considerable value by directly testing the proposed mechanism.

      There is a typo in the word "required" in line number 30. The authors also use fit data to extract parameters in several panels (e.g., Figures 2b, 2d, 3a, and 3b). While these fit functions may be intuitive to actin experts, explicitly describing the fit functions in the figure legends or methods would greatly benefit the broader readership.

    4. Reviewer #3 (Public review):

      Valencia et al. aim to elucidate the biochemical and cellular mechanisms through which the human formin FHOD3 drives sarcomere assembly in cardiomyocytes. To do so, they combined rigorous in vitro biochemical assays with comprehensive in vivo characterizations, evaluating two wild-type FHOD3 isoforms and two function-separating mutants. Surprisingly, they found that both wild-type FHOD3 isoforms can nucleate new actin filaments, as well as elongate existing actin filaments in conjunction with profilin following barbed-end capping. This is in addition to FHOD3's proposed role as an actin bundler. Next, the authors asked whether FHOD3L promotes sarcomere assembly in cardiomyocytes through its activity in actin nucleation or rather elongation. With two function-separating mutants, the authors evaluated the numbers and morphology of sarcomeres, as well as their ability to beat and generate cardiac rhythm. The authors found that while the wild-type FHOD3L and the K1193L mutant can rescue sarcomere morphology and physiology, the GS-FH1 mutant fails to do so. Given that in GS-FH1 mainly elongation activity is compromised, the authors concluded that the elongation activity of FHOD3 is essential for its role in sarcomere assembly in cardiomyocytes, while its nucleator activity is dispensable. Overall, this important study provided a broadened view on the biochemical activities of FHOD3, and a pioneering view on a possible cellular mechanism of how FHOD3L drives sarcomere assembly. If further validated, this can lead to new mechanistic models of sarcomere assembly and potentially new therapeutic targets of cardiomyopathy.

      The conclusions of this paper are mostly well supported by the comprehensive biochemical analyses performed by the authors. However, the sarcomere assembly defect phenotype in the GS-FH1 rescue condition requires further investigation, as the extremely low level of GS-FH1 signal in transfected cells in Figure 6A may reflect a failure of actin-binding by this construct in vivo, rather than its inability to drive elongation. Though the authors do show in Figure 6 that GS-FH1 can bind to normal-looking sarcomeres when they are present, this may be due to a lack of siRNA activity in these cells, such that endogenous FHOD3L is still present. In this possible scenario, GS-FH1 may dimerize with endogenous FHOD3L. The authors should demonstrate that GS-FH1 alone can indeed interact with existing actin filaments in vivo. While this has been clearly demonstrated in vitro, given the more complex biochemical environment in vivo, where additional unknown binding partners may be present, caution should be exercised when extrapolating findings from the former to the latter.

    5. Author response:

      We thank the reviewers for their careful readings of our paper and their very positive assessment. Here we address the two major concerns they raised, referring to the revised version of the manuscript that will be submitted:

      (1) Important points were raised regarding the brief elongation events we reported. The time resolution and noise in our system reduce the accuracy of the burst velocity measurements. To address this, we have reached out to a colleague who is set up to repeat these measurements with microfluidics-assisted TIRF. The noise should be greatly reduced and the system is also optimal for directly visualizing labeled FHOD3, as suggested. We hope this experimental approach will provide new insights.

      In the meantime, we analyzed our data more closely. We were asked about the pauses we observe before bursts of elongation and how we know they are functionally relevant. The short answer is that we do not know. We reported them because they were so common: in three independent experiments with wild type FHOD3L-CT we analyzed a total of 20 filaments. We detected 112 dim regions and 97 of these were pause/burst events (~87%). Among the cases lacking a pause we include instances of apparent "double bursts" with no time for capping in between (which may be a time resolution issue) and some cases where the burst was in progress when data collection started. In the latter case, we cannot determine whether or not a pause was missed. We cannot rule out that these pauses reflect interactions with the surface, but we would expect their frequency to be lower if they did. In fact, we did detect pauses in the profilin-actin negative control, but only 4 pauses were detected across 21 filaments analyzed, compared to 97 pauses observed in the presence of wild type FHOD3L across 20 filaments analyzed. We will revise the text to make our conclusions about pauses more circumspect.
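      As a quick check, the proportions in the paragraph above can be recomputed directly from the stated counts. This is a minimal sketch that uses only those numbers; the variable names are illustrative, and the per-filament rate is a crude ratio that ignores filament length and observation time, so it is not a substitute for the statistical analysis in the manuscript.

```python
# Counts stated in the response (three experiments, wild type FHOD3L-CT).
dim_regions = 112
pause_burst_events = 97
filaments_fhod3l = 20

# Profilin-actin negative control (no formin).
pauses_control = 4
filaments_control = 21

# Fraction of dim regions that were pause/burst events.
fraction_pause_burst = pause_burst_events / dim_regions

# Pauses per filament, not normalized for filament length or imaging time.
pauses_per_filament_fhod3l = pause_burst_events / filaments_fhod3l
pauses_per_filament_control = pauses_control / filaments_control
fold_difference = pauses_per_filament_fhod3l / pauses_per_filament_control

print(f"{fraction_pause_burst:.0%}")         # 87%
print(f"{pauses_per_filament_fhod3l:.2f}")   # 4.85
print(f"{pauses_per_filament_control:.2f}")  # 0.19
print(f"{fold_difference:.1f}")              # 25.5
```

      On this crude measure, pauses are roughly 25-fold more frequent per filament with FHOD3L-CT than in the no-formin control.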

      For comparison to our current data, we further analyzed the filaments in TIRF assays with no formin present. As the reviewers point out, inhomogeneities in filament intensity are normal. Thus, we examined any dim spots for pauses and/or bursts. We will report (future Figure 2G) that the velocity of growth of these dim spots was the same as the velocity of the rest of the filament. While our numbers may not be perfectly accurate due to the noise in our system, the difference between a 3-4-fold increase and no detectable change in rate is substantial and statistically significant. In addition, we determined the number of dim spots per length of filament. We found a higher frequency of dim spots when FHOD3L-CT or FHOD3S-CT was present vs no formin, as will be shown in Figure 2 – figure supplement 1G and 2D.

      We are convinced that the brief dim events we observed in the presence of FHOD3L-CT do, in fact, reflect formin-mediated elongation and hope that the reviewers concur. This does not preclude our interest in the microfluidics and two-color assays, which we will pursue in the future.

      (2) The reviewers were concerned about the low protein levels in the GS-FH1 rescue experiments as reflected in the HA fluorescence intensity distributions shown in Fig. 5 – figure supplement 2A. While the scenario proposed could explain our observations with the GS-FH1 rescues, it is quite complex and does not preclude the conclusion that the FH1 domain is critical. One limiting case of this scenario would be that the GS-FH1 protein detected in these cells is completely inactive, as opposed to FHOD3L that cannot elongate (by design). Given that the C-terminal half of the protein folds and functions and that the changes are made within an intrinsically disordered region, we do not favor this model. The reviewers suggest that the mutant protein detected in the few cells with (probably residual) sarcomeres could be stabilized, in part or entirely, by heterodimerization with residual endogenous wild type protein. We agree that heterodimerization is possible. The question becomes, how active is a heterodimer? If heterodimers have any activity, it seems far from sufficient to rescue sarcomere formation, suggesting that two functional FH1 domains are critical. To confirm this possibility, we would have to determine whether the few sarcomeres present in these cases are residual and/or were newly assembled by the low level of heterodimers. That said, we do not see evidence of correlation between protein levels and rescue at the level present in these cells (addressed below). Unfortunately, the proposed IP to test whether FHOD3L binds actin in vivo would only potentially report on filament side binding (both direct and indirect). It would not address whether the GS-FH1 mutant functions as a nucleator, elongator, bundler and/or capping protein in vivo.

      If we assume that the protein present is active, the critical question that we can address is whether the phenotype is due to low protein levels or if the phenotype is due to loss of elongation activity by FHOD3L. To address this question, we revisited our data.

      First, we plotted the distributions of the intensities of the cells we analyzed further, in addition to the automated readout of all the cells in the dish we originally presented (e.g. Fig. 4 – figure supplement 2A,B). These cells were selected randomly and, as should be the case, the distributions of their intensities agree well with the original distributions for the three different rescue constructs: FHOD3L, K1193L, and GS-FH1 (Fig. 6 – figure supplement 1A,B). We then asked whether there was any correlation between HA intensities and the sarcomere metrics. Consistent with our pilot data, no correlation is evident in any of the three cases across the range of intensities we collected (400-2700 a.u.) (Fig. 6 – figure supplement 1C,D,E). We were originally satisfied with the GS-FH1 data, despite the low average intensity levels, because the intensities were well within the range that we established in pilot studies. These data reconfirm that the intensity levels are reasonable in a larger study.

      To more specifically address the question of whether low HA fluorescence intensity is likely to reflect sufficient protein levels to build sarcomeres, we re-examined two data sets from the FHOD3L WT rescue data. We found that, by chance, the first replicate of data from the wild type rescue has an intensity distribution comparable to that of the GS-FH1 rescues (580 ± 261 per cell vs. 548 ± 105 per cell). In addition, we collected all of the data from cells with intensity levels <720, selected to mimic the distribution of the GS-FH1 cells (Fig. 6 – figure supplement 3A). We then compared the sarcomere metrics (sarcomere number, sarcomere length, sarcomere width) between the full data set and the two low intensity subsets using statistical tests as reported for the rest of the cell biology data set:

      · Sarcomere number is the only non-normal metric. We therefore used the Mann-Whitney U test for each pairwise comparison, which showed no difference among the three WT distributions.

      · We compared Z-line lengths by Student’s two-sample, unpaired t-test for each pairwise comparison, again finding no significant difference between any of the distributions.

      · Sarcomere length shows a weakly significant difference (p = 0.017, compared with 0.033 for three treatment groups based on Bonferroni correction) between the whole WT data set and bio rep 1, but no difference between the whole WT data set and the HA<720 group via Student’s two-sample, unpaired t-test.

      An alternative statistical analysis approach, one-way ANOVA and Tukey post hoc tests, gave similar results. Thus, cells expressing wild type FHOD3L at levels comparable to those detected in GS-FH1 mutant rescues are fully rescued. Based on these findings, we conclude that the expression levels in the GS-FH1 rescues are high enough to rescue the FHOD3 knockdown, supporting our conclusion that the defect is due to loss of elongation activity. We will add this analysis and discussion to the revised manuscript.

      In future studies we will design less severe mutations to the FH1 domain. We hope to identify one with a strong effect on elongation and another with an intermediate effect. Once the best candidates are characterized in vitro, we will test them in our rescue experiments. If the strong mutant mimics the GS-FH1 rescue and the intermediate mutant is less severe, we will have strengthened our conclusion that elongation is a critical FHOD3L activity in sarcomere formation.

      Additional improvements will be made to the manuscript based on recommendations we received from the reviewers.

    1. eLife Assessment

      This important study investigates the role of vasopressin in modulating same-sex affiliative relationships in the context of linear dominance hierarchies. It provides convincing evidence that vasopressin signaling is involved in modulating aspects of affiliative behavior, although the evidence that affiliative relationships specifically arise from the triadic interaction study design is incomplete. Nevertheless, its focus on broadening the types of social relationships and species studied in this area makes it of interest to both neuroendocrinologists and colleagues studying the evolution and mechanisms underlying social affiliation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors seek to establish whether triadic interaction can promote affiliative relationships in the context of strict dominance hierarchies, and whether the vasopressinergic system is involved in such affiliations. To address this, they experimentally examine how male same-sex affiliations form by testing triadic cohabitation in large-billed crows, a species where males are known to develop and maintain same-sex affiliative relationships within a strict linear social hierarchy. They show a reduction in aggressive behavior over time with cohabitation and the formation of affiliative relationships, as measured by reciprocal allopreening, between two members (dyad) of the triad. The authors then administer a V1aR antagonist to each member of the triad, finding that allopreening decreases and dominance/submissive behaviors reemerge only in the dyad that developed an affiliated relationship ("affiliated dyad") with blockade of V1aR, demonstrating that V1aR mediates maintenance of affiliative peer relationships. The questions of how peer affiliations form, particularly in the context of dominance hierarchies, and the role of V1aR in regulating these behaviors are impactful for the field of social behavior. While the experimental paradigm provides a new way of approaching these questions, we have outlined below our concerns regarding the collection and interpretation of the data that limit the impact of this particular study.

      Strengths:

      (1) The authors develop a behavioral paradigm and experimental sequence using large-billed crows that allows them to identify the formation of stable, affiliated dyads within triadic groups that are robust to subsequent testing and are sensitive to pharmacological manipulation.

      (2) The effects of V1aR antagonism on allopreening and respective dominance or submissive behaviors appear significant and specific to the affiliated dyad, which supports the view that V1aR plays a role in context-dependent, flexible regulation of aggressive behaviors across species. However, these results are difficult to interpret with respect to the authors' main claims given the weaknesses outlined below.

      Weaknesses:

      (1) The authors claim that the data demonstrate that a triadic social group facilitates the formation of affiliative dyads and go further to claim that these relationships have relevance to understanding coalition formation. Without direct comparisons to groups of other sizes, it is difficult to say whether the triadic structure actually facilitates or promotes the formation of these affiliative interactions as stated. Further, the relevance to coalitions is weak without expanded behavioral testing.

      (2) Aspects of the experimental design introduce confounding factors that make it difficult to interpret the resulting data. In experiment 1, 6 of the 18 animals used for testing are part of multiple triads. This is either not accounted for in the experimental design (wash-out period prior to reuse of animals) or statistical analysis (including repeated testing as a factor in the model), or it is not described. Further, while the authors do randomize and counterbalance the two dose trials for the antagonist, vehicle vs drug exposure is not randomized.

      (3) The re-emergence of dominance-related agonistic behaviors with V1aR antagonism specifically in the affiliated dyads is interesting, but difficult to interpret without further description and analysis of the dyadic behavior, particularly given the absence of dominance-related behaviors in either affiliated or unaffiliated dyads during the cohabitation period. In addition, the current data does not support the hypothesis that V1aR is also required to form affiliative relationships, as stated in the discussion (Lines 464-5, 472, 494), since the authors did not administer V1aR antagonist during the initial period of triadic cohabitation.

      (4) Sentences are often repetitive or duplicated (lines 424-426), and paragraphs should be condensed for easier reading, especially in the discussion. Further, some of the discussion might be better presented in an "Ideas and Speculation" subsection, which would help readers appropriately assess the validity of the conclusions based on the data vs the larger implications suggested by the authors.

    3. Reviewer #2 (Public review):

      Seguchi and Izawa provide robust evidence for the role of vasopressin in modulating same-sex affiliative relationships. Especially striking is that these effects appear to be selective to key relationships within a triadic social context. Overall, this is an interesting and rich dataset with compelling results. I largely have some clarifying questions.

      Experiment 1 Comments:

      (1) The primary argument/finding in this experiment is that a triadic situation/environment facilitates the development of male-male reciprocal social relationships. Overall, this effect appears striking in that male-male affiliative bonds (defined as reciprocal allopreening) formed in 6 of the 8 triads tested. However, there is no comparison group of dyads to determine whether co-housing for 2 weeks could also support the formation of male-male social bonds. This lack of a comparison group makes it unclear to what extent the triad is the key aspect of the environment that supports social bonding.

      (2) More specifically, the authors argue that it is not just that triads support affiliative male-male bonds, but that bonds form between the second "middle" (dominant/subordinate) and third "low" (subordinate/subordinate) individuals in each triad. However, it was difficult to assess this from the results.

      a) For example, in Figure 3B, is each data point the average of two individuals, since in each triad there are two dominant and two subordinate individuals?

      b) For me, using more precise language beyond dominant and subordinate (e.g., middle and low), and more clearly displaying the results of allopreening for each pairwise dyad within a triad, would improve the impact of the results and support the authors' argument.

      Experiment 2 Comments:

      (3) The results here are quite striking, despite the low sample size. In Figure 4, it appears that in every instance, both low- and high-dose V1aRA administration decreased allopreening for both dominant and subordinate individuals.

      (4) Some methodological questions:

      a) Can you clarify whether the duration of the post-test was also 30 min?

      b) As in Experiment 1, how are individual birds represented in the triad? Was the second "middle" bird (dominant/subordinate) tested as both a dominant and a subordinate bird? My understanding is that the dominant and subordinate birds in Figure 4 are different individuals, but that the same individuals are represented in both the affiliated dyad and the unaffiliated dyad.

      (5) Throughout the manuscript (Lines 57-67; 557-566) the authors argue that the role of VP in regulating gregariousness can be extrapolated to understand the role of same-sex affiliative bonding. Importantly, gregariousness does not necessarily reflect affiliative bonding. While allopreening is specifically associated with social bonding (e.g., monogamous pair bonds) independent of broader social systems, gregariousness in general, and specifically as defined in many of the references cited, is independent of social bonds; in fact, it is assessed primarily in novel social contexts.

      (6) To clarify, adult prairie voles in the wild do not commonly engage in same-sex affiliative behavior. In fact, one of the primary components of opposite-sex pair bonding is same-sex aggression. Thus, while mechanistic studies on the neurobiology of same-sex peer bonds are relevant for this work, I am less convinced that you can compare the ultimate function of same-sex affiliative relationships in prairie voles with that in crows.

      (7) The results here are consistent with VP having an anxiolytic effect, as has been suggested in birds, with direct or indirect consequences for social behaviors. This may be a useful point to draw on in the discussion when considering your findings.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Seguchi & Izawa investigate the formation of male-male affiliative relationships within triads of large-billed crows. They then administered a vasopressin 1a receptor (V1aR) antagonist to either the dominant or subordinate individual within affiliative dyads, to examine whether blocking V1aR disrupts affiliative behavior. They discovered that affiliative dyads can be induced in large-billed crows by housing them in triads. They also found that blocking V1aRs significantly decreased allopreening (an affiliative behavior) within dyads. In addition, it increased aggression by dominant individuals and submissive calls by subordinate individuals.

      Strengths:

      This manuscript uses an especially interesting species (a highly intelligent and highly social corvid with complex dominance hierarchies) to extend previous work on the effects of the oxytocin and vasopressin peptide hormones on social behaviors. The results are surprisingly clear, despite a small sample size. The authors use the correct statistical approaches to account for a complex, nested design. The introduction and discussion both reflect a strong understanding of the relevant literature, including the limitations of extrapolating from peripheral (intramuscular) versus central (into the brain) injections of the V1aR antagonist. In addition, the authors appear to have been transparent about the data and results, accounting for some of the challenges and limitations of the data and study.

      Weaknesses:

      There are two major concerns. First, the study has a very low sample size (8 triads for Experiment 1, and only 5 triads for Experiment 2). Despite the surprisingly convincing findings, the sample size is too small to support the claim that the vasopressin system "universally mediates same sex relationships." Second, the study does not account for the effects of V1aR on non-social behaviors. This is especially concerning because vasopressin/V1aR (and the particular antagonist used in this study) is known to affect osmotic balance, food intake, and stress, including in birds. My concern is that the behavioral effects could be accounted for by differences in general stress or activity levels. Allopreening is usually performed during periods of relative inactivity, whereas aggression is more characteristic of high activity levels. The authors discuss these different effects of vasopressin/V1aR in the Discussion, but they do not account for them in the study design.

    1. eLife Assessment

      In their valuable study, Lee et al. explore a role for the Hippo signaling pathway, specifically wts-1/LATS and the downstream regulator yap, in age-dependent neurodegeneration and microtubule dynamics using C. elegans mechanosensory neurons as a model. The authors demonstrate that disruption of wts-1/LATS leads to age-associated morphological and functional neuronal abnormalities, linked to enhanced microtubule stabilization, and show a genetic connection between yap and microtubule stability. Despite some mechanistic gaps, the study employs robust genetic and molecular approaches to reveal a convincing link between the Hippo pathway, microtubule dynamics, and neurodegeneration.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of microtubule dynamics in neuronal aging. Using C. elegans as a model, the authors investigate the role of the evolutionarily conserved Hippo pathway in the microtubule dynamics of touch receptor neurons (TRNs) in an age-dependent manner. Using genetic, molecular, behavioral, and pharmacological approaches, the authors show that age-dependent loss of microtubule dynamics might underlie structural and functional aging of TRNs. Further, the authors show that the Hippo pathway functions specifically in these neurons to regulate microtubule dynamics. In particular, the authors show that hyperactivation of YAP-1, a downstream component of the Hippo pathway that is usually inhibited by the kinase activity of the upstream components of the pathway, results in microtubule stabilization, which might underlie the structural and functional decline of TRNs with age. However, the authors did not investigate how the Hippo pathway regulates microtubule dynamics and neuronal aging.

      Strengths:

      This is a well-conducted and well-controlled study, and the authors have used multiple approaches to address different questions.

      Weaknesses:

      There are no major weaknesses identified, except that the effect of the Hippo pathway seems to be specific to only a subset of neurons. I would like the authors to address the specificity of the effect of the Hippo pathway in TRNs in their resubmission.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines a novel role of the Hpo signaling pathway, specifically of wts-1/LATS and the downstream regulator of gene expression, yap, in age-related neurodegeneration in C. elegans touch-responsive mechanosensory neurons, ALM and PLM. The study shows that knockdown or deletion of wts-1/LATS causes age-associated morphological abnormalities of these neurons, accompanied by functional loss of touch responsiveness. This is further associated with enhanced, abnormal, microtubule stabilization in these neurons.

      Strengths:

      Strong pharmacological and especially genetic manipulations of MT-stabilizing or severing proteins show a strong genetic link between yap and regulation of MT stability. The study is strong and uses robust approaches, especially strong genetics. The demonstrations on the aging-related roles of the Hpo signaling pathway, and the link to MTs, are novel and compelling. Nevertheless, the study also has mechanistic weaknesses (see below).

      Weaknesses:

      Specific comments:

      (1) The study demonstrates age-specific roles of the Hpo pathway, specifically of wts-1/LATS and yap, in TRN mechanosensory neurons, without observing developmental defects in these neurons or effects in other neurons. This is a strong demonstration. Nevertheless, the study does not address whether Hpo signaling pathway activity declines specifically in these neurons, and not in other neurons, at the observed L4 stage and onwards (including the first day of adulthood, 1DA stage). Such demonstrations of spatio-temporal regulation of the Hpo signaling pathway and its activation seem important for linking the Hpo pathway with the observed age-related neurodegeneration. Can this age-related response indeed be correlated with a decline in Hpo signaling during adulthood, especially at L4 and onwards? It would be informative to measure this by examining the decline in wts-1 levels, as well as yap levels and yap nuclear localization.

      (2) The Hpo pathway eventually activates gene expression via yap. Although the study uses robust genetic manipulations of yap and wts-1/LATS, it is not clear whether the observed effects are attributed to yap-mediated regulation of gene expression (see 3).

      (3) The observations on the abnormal MT stabilization, and the subsequent genetic examinations of MT-stability/severing genes, are a significant strength of the study. Nevertheless, despite the strong genetic links to yap and wts-1/LATS, it is not clear whether MT-regulatory genes are regulated by transcription downstream of the Hpo pathway. This precludes a strong causal link between MT regulation and Hpo-mediated gene expression, making this strong part of the study mechanistically circumstantial. Specifically, it would be good to examine whether the genes addressed herein, for example, Spastin, are transcriptionally regulated downstream of the Hpo pathway. This comment is reinforced by the finding that in the wts-1/yap-1 double mutants, MT abnormality, and subsequent neuronal morphology and touch responses, are restored, clearly indicating that there is an associated transcriptional regulation.

      Other comments:

      (1) The TRN-specific knockdown of wts-1 and yap-1 is a clear strength. Nevertheless, these experiments do not necessarily show cell-autonomous effects, as the yap transcription factor may regulate the expression of external cues, secreted or otherwise, thus generating non-cell-autonomous effects. For example, it is known that yap regulates TGF-beta expression and signaling.

      (2) Continuing from comment (3) above, it seems that many of the MT-regulators chosen here for genetic examinations were chosen based on demonstrated roles in neurodegeneration in other studies. It would be good to show whether these MT-associated genes are directly regulated by transcription by the Hpo pathway.

      (3) The impairment of the touch response may not be robust: it is only a 30-40% reduction at L4, and even less reduction at 1DA. It would be good to offer possible explanations for this finding.

    1. eLife Assessment

      This important work advances our understanding of parabrachial CGRP threat function. The evidence supporting CGRP aversive outcome signaling is solid, while the evidence for cue signaling and fear behavior generation is incomplete. The work will be of interest to neuroscientists studying defensive behaviors.

    2. Reviewer #1 (Public Review):

      Summary

      The authors asked if parabrachial CGRP neurons were only necessary for a threat alarm to promote freezing or were necessary for a threat alarm to promote a wider range of defensive behaviors, most prominently flight.

      Major Strengths of Methods and Results

      The authors performed careful single-unit recording and applied rigorous methodologies to optogenetically tag CGRP neurons within the PBN. Careful analyses show that single units and the wider CGRP neuron population increase firing to a range of unconditioned stimuli. The optogenetic stimulation of Experiment 2 was comparatively simpler but achieved its aim of determining the consequence of activating CGRP neurons in the absence of other stimuli. Experiment 3 used a very clever behavioral approach to reveal a setting in which both cue-evoked freezing and flight could be observed. This was done by having the unconditioned stimulus be a "robot" traveling along a circular path at a given speed. Subsequent cue presentation elicited mild flight in controls, and optogenetic activation of CGRP neurons significantly boosted this flight response. This demonstrated for the first time that CGRP neuron activation does more than promote freezing. The authors conclude by demonstrating that bidirectional modulation of CGRP neuron activity bidirectionally affects freezing in a traditional fear conditioning setting and affects both freezing and flight in a setting in which the robot served as the unconditioned stimulus. Altogether, this is a very strong set of experiments that greatly expands the role of parabrachial CGRP neurons in threat alarm.

      Weaknesses

      In all of their conditioning studies, the authors did not include a control cue, for example, a sound presented the same number of times but unrelated to US (shock or robot) presentation. This does not detract from their behavioral findings. However, it means the authors do not know whether the observed behavior is a consequence of pairing or would be observed in response to any cue played in the setting. This is particularly important for the experiments using the robot US.

      The authors make claims about the contribution of CGRP neurons to freezing and fleeing behavior; however, all of the optogenetic manipulations are centered on the US presentation period. At present, the experiments show a role for these neurons in processing aversive outcomes but show little role for them in cue responding or behavior organization. Claims of contributions to behavior should be substantiated by manipulations targeting the cue period.

      Appraisal

      The authors achieved their aims and have revealed a much greater role for parabrachial CGRP neurons in threat alarm.

      Discussion

      Understanding neural circuits for threat requires us (as a field) to examine diverse threat settings and behavioral outcomes. A commendable and rigorous aspect of this manuscript was the authors' decision to use a new behavioral paradigm and measure multiple behavioral outcomes. Indeed, this manuscript would not have been nearly as impactful had they not done so. This novel behavior was combined with excellent recording and optogenetic manipulations, a standard the field should aspire to. Studies like this are the only way that we as a field will map complete neural circuits for threat.

    3. Reviewer #2 (Public Review):

      -Summary of the Authors' Aims:

      The authors aimed to investigate the role of calcitonin gene-related peptide (CGRP) neurons in the parabrachial nucleus (PBN) in modulating defensive behaviors in response to threats. They sought to determine whether these neurons, previously shown to be involved in passive freezing behavior, also play a role in active defensive behaviors, such as fleeing, when faced with imminent threats.

      -Major Strengths and Weaknesses of the Methods and Results:

      The authors utilized an innovative approach by employing a predator-like robot to create a naturalistic threat scenario. This method allowed for a detailed observation of both passive and active defensive behaviors in mice. The combination of electrophysiology, optogenetics, and behavioral analysis provided a comprehensive examination of CGRP neuron activity and its influence on defensive behaviors. The study's strengths lie in its robust methodology, clear results, and the multi-faceted approach that enhances the validity of the findings.

      No notable weakness found.

      -Appraisal of Aims and Results:

      The authors successfully achieved their aims by demonstrating that CGRP neurons in the PBN modulate both passive and active defensive behaviors. The results clearly show that activation of these neurons enhances fear memory and promotes conditioned fleeing behavior, while inhibition reduces these responses. The study provides strong evidence supporting the hypothesis that CGRP neurons act as a comprehensive alarm system in the brain.

      -Impact on the Field and Utility of Methods and Data:

      This work has significant implications for the field of neuroscience, particularly in understanding the neural mechanisms underlying adaptive defensive behaviors. The innovative use of a predator-like robot to simulate naturalistic threats adds ecological validity to the findings and may inspire future studies to adopt similar approaches. The comprehensive analysis of CGRP neuron activity and its role in defensive behaviors provides valuable data that could be useful for researchers studying fear conditioning, neural circuitry, and behavior modulation.

      -Additional Context:

      The study builds on previous research that primarily focused on the role of CGRP neurons in passive defensive responses, such as freezing. By extending this research to include active responses, the authors have provided a more complete picture of the role of these neurons in threat detection and response. The findings highlight the versatility of CGRP neurons in modulating different types of defensive behaviors based on the perceived intensity and immediacy of threats.

      Overall, this manuscript makes a significant contribution to our understanding of the neural basis of defensive behaviors and offers valuable methodological insights for future research in the field.

    4. Reviewer #3 (Public Review):

      Strengths:

      The study used optogenetics together with in vivo electrophysiology to monitor CGRP neuron activity in response to various aversive stimuli, including robot chasing, to determine whether they encode noxious stimuli differentially. The study used an interesting conditioning paradigm to investigate the role of CGRP neurons in the PBN in both freezing and flight behaviors.

      Weakness:

      The major weakness of this study is that the chasing robot threat conditioning model elicits weak unconditioned and conditioned flight responses, making it difficult to interpret the robustness of the findings. Furthermore, the conclusion that the CGRP neurons are capable of inducing flight is not substantiated by the data. No manipulations are made to influence the flight behavior of the mouse. Instead, the manipulations are designed to alter the intensity of the unconditioned stimulus.

    1. eLife Assessment

      This study presents a valuable advancement in antiviral research by applying SHAPE-Map to analyze the secondary structure of the Porcine Epidemic Diarrhoea Virus RNA genome in infected cells, identifying promising therapeutic targets within viral genomic RNA. The authors provide convincing evidence of potential antiviral targetable RNA regions through a wide array of data from different methods, supported by well-documented experimental design and data analysis, demonstrating how RNA structural probing can effectively discover RNA targets and enabling further discoveries in the field. The work will be of interest to researchers focused on RNA therapeutics and viral genome studies.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the potential of targeting specific regions within the RNA genome of the Porcine Epidemic Diarrhea Virus (PEDV) for antiviral drug development. The authors used SHAPE-MaP to analyze the structure of the PEDV RNA genome in infected cells. They categorized different regions of the genome based on their structural characteristics, focusing on those that might be good targets for drugs or small interfering RNAs (siRNAs).

      They found that dynamic single-stranded regions can be stabilized by compounds (e.g., to form G-quadruplexes), which inhibit viral proliferation. They demonstrated this by targeting a specific G4-forming sequence with a compound called Braco-19. The authors also describe stable (structured) single-stranded regions that they used to design siRNAs showing that they effectively inhibited viral replication.

      Strengths:

      There are a number of strengths to highlight in this manuscript.

      (1) The study uses a sophisticated technique (SHAPE-MaP) to analyze the PEDV RNA genome in situ, providing valuable insights into its structural features.

      (2) The authors provide a strong rationale for targeting specific RNA structures for antiviral development.

      (3) The study includes a range of experiments, including structural analysis, compound screening, siRNA design, and viral proliferation assays, to support their conclusions.

      (4) Finally, the findings have potential implications for the development of new antiviral therapies against PEDV and other RNA viruses.

      Overall, this interesting study highlights the importance of considering RNA structure when designing antiviral therapies and provides a compelling strategy for identifying promising RNA targets in viral genomes.

      Weaknesses:

      I have some concerns about the utility of the 3D analyses, the effects of their synonymous mutants on expression/proliferation, a potentially missed control for studies of mutants, and the therapeutic utility of the compound they tested vs. G-quadruplexes.

    3. Reviewer #2 (Public review):

      Summary:

      Luo et al. use SHAPE-MaP to find suitable RNA targets in Porcine Epidemic Diarrhoea Virus. Results show that dynamic and transient structures are good targets for small molecules, and that exposed single-stranded regions are adequate targets for siRNA. This work is important for matching classes of RNA regions to appropriate targeting strategies.

      Strengths:

      This work is well done and the data supports its findings and conclusions. When possible, more than one technique was used to confirm some of the findings.

      Weaknesses:

      The study uses a cell line that is not porcine (not the natural target of the virus).

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Luo et al. applied SHAPE-Map to analyze the secondary structure of the Porcine Epidemic Diarrhoea Virus (PEDV) RNA genome in infected cells. By combining SHAPE reactivity and Shannon entropy, the study indicated that the folding of the PEDV genomic RNA was nonuniform, with the 5' and 3' untranslated regions being more compactly structured, which revealed potentially antiviral targetable RNA regions. Interestingly, the study also suggested that compounds bound to well-folded RNA structures in vitro did not necessarily exhibit antiviral activity in cells, because the binding of these compounds did not necessarily alter the functions of the well-folded RNA regions. Later in the manuscript, the authors focus on guanine-rich regions, which may form G-quadruplexes and be potential targets for small interfering RNA (siRNA). The manuscript shows the binding effect of Braco-19 (a G-quadruplex-binding ligand) to a predicted G4 region in vitro, along with the inhibition of PEDV proliferation in cells. This suggests that targeting high SHAPE-high Shannon G4 regions could be a promising approach against RNA viruses. Lastly, the manuscript identifies 73 single-stranded regions with high SHAPE and low Shannon entropy, which demonstrated high success in antiviral siRNA targeting.

      Strengths:

      The paper presents valuable data for the community. Additionally, the experimental design and data analysis are well documented.

      Weakness:

      The manuscript presents the effect of Braco-19 on PQS1, a single G4 region with high SHAPE and high Shannon entropy, to suggest that "the compound can selectively target the PQS1 of the high SHAPE-high Shannon region in cells" (lines 625-626). While the effect of Braco-19 on PQS1 is supported by strong evidence in the manuscript, the conclusion regarding the G4 region with high SHAPE and high Shannon entropy is based on a single target, PQS1.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Intergenerational transport of double-stranded RNA limits heritable epigenetic changes," Shugarts and colleagues investigate intergenerational dsRNA transport in the nematode C. elegans. By inducing oxidative damage, they block dsRNA import into cells, which affects heritable gene regulation in the adult germline (Fig. 2). They identify a novel gene, sid-1-dependent gene-1 (sdg-1), upregulated upon SID-1 inhibition (Fig. 3). Both transient and genetic depletion of SID-1 lead to the upregulation of sdg-1 and a second gene, sdg-2 (Fig. 5). Interestingly, while sdg-1 expression suggests a potential role in dsRNA transport, neither its overexpression nor loss-of-function impacts dsRNA-mediated silencing in the germline (Fig. 7).

      Strengths:

      • The authors employ a robust neuronal stress model to systematically explore SID-1-dependent intergenerational dsRNA transport in C. elegans.

      • They discover two novel SID-1-dependent genes, sdg-1 and sdg-2.

      • The manuscript is well-written and addresses the compelling topic of dsRNA signaling in C. elegans.

      Weaknesses:

      • The molecular mechanism downstream of SDG-1 remains unclear. Testing whether sdg-2 functions redundantly with sdg-1 could provide further insights.

      • SDG-1 dependent genes in other nematodes remain unknown.

      We thank the reviewer for highlighting the strengths of the work along with a couple of the interesting future directions inspired by the reported discoveries. The restricted presence of genes encoding SDG-1 and its paralogs within retrotransposons suggests intriguing evolutionary roles for these proteins. Future work could examine whether such fast-evolving or newly evolved proteins with potential roles in RNA regulation are more broadly associated with retrotransposons. Multiple SID-1-dependent proteins (including SDG-1 and SDG-2) could act together to mediate downstream effects. This possibility can be tested using combinatorial knockouts and overexpression strains. Both future directions have the potential to illuminate the evolutionarily selected roles of dsRNA-mediated signaling through SID-1, which remain a mystery.

      Reviewer #2 (Public review):

      Summary:

      RNAs can function across cell borders and animal generations as sources of epigenetic information for development and immunity. The specific mechanistic pathways by which RNA travels between cells and progeny remain an open question. Here, Shugarts et al. use molecular genetics, imaging, and genomics methods to dissect specific RNA transport and regulatory pathways in the C. elegans model system. Ingestion of double-stranded RNA by larvae is noted not to cause continuous gene silencing throughout adulthood. Damage to neuronal cells expressing double-stranded target RNA is observed to repress target gene expression in the germline. Exogenous short or long double-stranded RNA required different genes for entry into progeny. The SID-1 double-stranded RNA transporter was observed to show different expression over animal development. Removal of the sid-1 gene caused upregulation of two genes, the newly described sid-1-dependent genes sdg-1 and sdg-2. Both genes were observed to be negatively regulated by other small RNA regulatory pathways. Strikingly, loss and subsequent restoration of sid-1 through breeding still caused variable sdg-1 expression for many generations. SDG-2 protein colocalizes with germ granules, intracellular sites of heritable RNA silencing machinery. Collectively, sdg-1 presents a model to study how extracellular RNAs can buffer gene expression in germ cells and other tissues.

      Strengths:

      (1) Very clever molecular genetic methods and genomic analyses, paired with thorough genetics, were employed to discover insights into RNA transport, sdg-1 and sdg-2 as sid-1-dependent genes, and sdg-1's molecular phenotype.

      (2) The manuscript is well cited, and the figures are reasonably designed.

      (3) The discovery of the sdg genes being responsive to the extracellular RNA cell import machinery provides a model to study how exogenous somatic RNA is used to regulate gene expression in progeny. The discovery of genes within retrotransposons stimulates tantalizing models of how regulatory loops may actually permit the genetic survival of harmful elements.

      Weaknesses:

      (1) The manuscript is broad, making it challenging to read and consider the data presented. Of note, since the original submission, the authors have improved the clarity of the writing and presentation.

      Comments on revised version:

      This reviewer thanks the authors for their efforts in revising the manuscript. In their rebuttal, the authors acknowledged the broad scope of their manuscript. I concur. While I still think the manuscript is a challenge to read due to its expansive nature, the current draft is substantially improved when compared to the previous one. This work will contribute to our general knowledge of RNA biology, small RNA regulatory pathways, and RNA inheritance.

      We thank the reviewer for highlighting the strengths of the manuscript and for helping us improve the presentation of our results and discussion.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Intergenerational transport of double-stranded RNA limits heritable epigenetic changes" Shugarts and colleagues investigate intergenerational dsRNA transport in the nematode C. elegans. They induce oxidative damage in worms, blocking dsRNA import into cells (and potentially affecting the worms in other ways). Oxidative stress inhibits dsRNA import and the associated heritable regulation of gene expression in the adult germline (Fig. 2). The authors identify a novel gene, sid-1-dependent gene-1 (sdg-1), which is induced upon inhibition of SID-1 (Fig. 3). Both transient inhibition and genetic depletion of SID-1 lead to the upregulation of sdg-1 and a second gene, sdg-2 (Fig. 5). The expression of SDG-1 is variable, potentially indicating buffering regulation. While the expression of sdg-1 could be consistent with a role in intergenerational transport of dsRNA, neither its overexpression nor loss-of-function impacts dsRNA-mediated silencing (Fig. 7) in the germline. It would be interesting to test if sdg-2 functions redundantly.

      In summary, the authors have identified a novel worm-specific protein (sdg-1) that is induced upon loss of dsRNA import via SID-1, but is not required to mediate SID-1 RNA regulatory effects.

      We thank the reviewer for highlighting our findings on SDG-1. We found that oxidative damage in neurons enhanced dsRNA transport into the germline and/or subsequent silencing.

      Remaining Questions:

      • The authors use an experimental system that induces oxidative damage specifically in neurons to release dsRNAs into the circulation. Would the same effect be observed if oxidative damage were induced in other cell types?

      It is possible that oxidative damage of other tissues using miniSOG (as demonstrated in Xu and Chisholm, 2016) could also enhance the release of dsRNA into the circulation from those tissues. However, future experiments would be needed to test this empirically because it is also possible that the release of dsRNA depends on physiological properties (e.g., the molecular machinery promoting specific secretion) that are particularly active in neurons. We chose to use neurons as the source of dsRNA because, when dsRNA was expressed in a variety of tissues, neurons appeared to be the most efficient at exporting dsRNA, as measured using SID-1-dependent silencing in other tissues (Jose et al., PNAS, 2009).

      • Besides dsRNA, which other RNAs and cellular products (macromolecules and small signalling molecules) are released into the circulation that could affect the observed changes in germ cells?

      We do not yet know all the factors that could be released either in naive animals or upon oxidative damage of neurons that influence the uptake of dsRNA into other tissues. The dependence on SID-1 for the observed enhancement of silencing (Fig. 2) shows that dsRNA is necessary for silencing within the germline. Whether this import of dsRNA occurs in conjunction with other factors (e.g., the uptake of short dsRNA along with yolk into oocytes (Marré et al., PNAS, 2016)) before silencing within the germline will require further study. A possible approach could be the isolation of extracellular fluid (Banse and Hunter, J Vis Exp., 2012) followed by characterization of its contents. However, the limited material available using this approach and the difficulty in avoiding contamination from cellular damage by the needle used for isolating the material make it challenging.

      • SID-1 modifies RNA regulation within the germline (Fig. 7) and upregulates sdg-1 and sdg-2 (Fig. 5). However, SID-1's effects do not appear to be mediated via sdg-1. Testing the role of sdg-2 would be intriguing.

      We observe the accumulation of sdg-1 and sdg-2 RNA in two different mutants lacking SID-1, which led us to conservatively focus on the analysis of one of these proteins for this initial paper. We expect that more sensitive analyses of the RNA-seq data will likely reveal additional genes regulated by SID-1. With the ability to perform multiplexed genome-editing, we hope in future work to generate strains that have mutations in many SID-1-dependent genes to recapitulate the defects observed in sid-1(-) animals. Indeed, as surmised by the reviewer, we are focusing on sdg-2 as the first such SID-1-dependent gene to analyze using mutant combinations.

      • Are sdg-1 or sdg-2 conserved in other nematodes or potentially in other species? sdg-1 appears to be encoded or captured by a retro-element in the C. elegans genome and exhibits stochastic expression in different isolates. Is this a recent adaptation in the C. elegans genome, or is it present in other nematodes? Does loss-of-function of sdg-1 or sdg-2 have any observable effect?

      Clear homologs of SDG-1 and SDG-2 are not detectable outside of C. elegans. Consistent with the location of the sdg-1 gene within a Cer9 retrotransposon that appears to have integrated only within the C. elegans genome, sequence conservation between the genomes of related species is only observed outside the region of the retrotransposon (see Author response image 1, screenshot from UCSC browser). There were no obvious defects detected in animals lacking sdg-1 (Fig. 7) or in animals lacking sdg-2 (data not shown). It is possible that further exploration of both mutants and mutant combinations lacking additional SID-1-dependent genes would reveal defects. We also plan to examine these mutants in sensitized genetic backgrounds where one or more members of the RNA silencing pathway have been compromised.

      Author response image 1.

      Clarification for Readability:

      To enhance readability and avoid misunderstandings, it is crucial to specify the model organism and its specific dsRNA pathways that are not conserved in vertebrates:

      We agree with the reviewer and thank the reviewer for the specific suggestions provided below. To take the spirit of the suggestion to heart, we have instead changed the title of our paper to clearly signal that the entire study only uses C. elegans. We have titled the study ‘Intergenerational transport of double-stranded RNA in C. elegans can limit heritable epigenetic changes’.

      • In the first sentence of the paragraph "Here, we dissect the intergenerational transport of extracellular dsRNA ...", the authors should specify "in the nematode C. elegans". Unlike vertebrates, which recognise dsRNA as a foreign threat, worms and other invertebrates pervasively use dsRNA for signalling. Additionally, worms, unlike vertebrates and insects, encode RNA-dependent RNA polymerases that generate dsRNA from ssRNA substrates, enabling amplification of small RNA production. Especially in dsRNA biology, specifying the model organism is essential to avoid confusion about potential effects in humans.

      We agree with most statements made by the reviewer, although whether dsRNA is exclusively recognized as a foreign threat by all vertebrates of all stages remains controversial. Our changed title now eliminates all ambiguity regarding the organism used in the study.

      • Similarly, the authors should specify "in C. elegans" in the sentence "Therefore, we propose that the import of extracellular dsRNA into the germline tunes intracellular pathways that cause heritable RNA silencing." This is important because C. elegans small RNA pathways differ significantly from those in other organisms, particularly in the PIWI-interacting RNA (piRNA) pathways, which depend on dsRNA in C. elegans but use ssRNA in vertebrates. Specification is crucial to prevent misinterpretation by the reader. It is well understood that mechanisms of transgenerational inheritance that operate in nematodes or plants are not conserved in mammals.

      The piRNAs of C. elegans are single-stranded but are encoded by numerous independent genes throughout the genome. The molecules used for transgenerational inheritance of epigenetic changes that have been identified thus far are indeed different in different organisms. However, the regulatory principles required for transgenerational inheritance are general (Jose, eLife, 2024). Nevertheless, we have modified the title to clearly state that the entire study is using C. elegans.  

      • The first sentence of the discussion, "Our analyses suggest a model for ...", would also benefit from specifying "in C. elegans". The same applies to the figure captions. Clarification of the model organism should be added to the first sentence, especially in Figure 1.

      With the clarification of the organism used in the title, we expect that all readers will be able to unambiguously interpret our results and the contexts where they apply. 

      Reviewer #2 (Public review):

      Summary:

      RNAs can function across cell borders and animal generations as sources of epigenetic information for development and immunity. The specific mechanistic pathways by which RNA travels between cells and progeny remain an open question. Here, Shugarts, et al. use molecular genetics, imaging, and genomics methods to dissect specific RNA transport and regulatory pathways in the C. elegans model system. Ingestion of double-stranded RNA by larvae is noted not to cause continuous gene silencing throughout adulthood. Damage to neuronal cells expressing double-stranded target RNA is observed to repress target gene expression in the germline. Exogenous supply of short or long double-stranded RNA required different genes for entry into progeny. It was observed that the SID-1 double-stranded RNA transporter showed different expression over animal development. Removal of the sid-1 gene caused upregulation of two genes, the newly described sid-1-dependent genes sdg-1 and sdg-2. Both genes were also observed to be negatively regulated by other small RNA regulatory pathways. Strikingly, loss then gain of sid-1 through breeding still caused variability of sdg-1 expression for many, many generations. SDG-1 protein co-localizes with a Z-granule marker, an intracellular site for heritable RNA silencing machinery. Collectively, sdg-1 presents a model to study how extracellular RNAs can buffer gene expression in germ cells and other tissues.

      We thank the reviewer for highlighting our findings and underscoring the striking nature of the discovery that mutating sid-1 using genome-editing resulted in a transgenerational change that could not be reversed by changing the sid-1 sequence back to wild-type.

      Strengths:

      (1) Very clever molecular genetic methods and genomic analyses, paired with thorough genetics, were employed to discover insights into RNA transport, sdg-1 and sdg-2 as sid-1-dependent genes, and sdg-1's molecular phenotype.

      (2) The manuscript is well cited, and figures reasonably designed.

      (3) The discovery of the sdg genes being responsive to the extracellular RNA cell import machinery provides a model to study how exogenous somatic RNA is used to regulate gene expression in progeny. The discovery of genes within retrotransposons stimulates tantalizing models how regulatory loops may actually permit the genetic survival of harmful elements.

      We thank the reviewer for the positive comments.

      Weaknesses:

      (1) As presented, the manuscript is incredibly broad, making it challenging to read and consider the data presented. This concern is exemplified in the model figure, that requires two diagrams to summarize the claims made by the manuscript.

      RNA interference (RNAi) by dsRNA is an organismal response where the delivery of dsRNA into the cytosol of some cell precedes the processing and ultimate silencing of the target gene within that cell. These two major steps are often not separately considered when explaining observations. Yet, the interpretation of every RNAi experiment is affected by both steps. To make the details that we have revealed in this work for both steps clearer, we presented the two models separated by scale - organismal vs. intracellular. We agree that this integrative manuscript appears very broad when the many different findings are each considered separately. The overall model revealed here forms the necessary foundation for the deep analysis of individual aspects in the future.

      (2) The large scope of the manuscript denies space to further probe some of the ideas proposed. The first part of the manuscript, particularly Figures 1 and 2, presents data that can be caused by multiple mechanisms, some of which the authors describe in the results but do not test further. Thus, portions of the results text come across as claims that are not supported by the data presented.

      We agree that one of the consequences of addressing the joint roles of transport and subsequent silencing during RNAi is that the scope of the manuscript appears large. We had suggested multiple interpretations for specific observations in keeping with the need for further work. To prevent readers from taking our listing of possible interpretations as claims, we have followed the instructions of the reviewer (see below) and moved some of the potential explanations we raised to the discussion section.

      (3) The manuscript focuses on the genetics of SDGs but not the proteins themselves. Few descriptions of the SDGs functions are provided nor is it clarified why only SDG-1 was pursued in imaging and genetic experiments. Additionally, the SDG-1 imaging experiments could use additional localization controls.

      We agree that more work on the SDG proteins will likely be informative, but such work is beyond the scope of this already expansive paper. We began with the analysis of SDG-1 because it had the most support as a regulator of RNA silencing (Fig. 5f). Indeed, in other work (Lalit and Jose, bioRxiv, 2024), we find that AlphaFold 2 predicts the SDG-1 protein to be a regulator of RNA silencing that directly interacts with the dsRNA-editing enzyme ADR-2 and the endonuclease RDE-8. Furthermore, we expect that more sensitive analyses of the RNA-seq data are likely to reveal additional genes regulated by SID-1. Using multiplexed genome editing, we hope to generate mutant combinations lacking multiple sdg genes to reveal their function(s).

      We agree that given the recent discovery of many components of germ granules, our imaging data does not have sufficient resolution to discriminate between them. We have modified our statements and our model regarding the colocalization of SDG-1 with Z-granules to indicate that the overlapping enrichment of SDG-1 and ZNFX-1 in the perinuclear region is consistent with interactions with other nearby granule components.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major

      (1) As presented, the manuscript is almost two manuscripts combined into one. This point is highlighted in Figure 7h, which basically presents two separate models. The key questions addressed in the manuscript start at Figure 3. Figures 1 and 2 are interesting observations but require more experiments to define further. For example, as the Results text describes for Figure 1, "These differences in the entry of ingested dsRNA into cells and/or subsequent silencing could be driven by a variety of changes during development. These include changes in the uptake of dsRNA into the intestine, distribution of dsRNA to other tissues from the intestine, import of dsRNA into the germline, and availability of RNA silencing factors within the germline." Presenting these (reasonable) mechanistic ideas detracted from the heritable RNA epigenetic mechanism explored in the later portion of the manuscript. There are many ways to address this issue, one being moving Figures 1 and 2 to the Supplement to focus on SID-1 related pathways.

      Since this manuscript addresses the interaction between intercellular transport of dsRNA and heritable epigenetic changes, it was necessary to establish the possible route(s) that dsRNA could take to the germline before any inference could be made regarding heritable epigenetic changes. As suggested below (pt. 2), we have now moved the alternatives we enumerated as possible explanations for some experimental results (e.g., for the differences quoted here) to the discussion section.

      (2) The manuscript includes detailed potential interpretations in the Results, making them seem like claims. Here is an example:

      "Thus, one possibility suggested by these observations is that reduction of sdg-1 RNA via SID-1 alters the amount of SDG-1 protein, which could interact with components of germ granules to mediate RNA regulation within the germline of wild-type animals."

      This mechanism is a possibility, but placing these ideas in the citable results makes it seem like an overinterpretation of imaging data. This text and others should be in the Discussion, where speculation is encouraged. Results sections like this example and others should be moved to the discussion.

      We have rephrased motivating connections between experiments like the one quoted above and also moved such text to the discussion section wherever possible.

      (3) A paragraph describing the SDG proteins will be helpful. Homologs? Conserved protein domains? mRNA and/or protein expression pattern across worm, not just the germline? Conservation across Caenorhabditis sp? These descriptions may help establish context why SDG-1 localizes to Z-granules.

      We have now added information about the conservation of the sdg-1 gene in the manuscript. AlphaFold predicts domains with low confidence for the SDG-1 protein, consistent with the lack of conservation of this protein (AlphaFold requires multiple sequence alignments to predict confidently). In the adult animal, the SDG-1 protein was only detectable in the germline. Future work focused on SDG-1, SDG-2 and other SDG proteins will further examine possible expression in other tissues and functional domains if any. Unfortunately, in multiple attempts of single-molecule FISH experiments using probes against the sdg-1 open reading frame, we were unable to detect a specific signal above background (data not shown). Additional experiments are needed for the sensitive detection of sdg-1 expression outside the germline, if any.  

      (4) Based on the images shown, SDG-1 could be in other nearby granules, such as P granules or mutator foci. Additional imaging controls to rule out these granules/condensates will greatly strengthen the argument that SDG-1 protein localizes to Z-granules specifically.

      We have modified the final model to indicate that the perinuclear colocalization is with germ granules broadly and we agree that we do not have the resolution to claim that the observed overlap of SDG-1::mCherry with GFP::ZNFX-1 that we detect using Airyscan microscopy is specifically with Z granules. Our initial emphasis on Z granules was based on the prior report of SDG-1 being co-immunoprecipitated with the Z-granule surface protein PID-2/ZSP-1. However, through other work predicting possible direct interactions using AlphaFold (Lalit and Jose, bioRxiv, 2024), we were unable to detect any direct interactions between PID-2 and SDG-1. Indeed, many additional granules have been recently reported (Chen et al., Nat. Commun., 2024; Huang et al., bioRxiv 2024), making it possible that SDG-1 has specific interactions with a component of one of the other granules (P, Z, M, S, E, or D) or adjacent P bodies.

      Minor

      (1) "This entry into the cytosol is distinct from and can follow the uptake of dsRNA into cells, which can rely on other receptors." Awkward sentence. Please revise.

      We have now revised this sentence to read “This entry into the cytosol is distinct from the uptake of dsRNA into cells, which can rely on other receptors.”

      (2) Presumably, the fraction of dsRNA in the in vitro transcribed RNA differs from that of the 50 bp oligos, which can be reliably annealed by heating and cooling. Other RNA secondary structure possibilities warrant further discussion.

      We agree that in vitro transcribed RNA could include a variety of undefined secondary structures in addition to dsRNAs of mixed length. Such structures could recruit or titrate away RNA-binding proteins in addition to the dsRNA structures engaging the canonical RNAi pathway, resulting in mixed mechanisms of silencing. Future work identifying such structures and exploring their impact on the efficacy of RNAi could be informative. We have now added these considerations to the discussion and thank the reviewer for highlighting these possibilities.

    2. eLife Assessment

      In this report, the authors present valuable findings identifying a novel worm-specific protein (sdg-1) that is induced upon loss of dsRNA import via SID-1, but is not required to mediate SID-1 RNA regulatory effects. The genetic and genomic approaches are well-executed and the revision contains generally solid support for the central findings of the work. These findings will be of interest to those working in the germline epigenetic inheritance field.

    3. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Intergenerational transport of double-stranded RNA limits heritable epigenetic changes," Shugarts and colleagues investigate intergenerational dsRNA transport in the nematode C. elegans. By inducing oxidative damage, they block dsRNA import into cells, which affects heritable gene regulation in the adult germline (Fig. 2). They identify a novel gene, sid-1-dependent gene-1 (sdg-1), upregulated upon SID-1 inhibition (Fig. 3). Both transient and genetic depletion of SID-1 lead to the upregulation of sdg-1 and a second gene, sdg-2 (Fig. 5). Interestingly, while sdg-1 expression suggests a potential role in dsRNA transport, neither its overexpression nor loss-of-function impacts dsRNA-mediated silencing in the germline (Fig. 7).

      Strengths:

      • The authors employ a robust neuronal stress model to systematically explore SID-1-dependent intergenerational dsRNA transport in C. elegans.
      • They discover two novel SID-1-dependent genes, sdg-1 and sdg-2.
      • The manuscript is well-written and addresses the compelling topic of dsRNA signaling in C. elegans.

      Weaknesses:

      • The molecular mechanism downstream of SDG-1 remains unclear. Testing whether sdg-2 functions redundantly with sdg-1 could provide further insights.
      • SDG-1-dependent genes in other nematodes remain unknown.

    4. Reviewer #2 (Public review):

      Summary:

      RNAs can function across cell borders and animal generations as sources of epigenetic information for development and immunity. The specific mechanistic pathways by which RNA travels between cells and progeny remain an open question. Here, Shugarts, et al. use molecular genetics, imaging, and genomics methods to dissect specific RNA transport and regulatory pathways in the C. elegans model system. Ingestion of double-stranded RNA by larvae is noted not to cause continuous gene silencing throughout adulthood. Damage to neuronal cells expressing double-stranded target RNA is observed to repress target gene expression in the germline. Exogenous short or long double-stranded RNA required different genes for entry into progeny. It was observed that the SID-1 double-stranded RNA transporter showed different expression over animal development. Removal of the sid-1 gene caused upregulation of two genes, the newly described sid-1-dependent genes sdg-1 and sdg-2. Both genes were observed to be negatively regulated by other small RNA regulatory pathways. Strikingly, loss then gain of sid-1 through breeding still caused variability of sdg-1 expression for many, many generations. SDG-1 protein co-localizes with germ granules, intracellular sites for heritable RNA silencing machinery. Collectively, sdg-1 presents a model to study how extracellular RNAs can buffer gene expression in germ cells and other tissues.

      Strengths:

      (1) Very clever molecular genetic methods and genomic analyses, paired with thorough genetics, were employed to discover insights into RNA transport, sdg-1 and sdg-2 as sid-1-dependent genes, and sdg-1's molecular phenotype.

      (2) The manuscript is well cited, and figures reasonably designed.

      (3) The discovery of the sdg genes being responsive to the extracellular RNA cell import machinery provides a model to study how exogenous somatic RNA is used to regulate gene expression in progeny. The discovery of genes within retrotransposons stimulates tantalizing models how regulatory loops may actually permit the genetic survival of harmful elements.

      Weaknesses:

      (1) The manuscript is broad, making it challenging to read and consider the data presented. Of note, since the original submission, the authors have improved the clarity of the writing and presentation.

      Comments on revised version:

      This reviewer thanks the authors for their efforts in revising the manuscript. In their rebuttal, the authors acknowledged the broad scope of their manuscript. I concur. While I still think the manuscript is a challenge to read due to its expansive nature, the current draft is substantially improved when compared to the previous one. This work will contribute to our general knowledge of RNA biology, small RNA regulatory pathways, and RNA inheritance.

    1. eLife Assessment

      This is a valuable study regarding the role of gasdermin D in experimental psoriasis. The study contains solid evidence for such a role, involving neutrophils, from murine models of skin inflammation, as well as correlative data of elevated gasdermin D expression in human psoriatic skin. The findings will be of interest to researchers trying to unravel pathways of skin inflammation.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Liu, Jiang, Diao et al. investigated the role of GSDMD in psoriasis-like skin inflammation in mice. The authors have used full-body GSDMD knock-out mice and Gsdmd floxed mice crossed with S100A8-Cre mice. In both mouse lines, the deficiency of GSDMD ameliorated the skin phenotype induced by imiquimod. The authors also analyzed RNA sequencing data from psoriatic patients to show an elevated expression of GSDMD in psoriatic skin.

      Strengths:

      It has the potential to unravel a new role of neutrophils.

      Comments on revisions:

      The authors have addressed the majority of comments and concerns and highlighted the potential limitations wherever not possible.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe elevated GSDMD expression in psoriatic skin, and knock-out of GSDMD abrogates psoriasis-like inflammation.

      Strengths:

      The study is well conducted with transgenic mouse models: GSDMD knock-out mice show abrogated inflammation, and GSDMD fl/fl mice lacking GSDMD in neutrophils show a reduced phenotype.

      My major concern would be the involvement of other inflammasome- and GSDMD-bearing cell types, especially keratinocytes (KCs), which could explain why the experiments in Fig 4 still show inflammation.

      Comments on revisions:

      The authors have sufficiently addressed my questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a potentially interesting study regarding the role of gasdermin D in experimental psoriasis. The study contains useful data from murine models of skin inflammation; however, the main claims (on neutrophil pyroptosis) are incompletely supported in its current form and require additional experimental support to justify the conclusions made.

      We sincerely appreciate the positive assessment regarding the significance of our study, as well as the valuable suggestions provided by the reviewers. We have included new data, further discussions and clarifications in the revised manuscript to adequately address all the concerns raised by the reviewers and better support our conclusions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Liu, Jiang, Diao et al. investigated the role of GSDMD in psoriasis-like skin inflammation in mice. The authors have used full-body GSDMD knock-out mice and Gsdmd floxed mice crossed with S100A8-Cre mice. In both mouse lines, the deficiency of GSDMD ameliorated the skin phenotype induced by imiquimod. The authors also analyzed RNA sequencing data from psoriatic patients to show an elevated expression of GSDMD in psoriatic skin.

      Overall, this is a potentially interesting study; however, the manuscript in its current format is not entirely novel.

      Strengths:

      It has the potential to unravel a new role of neutrophils.

      Weaknesses:

      The main claims are only partially supported and could be strengthened.

      We thank the reviewer for the positive evaluation of the interest and potential of our work. In response to reviewers’ suggestions, we have added new content, including additional data and discussions, to further demonstrate the important role of GSDMD-mediated neutrophil pyroptosis in the pathogenesis of psoriasis, thereby enhancing the completeness of our research.

      Reviewer #2 (Public review):

      Summary:

      The authors describe elevated GSDMD expression in psoriatic skin, and knock-out of GSDMD abrogates psoriasis-like inflammation.

      Strengths:

      The study is well conducted with transgenic mouse models: GSDMD knock-out mice show abrogated inflammation, and GSDMD fl/fl mice lacking GSDMD in neutrophils show a reduced phenotype.

      I fear that some of the conclusions cannot be drawn from the experiments presented. My major concern would be the involvement of other inflammasome- and GSDMD-bearing cell types, especially keratinocytes (KCs), which could explain why the experiments in Fig 4 still show inflammation.

      Weaknesses:

      The experiments do not entirely support the conclusions towards neutrophils.

      We appreciate the reviewers’ positive evaluation regarding the application of our mouse models. We also thank the reviewers for insightful comments and suggestions that can improve the quality of our work. Addressing these issues has significantly strengthened our conclusions. Our responses to the above questions are as follows.

      Specific questions/comments:

      Fig 1b: mainly in KCs and neutrophils?

      In Figure 1b, we observed that GSDMD expression is higher in the psoriasis patient tissues compared to control samples. As the role of GSDMD in keratinocytes during the pathogenesis of psoriasis has already been explored[1], we focused our study on GSDMD in neutrophils. In response to the comments, we have added co-staining results of the neutrophil marker CD66b and GSDMD in the revised manuscript (see new Figure 3b in the revised manuscript). This addition further substantiates the expression of GSDMD in neutrophils within psoriasis tissue.

      Fig 2a: PASI includes erythema, scaling, thickness and area. I guess area could be tricky, especially in an artificially induced IMQ model (WT) vs. the knock-out mice.

      In our model, to accurately assess the disease condition in mice, we standardized the drug treatment area on the dorsal side (2 × 3 cm). Therefore, the area was not factored into the scoring process, and we have included a detailed description of this in the revised manuscript.

      Fig 2d: interesting finding. I thought that CASP-1 cleaves GSDMD. Why would it be downregulated?

      Regarding the downregulation of CASP-1 in GSDMD KO mouse skin tissue, existing studies indicate that GSDMD generates a feed-forward amplification cascade via the mitochondria-STING-caspase axis [2]. We hypothesize that the absence of GSDMD attenuates the STING-mediated activation of caspases.

      Line 313: As mentioned before (see Fig 1b), KC also show strong GSDMD staining and are known producers of IL-1b and sites of inflammasome activation. I guess the relevance of KC in the whole model needs to be evaluated here.

      Our research primarily focuses on the role of neutrophil pyroptosis in psoriasis; this does not conflict with existing reports indicating that KC pyroptosis also contributes to disease progression[1]. Both studies underscore the significant role of GSDMD-mediated pyroptotic signaling in psoriasis, and the consistent involvement of KC and neutrophils further emphasizes the potential therapeutic value of targeting GSDMD signaling in psoriasis treatment. We have expanded upon this discussion in the revised manuscript.

      Fig 4i: I guess the conclusion here would be that neutrophils are important for the pathogenesis in the IMQ model, which is true. This experiment does not show that this is mediated by pyroptosis.

      To address the question, we analyzed the publicly available single-cell transcriptomic data (GSE165021) and found that, compared to the control group, neutrophils infiltrating in IMQ-induced psoriasis-like tissue display a higher expression of pyroptosis-related genes (see new Figure 3e in the revised manuscript). These results strengthen our conclusions about the role of neutrophil pyroptosis in the progression of psoriasis.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific Comments:

      • Figure 1: Microabscesses would already be dead, which would likely show up as non-specific staining. The authors should consider double staining (e.g., GSDMD+Ly6G).

      We thank the reviewer for the useful suggestion. We have added co-staining results of the neutrophil marker CD66b and GSDMD in the revised manuscript (see new Figure 3b in the revised manuscript). This addition further substantiates the expression of GSDMD in neutrophils within psoriasis tissue.

      • Figures 1 b, c, and d do not have the n number for representative experiments and images.

      We apologize for our oversight. We have added the relevant information in the revised manuscript and have reviewed and corrected the entire text.

      • What is the difference between psoriasis patients in Figure 1 versus Figure 3 as the staining patterns are different? It is difficult to interpret from Figure 1 that expression is limited to neutrophils. Authors should consider double staining (e.g., GSDMD+Ly6G). How many samples were stained to draw this conclusion?

      We thank the reviewer for the suggestion. In Figure 1b, we observed that GSDMD expression is higher in the psoriasis patient tissues compared to control samples. We have added co-staining results of the neutrophil marker CD66b and GSDMD in the revised manuscript (see new Figure 3b in the revised manuscript). For each staining group, we examined samples from 3-5 patients to draw the conclusion.

      • Figure 2: GSDMD deficiency mitigates psoriasis-like inflammation in mice has been shown before (PMID#37673869). The paper showed that the GSDMD was mainly expressed in keratinocytes. What is the view of the authors on it and how does this data correlate with the data presented in this manuscript by the authors?

      Consistent with previous studies[1], we observed increased expression of pyroptosis-related proteins in psoriatic lesions. However, our research focused specifically on the role of neutrophil pyroptosis in psoriasis; this does not conflict with existing reports indicating that KC pyroptosis also contributes to disease progression. Both studies underscore the significant role of GSDMD-mediated pyroptotic signaling in psoriasis, and the consistent involvement of KC and neutrophils further emphasizes the potential therapeutic value of targeting GSDMD signaling in psoriasis treatment. We have expanded upon this discussion in the revised manuscript.

      • Figure 3d: It is unclear whether the IF shows an epidermal or dermal area. As shown by the authors in other figures (human psoriatic skin), do the authors observe more GSDMD in the microabscess, which is localized in the epidermis? The authors should also show GSDMD/Ly6G staining in the whole skin sample.

      The region we presented for immunofluorescence staining corresponds to the mouse dermis, because we did not observe typical neutrophil microabscesses, similar to those in human psoriasis, in the epidermis of the IMQ-induced classical psoriasis vulgaris (PV) model. Therefore, we have only shown staining of the dermal area.

      • Figure 3e: PI staining also represents necrotic cells and TUNEL staining would not represent just apoptotic cells. It is unclear how the authors conclude an ongoing pyroptosis in neutrophils. A robust dataset is needed to provide evidence supporting neutrophil pyroptosis in the IMQ-challenged mice.

      We thank the reviewer for the valuable suggestion. GSDMD is the effector protein of pyroptosis. To further confirm that cells are undergoing pyroptosis, it is necessary to morphologically stain the GSDMD N-terminal protein. Although there is currently no GSDMD N-terminal fluorescent antibody available, we detected the cleaved N-terminus of GSDMD by WB in mouse psoriasis-like skin tissue, and its increased expression suggested increased cell pyroptosis (see new Figure 1d in the revised manuscript). Moreover, we analyzed the publicly available single-cell transcriptomic data (GSE165021) and found that, compared to the control group, neutrophils infiltrating in IMQ-induced psoriasis-like tissue display a higher expression of pyroptosis-related genes (see new Figure 3e in the revised manuscript). These results strengthen our conclusions about the role of neutrophil pyroptosis in the progression of psoriasis.

      • Figure 4: The authors did not clarify the reason for choosing D4 over the usual D7 for the imiquimod experiment. S100A8-Cre is also reported in monocytes and granulocytes/monocyte progenitors. And, the authors also show the expression in macrophages and neutrophils, but in the text, only neutrophils are mentioned. The authors should state the results in the text as well to avoid misrepresentation of the data.

      We thank the reviewer for the useful suggestion. In our previous studies, we repeated this experiment many times and observed that the IMQ-induced mouse psoriasis model showed obvious signs of self-resolution after Day 4, even with continued topical IMQ application. We therefore chose 4 days rather than 7 days for the imiquimod experiment, consistent with many other studies[3, 4].

      Many studies use S100A8-Cre mice for neutrophil-specific gene knockout[5, 6]. Moreover, we used a Ly6G antibody to deplete neutrophils in GSDMD-cKO and control mice. The difference in lesions between the two groups was abolished after neutrophil depletion, indicating that neutrophil pyroptosis plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice. As database analysis showed slight S100a8 expression in macrophages, we have, following the reviewer's suggestion, added a more precise description to the revised manuscript.

      • Figure S2a: The Ly6G antibody reduced not only Ly6G-positive but also Ly6G-negative cells compared to PBS. If this is correct, what is the explanation, and how has this observation been taken into account when drawing conclusions?

      Neutrophils play an important role in regulating inflammatory responses, and their depletion can reduce the overall level of inflammation in the body, which also decreases the numbers of other, non-neutrophil cells. However, this change does not affect our conclusions. Our results show that after neutrophil depletion, there is no difference in the pathological manifestations between the cKO group and the control group. This further indicates that GSDMD in neutrophils plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice.

      • The conclusion in Figure 4i is incorrect as Ly6G administration had an effect on the wt, so it shows neutrophils play a role, but not neutrophil pyroptosis.

      Lines 321–323: "It was found that the difference in lesions between the two groups was abolished after neutrophil depletion (Fig4i, S2a), indicating that neutrophil pyroptosis plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice"

      Our results show that after neutrophil depletion, there is no difference in the pathological manifestations between the cKO group and the control group. This further indicates that the lower disease scores observed in cKO mice without neutrophil depletion depend on the presence of neutrophils. In the revised manuscript, we have changed the statement to “It was found that the difference in lesions between the two groups was abolished after neutrophil depletion (Fig4i, S2a), indicating that GSDMD in neutrophils plays an important role in the pathogenesis of imiquimod-induced psoriasis-like lesions in mice”.

      • The effect of Ly6G Ab: it reduced PASI in the WT, but the effect on the KO remains the same. What other molecular changes were observed? What were the neutrophil levels in WT and S100A8-Cre Gsdmd fl/fl mice at steady state, and how do they change upon imiquimod challenge? A complete profiling of the immune cells is needed for all the experiments.

      As demonstrated by the results, the deletion of neutrophils did not significantly alter the pathological phenotype of cKO mice. We believe that this outcome precisely highlights the crucial role of GSDMD in regulating neutrophil inflammatory responses.

      • Figure S2b: The authors conclude that IL-1b in the imiquimod-treated skin is mainly expressed by neutrophils, but the analysis presented in the figure does not support this conclusion. Both neutrophils and macrophages are largely positive for IL-1b, with some expression in Langerhans cells and fibroblasts. No n numbers are provided for the experiment.

      As we discussed in the manuscript, we speculate that neutrophil pyroptosis may release cytokines, which in turn activate other cells to secrete cytokines, forming a complex inflammatory network in psoriasis. This suggests that neutrophil pyroptosis may be involved in the pathogenesis of psoriasis by affecting the secretion of cytokines such as IL-1β and IL-6 by neutrophils, thereby affecting the function of other immune cells such as T cells and macrophages.

      We have added the n number in the revised manuscript.

      • For clarity and transparency, a list of antibodies with the associated clone and catalogue numbers should be provided or integrated into the methods text.

      We thank the reviewer for the useful suggestion. We have added the clone and catalogue number of each antibody used to the methods text of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Fig 3b: Psoriasis and pustular psoriasis have different pathophysiologies (autoimmune vs. autoinflammatory). Neutrophils are centrally important in GPP for the cleavage of IL-36. As pustular psoriasis is not referred to further in the paper, I guess that comparison rather deviates from the story.

      In Figure 3b, we stained for GSDMD and CD66b in both plaque psoriasis (PV) and generalized pustular psoriasis (GPP), not to compare the expression differences between the two types of psoriasis, but rather to demonstrate that significant GSDMD expression is present in neutrophils in different types of psoriasis. Unfortunately, due to the lack of a well-established animal model for GPP, we were only able to conduct studies using the established PV animal model. We acknowledge this limitation in our research. In our revised manuscript, we have added the following explanation in the discussion section: “Although we observed significantly increased GSDMD in neutrophils in pustular psoriasis, we were constrained to studying the established PV animal model due to the current absence of a mature GPP animal model. This represents a limitation of our study.”

      In summary, we appreciate the Reviewer’s comments and suggestions. We feel that the inclusion of new data addresses the concerns in a comprehensive manner and adds further support to our original conclusions. We hope you will now consider the revised manuscript worthy of publication in eLife.

      References:

      (1) Lian, N., et al., Gasdermin D-mediated keratinocyte pyroptosis as a key step in psoriasis pathogenesis. Cell Death & Disease, 2023. 14(9): p. 595.

      (2) Han, J., et al., GSDMD (gasdermin D) mediates pathological cardiac hypertrophy and generates a feed-forward amplification cascade via mitochondria-STING (stimulator of interferon genes) axis. Hypertension, 2022. 79(11): p. 2505-2518.

      (3) Lin, H., et al., Forsythoside A alleviates imiquimod-induced psoriasis-like dermatitis in mice by regulating Th17 cells and IL-17a expression. Journal of Personalized Medicine, 2022. 12(1): p. 62.

      (4) Emami, Z., et al., Evaluation of Kynu, Defb2, Camp, and Penk Expression Levels as Psoriasis Marker in the Imiquimod‐Induced Psoriasis Model. Mediators of Inflammation, 2024. 2024(1): p. 5821996.

      (5) Stackowicz, J., et al., Neutrophil-specific gain-of-function mutations in Nlrp3 promote development of cryopyrin-associated periodic syndrome. Journal of Experimental Medicine, 2021. 218(10): p. e20201466.

      (6) Abram, C.L., et al., Distinct roles for neutrophils and dendritic cells in inflammation and autoimmunity in motheaten mice. Immunity, 2013. 38(3): p. 489-501.

    1. eLife Assessment

      This important work is a versatile new addition to the chemical protein modifications and bioconjugation toolbox in synthetic biology. The technology developed cleverly uses Connectase to irreversibly fuse proteins of interest together so they can be studied in their native context, with convincing data showing the technique works for various protein partners. This work will help multiple fields to explore multi-function constructs in basic synthetic biology. This work will also be of interest to those studying fusion oncoproteins commonly expressed in various human pathologies.

    2. Reviewer #1 (Public review):

      Fuchs describes a novel method of enzymatic protein-protein conjugation using the enzyme Connectase. The author is able to make this process irreversible by screening different Connectase recognition sites to find an alternative sequence that is also accepted by the enzyme. They are then able to selectively render the byproduct of the reaction inactive, preventing the reverse reaction, and add the desired conjugate with the alternative recognition sequence to achieve near-complete conversion. I agree with the authors that this novel enzymatic protein fusion method has several applications in the field of bioconjugation, ranging from biophysical assay conduction to therapeutic development. Previously the author has published on the discovery of the Connectase enzymes and has shown its utility in tagging proteins and detecting them by in-gel fluorescence. They now extend their work to include the application of Connectase in creating protein-protein fusions, antibody-protein conjugates, and cyclic/polymerized proteins. As mentioned by the author, enzymatic protein conjugation methods can provide several benefits over other non-specific and click chemistry labeling methods. Connectase specifically can provide some benefits over the more widely used Sortase, depending on the nature of the species that is desired to be conjugated. However, due to a similar lengthy sequence between conjugation partners, the method described in this paper does not provide clear benefits over the existing SpyTag-SpyCatcher conjugation system. Additionally, specific disadvantages of the method described are not thoroughly investigated, such as difficulty in purifying and separating the desired product from the multiple proteins used. Overall, this method provides a novel, reproducible way to enzymatically create protein-protein conjugates.

      The manuscript is well-written and will be of interest to those who are specifically working on chemical protein modifications and bioconjugation.

    3. Reviewer #2 (Public review):

      Summary:

      Unlike previous traditional protein fusion protocols, the author claims their proposed new method is fast, simple, specific, reversible, and results in a complete 1:1 fusion. A multi-disciplinary approach from cloning and purification, biochemical analyses, and proteomic mass spec confirmation revealed fusion products were achieved.

      Strengths:

      The author provides convincing evidence that this alternative to traditional protein fusion synthesis, using Connectase, is more efficient, with 100% yields. The author optimized the protocol's efficiency with assays replacing a single amino acid and identified a proline aminopeptidase from Bacillus coagulans (BcPAP) as a usable enzyme for the fusion reaction. Multiple examples, including ubiquitin, GST, and antibody fusions/conjugations, reveal how this method can be applied to a diverse range of biological processes.

      Weaknesses:

      Though the ~100% ligation efficiency is an advancement, the long recognition linker may be the biggest drawback. For large native proteins that are challenging/cannot be synthesized and require multiple connectase ligation reactions to yield a complete continuous product, the multiple interruptions with long linkers will likely interfere with protein folding, resulting in non-native protein structures. This method will be a good alternative to traditional approaches as the author mentioned but limited to generating epitope/peptide/protein tagged proteins, and not for synthetic protein biology aimed at examining native/endogenous protein function in vitro.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Fuchs describes a novel method of enzymatic protein-protein conjugation using the enzyme Connectase. The author is able to make this process irreversible by screening different Connectase recognition sites to find an alternative sequence that is also accepted by the enzyme. They are then able to selectively render the byproduct of the reaction inactive, preventing the reverse reaction, and add the desired conjugate with the alternative recognition sequence to achieve near-complete conversion. I agree with the authors that this novel enzymatic protein fusion method has several applications in the field of bioconjugation, ranging from biophysical assay conduction to therapeutic development. Previously the author has published on the discovery of the Connectase enzymes and has shown its utility in tagging proteins and detecting them by in-gel fluorescence. They now extend their work to include the application of Connectase in creating protein-protein fusions, antibody-protein conjugates, and cyclic/polymerized proteins. As mentioned by the author, enzymatic protein conjugation methods can provide several benefits over other non-specific and click chemistry labeling methods. Connectase specifically can provide some benefits over the more widely used Sortase, depending on the nature of the species that is desired to be conjugated. However, due to a similar lengthy sequence between conjugation partners, the method described in this paper does not provide clear benefits over the existing SpyTag-SpyCatcher conjugation system.  Additionally, specific disadvantages of the method described are not thoroughly investigated, such as difficulty in purifying and separating the desired product from the multiple proteins used. Overall, this method provides a novel, reproducible way to enzymatically create protein-protein conjugates.

      The manuscript is well-written and will be of interest to those who are specifically working on chemical protein modifications and bioconjugation.

      Reviewer #2 (Public review):

      Summary:

      Unlike previous traditional protein fusion protocols, the author claims their proposed new method is fast, simple, specific, reversible, and results in a complete 1:1 fusion. A multi-disciplinary approach from cloning and purification, biochemical analyses, and proteomic mass spec confirmation revealed fusion products were achieved.

      Strengths:

      The author provides convincing evidence that this alternative to traditional protein fusion synthesis, using Connectase, is more efficient, with 100% yields. The author optimized the protocol's efficiency with assays replacing a single amino acid and identified a proline aminopeptidase from Bacillus coagulans (BcPAP) as a usable enzyme for the fusion reaction. Multiple examples, including ubiquitin, GST, and antibody fusions/conjugations, reveal how this method can be applied to a diverse range of biological processes.

      Weaknesses:

      Though the ~100% ligation efficiency is an advancement, the long recognition linker may be the biggest drawback. For large native proteins that are challenging/cannot be synthesized and require multiple connectase ligation reactions to yield a complete continuous product, the multiple interruptions with long linkers will likely interfere with protein folding, resulting in non-native protein structures. This method will be a good alternative to traditional approaches as the author mentioned but limited to generating epitope/peptide/protein tagged proteins, and not for synthetic protein biology aimed at examining native/endogenous protein function in vitro.

      I would like to sincerely thank both reviewers for their insightful and constructive feedback on the manuscript. I have addressed reviewer #1’s comments below:

      (1) The benefits over the SpyTag-SpyCatcher system. Here, the conjugation partners are fused via the 12.3 kDa SpyCatcher protein, which is considerably larger than the Connectase fusion sequence (20 aa). This is briefly mentioned in the introduction (p. 1 ln 24-25). In a related technology, the SpyTag-SpyCatcher system was split into three components, SpyLigase, SpyTag and KTag  (Fierer et al., PNAS 2014). The resulting method introduces a sequence between the fusion partners (SpyTag (13aa) + KTag (10aa)), which is similar in length to the Connectase fusion sequence. I mention this method in the discussion (p. 8, ln 296 - 297), but preferred not to comment on its efficiency. It appears to require more enzyme and longer incubation times, while yielding less fusion product (Fierer et al., Figure 2).

      (2) Purification of the fusion product. The method is actually advantageous in this respect, as described in the discussion (p. 8, ln 257-263). I plan to add a figure showing an example in the revised article.

    1. eLife Assessment

      This study investigated mitochondrial dysfunction and the impairment of the ciliary Sonic Hedgehog signaling in Lowe syndrome (LS), a timely topic given the limited research in this area. The data from patient iPSC-derived neurons and a mouse model were collected using solid methods, but the evidence supporting key claims is incomplete, and some technical aspects fall short of expectations. Despite these limitations, the study provides a useful foundation for exploring the relationship between mitochondrial defects and primary cilia in neural development.

    2. Reviewer #1 (Public review):

      Summary:

      This study by Lo et al. seeks to explain the cellular defects underlying the brain phenotypes of Lowe syndrome (LS). There have been limited studies on this topic and hence this is a timely study.

      Strengths:

      Studies such as these can contribute to an understanding of the cellular and developmental mechanisms of brain disorders.

      Weaknesses:

      The study uses two models: (1) an LS IOB knockout mouse and (2) neurons derived from iPSC lines from LS patients. These two models are used to present three separate findings: (1) altered mitochondria function, (2) altered numbers of neurons and glia in both models, and (3) some evidence of altered Sonic Hedgehog signaling projected as a defect in cilia.

      Conceptually, there are some problems of serious concern which must be carefully considered:
      (1) The IOB mouse was very extensively phenotyped when it was generated by Festa et al., HMM, 2019. It does not have any obvious brain-deficit phenotypes, although the studies in that paper were very detailed indeed.
      (2) Reduced brain size is reported as a phenotype of the IOB mouse in this study. Yet across the many clinical studies of LS published over the years, altered brain size has not been noted, either on clinical examination or in the many MRI reports of LS patients.

      While reading through these results, it is striking that the link between the three reported phenotypes is at best tenuous, and in fact may not exist at all. The link between mitochondria and neurogenesis is based on a single paper that has been cited incorrectly and out of context. There is no evidence presented for a link between the reported Shh signaling defect and the mitochondrial phenotype.

      General comments

      (1) The preparation of the manuscript requires improvement. There are many errors in the presentation of data.
      (2) The use of references needs to be reconsidered. Sometimes a reference is cited when in fact the results in that paper are the opposite of what the authors intend.
      (3) The authors conclude the paper by claiming that mitochondrial dysfunction and impairments of the ciliary SHH pathway contribute to abnormal neuronal differentiation in LS, but the mechanism by which this sequence of events might happen has not been shown.

      Final comments:

      (1) Phenotype of increased astrocytes:
      The phenotype of increased astrocytes in both the IOB mouse brain and the iPSC-derived iN cell cultures requires clarification, as one of the markers used as an astrocyte marker, BRN2, is commonly used as a neuronal marker. As LS is a neurodevelopmental disorder and the phenotype in question is related to differentiation, it is crucial to shed light on the developmental timeline in which this phenotype is seen in the mouse brain.

      (2) Ciliary homeostasis:
      Mitochondrial dysfunction in astrocytes has been shown to induce a ciliogenic program. However, almost the opposite is shown in this paper with regard to ciliation. The morphology of the cilia, an important feature of ciliary homeostasis, was not assessed either. The impaired ciliary homeostasis here appears to be the improper Shh signalling, which has not been shown to be related to mitochondrial dysfunction. This leaves one wondering how exactly the different phenotypes shown in this paper are connected.

      (3) This paper lacks a clear mechanistic approach. While the data validates the 3 broad phenotypes mentioned, there is a lack of connection between these phenotypes or an answer to why these phenotypes appear. While the discussion attempts to shed light on this by referencing previous studies, some of the referenced studies show contradicting results. Hence, it would be beneficial to clarify these gaps with further experiments and address the larger question of the connection between the mitochondria, Shh signalling, and astrocyte formation.

      (4) Most importantly, there is no mention of how the loss of OCRL, a 5-phosphatase enzyme, results in the appearance of the mentioned phenotypes. Since there are multiple studies in the field of Lowe Syndrome that shed light on the various functions of OCRL, both catalytic and non-catalytic, it is important to address the role of OCRL in resulting in these phenotypes.

      (5) There are numerous errors in the qPCR experiments performed with regard to the genes that were assayed. The genes mentioned in the text section do not match those indicated in the graphs or legends. This takes away the confidence of the reader in this data.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates how neural cell development is affected in Lowe syndrome. Using neural cultures differentiated from human iPSCs carrying either an LS mutation or a genetically engineered mutation in OCRL, the authors show a depletion of mitochondrial DNA and a decrease in mitochondrial activities that correlate with an increased formation of astrocytes at the expense of neurons. Similar effects on mitochondria and on astrocyte development were observed in an LS mouse model. Moreover, these mutant brain cells are less likely to be ciliated and show a reduction in Sonic Hedgehog signalling.

      Strengths/Weaknesses:

      The study derives strength from the analyses of two different models of Lowe syndrome, both reaching similar conclusions. However, the observed changes in mitochondrial defects, neuronal/astrocytic development, and primary cilia are only correlated, with no attempt to investigate a causal relationship. Moreover, the mouse model is only analysed at the adult stage providing no insights into the development of the defects. Different brain regions are analysed with immunostainings and qRT-PCR making it challenging to draw clear correlations between these findings. The quality of the corresponding figures is often poor and the selection of markers is frequently inappropriate. Taken together, these limitations complicate the interpretations of the data and significantly limit the conclusions that can be drawn from the study.

    1. eLife Assessment

      This valuable study proposes a theoretical model of clathrin coat formation based on membrane elasticity that seeks to determine whether this process occurs by increasing the area of a protein-coated patch with constant curvature, or by increasing the curvature of a protein-coated patch that forms in an initially flat conformation (the so-called constant curvature and constant area models). Identifying energetically favorable pathways and comparing the obtained shapes with experiments provides solid support for the constant-area pathway. This work will be of interest to biologists and biophysicists interested in membrane remodelling and endocytosis. It provides an innovative approach to tackling the question of constant curvature vs. constant area coat protein formation, although some of the model's assumptions are only partially supported by experimental evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The authors develop a set of biophysical models to investigate whether a constant area hypothesis or a constant curvature hypothesis explains the mechanics of membrane vesiculation during clathrin-mediated endocytosis.

      Strengths:

      The models that the authors choose are fairly well-described in the field and the manuscript is well-written.

      Weaknesses:

      One thing that is unclear is what is new in this work. If the main finding is that the differences lie in the early stages of endocytosis, then one wonders whether that should be tested experimentally. Also, clathrin assembly and adhesion are treated as mechanical equilibria, but perhaps the process should be described not as an equilibrium but rather as a time-dependent process. Ultimately, there are so many models that address this question that, without direct experimental comparison, it is hard to place value on the model's predictions.
      While an attempt is made to do so with previously published EM images, there is excessive uncertainty both in the data itself, as is usually the case, and in the methods used to symmetrize the data. This reviewer wonders about the goodness of fit when such uncertainty is taken into account.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors employ theoretical analysis of an elastic membrane model to explore membrane vesiculation pathways in clathrin-mediated endocytosis. A complete understanding of clathrin-mediated endocytosis requires detailed insight into the process of membrane remodeling, as the underlying mechanisms of membrane shape transformation remain controversial, particularly regarding membrane curvature generation. The authors compare constant area and constant membrane curvature as key scenarios by which clathrin induces membrane wrapping around the cargo to accomplish endocytosis. First, they characterize the geometrical aspects of the two scenarios and highlight their differences by imposing the coated area and the membrane spontaneous curvature. They then examine the energetics of the process to understand the driving mechanisms behind membrane shape transformations in each model. In the latter part, they introduce two energy terms: clathrin assembly or binding energy, and curvature generation energy, with two distinct approaches for the latter. Finally, they identify the energetically favorable pathway in the combined scenario and compare their results with experiments, showing that the constant-area pathway better fits the experimental data.

      Strengths:

      The manuscript is well-written, well-organized, and presents the details of the theoretical analysis with sufficient clarity.

      The calculations are valid, and the elastic membrane model is an appropriate choice for addressing the differences between the constant curvature and constant area models.

      The authors' approach of distinguishing two distinct free energy terms (clathrin assembly and curvature generation) and then combining them to identify the favorable pathway is both innovative and effective in addressing the problem.

      Notably, their identification of the energetically favorable pathways, and of how these pathways either lead to full endocytosis or fail to proceed due to insufficient energetic drive, is particularly insightful.

      Weaknesses:

      Membrane remodeling in cellular processes is typically studied in either a constant area or constant tension ensemble. While total membrane area is preserved in the constant area ensemble, membrane area varies in the constant tension ensemble. In this manuscript, the authors use the constant tension ensemble with a fixed membrane tension, σe. However, they also use a constant area scenario, where 'area' refers to the surface area of the clathrin-coated membrane segment. This distinction between the constant membrane area ensemble and the constant area of the coated membrane segment may cause confusion.

      As mentioned earlier, the theoretical analysis is performed in the constant membrane tension ensemble at a fixed membrane tension. The total free energy E_tot of the system consists of membrane bending energy E_b and tensile energy E_t, which depends on membrane tension, σe. Although the authors mention the importance of both E_b and E_t, they do not present their individual contributions to the total energy changes. Comparing these contributions would enable readers to cross-check the results with existing literature, which primarily focuses on the role of membrane bending rigidity and membrane tension.
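
      To make the requested decomposition concrete, a commonly used Helfrich-type form of the two contributions is sketched below. This is an assumption about the model's structure, not the manuscript's own expressions: the bending rigidity κ, mean curvature H, spontaneous curvature C_0 and excess area ΔA (membrane area pulled into the invagination relative to the flat state) are symbols introduced here for illustration, and sign and normalization conventions differ between papers.

```latex
E_{\mathrm{tot}} = E_b + E_t, \qquad
E_b = \frac{\kappa}{2}\int_{A} \left(2H - C_0\right)^{2}\, dA, \qquad
E_t = \sigma_e\,\Delta A
```

      Under this assumed form, plotting E_b and E_t separately along each pathway would indicate whether bending or tension dominates a given energy barrier.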

      The authors introduce two different models, (1,1) and (1,2), for generating membrane curvature. Model 1 assumes a constant rate of curvature growth, corresponding to linear curvature growth, while Model 2 relates curvature growth to the current curvature, resembling exponential curvature growth. Although both models make physical sense in general, I am concerned that Model 2 may lead to artificial membrane bending at high curvatures. Normally, for intermediate bending (ψ > 90°), the bending process is energetically downhill and thus proceeds rapidly. However, Model 2's assumption would accelerate curvature growth even further. This is reflected in the endocytic pathways represented by the green curves in the two rightmost panels of Fig. 4a, where the energy increases steeply at large ψ. I believe a more realistic version of Model 2 would require a saturation mechanism to limit curvature growth at high curvatures.
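
      The three growth laws at issue can be compared in a few lines. The sketch below is a hypothetical illustration, not the manuscript's implementation: c is curvature, s is a generic progress variable, and c0, k and c_max are assumed parameters. The linear law mirrors Model 1, the exponential law mirrors Model 2, and the logistic law is one possible (assumed) saturation mechanism of the kind suggested above.

```python
import math

# Hypothetical illustration of the curvature growth laws discussed above.
# c is curvature and s a generic progress variable; c0, k and c_max are
# assumed parameters, not the manuscript's notation.


def linear_growth(c0: float, k: float, s: float) -> float:
    """Model-1-like law: constant growth rate, dc/ds = k."""
    return c0 + k * s


def exponential_growth(c0: float, k: float, s: float) -> float:
    """Model-2-like law: rate proportional to current curvature, dc/ds = k*c."""
    return c0 * math.exp(k * s)


def logistic_growth(c0: float, k: float, c_max: float, s: float) -> float:
    """One possible saturating variant: dc/ds = k*c*(1 - c/c_max)."""
    return c_max / (1.0 + (c_max / c0 - 1.0) * math.exp(-k * s))


if __name__ == "__main__":
    c0, k, c_max = 0.1, 1.0, 1.0
    for s in (0.0, 2.0, 5.0):
        # Columns: s, linear, exponential, logistic (saturating) curvature.
        print(s, linear_growth(c0, k, s), exponential_growth(c0, k, s),
              logistic_growth(c0, k, c_max, s))
```

      With these assumed values, the exponential law has overshot c_max by s = 5, whereas the logistic law approaches c_max from below; any other saturating rate law would serve the same illustrative purpose.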

    1. eLife Assessment

      This study provides valuable quantitative data and analysis that reveals variations in 'Dorsal' nuclear dynamics along the dorso-ventral axis in the early Drosophila embryo. The evidence that these variations are due to Dorsal/Cactus interactions in dorsal nuclei is convincing, although it is incomplete with respect to the biological implications of these findings for developmental patterning.

    2. Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.

      Weaknesses:

      The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text and could benefit from a clearer presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al., use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect would be in a location where Dl is nearly absent. As can be seen in Figure 3F, the immobile fraction is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is also unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling, since the fraction is very low anyway.

      While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is written more as a 'tools' paper (i.e., to exemplify how FCS methods and analysis can be used for biological discovery). This is fine, but I think that the authors should discuss further what the biological implications of these findings are, beyond the contribution to uncovering the biophysical parameters. For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What, then, is the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should also report in Figure 5 what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

      The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.
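
      The effect of the suggested log2 scale can be illustrated with a minimal sketch. The NCR values below are hypothetical and chosen only so that each step is a two-fold change; they are not data from the manuscript. On a linear axis the gaps between successive values grow with NCR, whereas on a log2 axis equal fold-changes are equally spaced, which expands the low-NCR (lateral and dorsal) range relative to the high-NCR (ventral) range.

```python
import math

# Hypothetical NCR values (not measurements): successive two-fold steps.
ncr_values = [0.5, 1.0, 2.0, 4.0, 8.0]


def to_log2(values):
    """Map nuclear-to-cytoplasmic ratios (NCR) onto a log2 axis."""
    return [math.log2(v) for v in values]


def successive_gaps(values):
    """Distance between neighboring points along an axis."""
    return [b - a for a, b in zip(values, values[1:])]


linear_gaps = successive_gaps(ncr_values)
log2_gaps = successive_gaps(to_log2(ncr_values))

print(linear_gaps)  # [0.5, 1.0, 2.0, 4.0]: gaps grow with NCR
print(log2_gaps)    # [1.0, 1.0, 1.0, 1.0]: equal fold-changes, equal spacing
```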

    1. eLife Assessment

      With compelling electrophysiological and behavioural evidence, this work establishes that the activity of insulin-producing cells (IPCs) depends on the nutritional state in Drosophila and that, like in mammals, there is also an incretin-like effect with IPCs responding to glucose feeding but not to glucose perfusion. Moreover, the authors demonstrate that DH44 neurons respond to glucose perfusion and, together with IPCs, modulate locomotor activity. This important study on the neuronal regulation of metabolic homeostasis will be of interest to both neuroscience and to medical research in diabetes.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents useful insights into the in vivo dynamics of insulin-producing cells (IPCs), key cells regulating energy homeostasis across the animal kingdom. The authors further provide compelling evidence using adult Drosophila melanogaster that IPCs, unlike neighboring DH44 cells, do not respond to glucose directly, but that glucose can indirectly regulate IPC activity after ingestion, supporting an incretin-like mechanism in flies similar to that in mammals. The authors link decreased activity of IPCs to hyperactivity observed in starved flies, a locomotor behavior aimed at increasing food search. Furthermore, the authors provide evidence that IPCs receive inhibitory inputs from Dh44 neurons, which are linked to increased locomotor activity.

      This paper is of outstanding interest to scientists aiming to understand metabolic control of circuit dynamics, in particular for internal state-linked behaviors competing with the feeding state.

      Strengths:

      (1) By using whole-cell patch clamp recording, the authors convincingly showed the activity pattern and regulation of IPCs and neighboring DH44 neurons under different feeding states and in various refeeding paradigms.

      (2) The paper provides compelling evidence that IPCs are not directly and acutely activated by glucose, but rather through a post-ingestive incretin-like mechanism. In addition, the authors show that Dh44 neurons located adjacent to the IPCs respond to bath application of nutritive sugars contrary to the IPCs.

      (3) The paper also provides useful data on the regulation of IPC activity by Dh44 neurons, which is useful to understand their regulation in vivo.

      No major weaknesses remain in the revised version of this work.

    3. Reviewer #2 (Public review):

      Summary

      In this study, Bisen et al. characterized the state-dependency of insulin-producing cells in the brain of Drosophila melanogaster. They successfully established that IPC activity is modulated by the nutritional state and age of the animal. Interestingly, they demonstrate that IPCs respond to the ingestion of glucose, rather than to perfusion with it, an observation reminiscent of the incretin effect in mammals. The study is well conducted and presented and the experimental data convincingly support the claims made.

      Strengths

      The study makes great use of the tools available in *Drosophila* research, demonstrating the effect that starvation and subsequent refeeding have on the physiological activity of IPCs as well as on the behavior of flies to then establish causal links by making use of optogenetic tools.

      It is particularly nice to see how the authors put their findings in the context of published research and use for example TDC2 neuron activation or DH44 activity to establish baselines to relate their data to.

    4. Reviewer #3 (Public review):

      Although insulin release is essential in the control of metabolism, adjusted to nutritional state, and plays major roles in normal brain function as well as in aging and disease, our knowledge about the activity of insulin-producing (and releasing) cells (IPCs) in vivo is limited.

      In this technically demanding study, IPC activity is studied in the Drosophila model system by fine in vivo patch clamp recordings with parallel behavioral analyses and various optogenetic as well as feeding manipulations.

      The data provide compelling evidence that IPC activity is increased with a slow time course after feeding a high glucose diet. By contrast, IPC activity is not directly affected by rising blood glucose levels. This is reminiscent of the incretin effect known from vertebrates and points to a conserved mechanism in insulin production and release upon sugar feeding.

      Moreover, the data confirm earlier studies showing that nutritional state strongly affects locomotion. Surprisingly, strong evidence shows that IPC activity makes only a negligible contribution to this. Instead, other modulatory neurons that are directly sensitive to blood glucose levels strongly affect locomotion. Together, these data reveal a network of multiple parallel and interacting neuronal layers that orchestrates the physiological, metabolic, and behavioral responses to the nutritional state. Along with the data from a previous study, this work sets the stage to dissect the architecture and function of this network.

      Strengths:

      State-of-the-art current clamp in situ patch clamp recordings in behaving animals are a demanding but powerful method to provide novel insight into the interplay of nutritional state, IPC activity, and locomotion. The patch clamp recordings and the parallel behavioral analyses are of high quality, as are the optogenetic manipulations. The data showing that starvation silences IPC activity in young flies (younger than 1 week) are excellent. The evidence for the claim that locomotor activity is not increased upon IPC activity but upon the activity of other blood glucose sensitive modulatory neurons (Dh44) is compelling, too. The study provides a great system to experimentally dissect the interplay of insulin production and release with metabolism, physiology, nutritional state, and behavior. Demonstrating the incretin effect in Drosophila provides novel experimental routes to further study it. During the revision process, compelling evidence has been added to underscore the incretin effect, the finding that IPCs themselves do not sense sugars, and that feeding a high sugar diet does not cause unspecific stress responses.

      I found no more weaknesses: The authors have carefully addressed all of my previous critiques by adding compelling new data and carefully revising the text. This paper provides a prime example of how responsible authors can utilize this constructive (but relatively new) reviewing procedure to make a very good manuscript even better.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This study presents useful insights into the in vivo dynamics of insulin-producing cells (IPCs), key cells regulating energy homeostasis across the animal kingdom. The authors provide compelling evidence using adult Drosophila melanogaster that IPCs, unlike neighboring DH44 cells, do not respond to glucose directly, but that glucose can indirectly regulate IPC activity after ingestion supporting an incretin-like mechanism in flies, similar to mammals. The authors link the decreased activity of IPCs to hyperactivity observed in starved flies, a locomotive behavior aimed at increasing food search. 

      Furthermore, there is supporting evidence in the paper that IPCs receive inhibitory inputs from Dh44 neurons, which are linked to increased locomotor activity. However, although the electrophysiological data underlying the dynamics of IPCs in vivo is compelling, the link between IPCs and other potential elements of the circuitry (e.g. octopaminergic neurons) regulating locomotive behaviors is not clear and would benefit from more rigorous approaches. 

      This paper is of interest to cell biologists and electrophysiologists, and in particular to scientists aiming to understand circuit dynamics pertaining to internal state-linked behaviors competing with the feeding state, shown here to be primarily controlled by the IPCs. 

      Strengths: 

      (1) By using whole-cell patch clamp recording, the authors convincingly showed the activity pattern of IPCs and neighboring DH44 neurons under different feeding states. 

      (2) The paper provides compelling evidence that IPCs are not directly and acutely activated by glucose, but rather through a post-ingestive incretin-like mechanism. In addition, the authors show that Dh44 neurons located adjacent to the IPCs respond to bath application of glucose contrary to the IPCs. 

      (3) The paper provides useful data on the firing pattern of two key cell populations regulating food-related brain function and behavior, IPCs and Dh44 neurons, results which are useful to understand their in vivo function. 

      Weaknesses: 

      (1) The term nutritional state generally refers to the nutrients which are beneficial to the animal. In Figure 1, the authors showed that IPCs respond to glucose but not proteins. To validate the term nutritional state, the authors could test the effect of a non-nutritive sugar (e.g. D-arabinose or L-glucose) on the post-ingestive physiological responses of the IPCs.

      We thank the referee for this insightful comment. Following their suggestion, we included two new experimental data sets, which we added to Figure 1: We show that IPCs do not respond to the non-nutritive sugar D-arabinose (Figure 1H). In order to further expand this data set and our conclusions, we additionally show that IPCs do respond to fructose, a second nutritive sugar in addition to glucose (Figure 1H). Together, these data sets permit the conclusion that IPCs are sensitive to the ingestion of nutritive sugars, and do not respond to ingestion of non-nutritive sugars or high-protein diets. Thus, we validate the term nutritional state.

      (2) It is difficult to grasp the main message from the figures in the result section as some figures have several results subsections referring to different points the authors want to make. The key results of a figure will be easier to understand if they are summarized in one section of the results. Alternatively, a figure can be split into 2 figures if there are several key messages in those figures, e.g. Figures 2 and 3.  

      We appreciate this suggestion and have made several changes to our manuscript to add more clarity. Among other things, we have changed the order of data presentation in Figure 2, as suggested by the referee below, where we now start with the IPC activation data rather than the OAN activation. We also swapped the order of data presentation and split Figure S1 into Figures S1 & S2. Moreover, we re-arranged the panel order in supplementary figure S4. This significantly improved the flow of the results section. Since the figures the referee refers to contain comparative data, for example between diets (Figure 1) or neuron types (Figure 2), we prefer to keep these data sets together. However, we have carefully revised the results section to more clearly relate our statements to individual figure panels.

      (3) The prime investigation of the paper is about the physiological response and locomotor behavioral readout linked to IPCs. The authors do not show a link between OANs and IPCs in terms of functional or behavioral readouts. In Figure 2, the authors start by stating a link between OAN neurons and locomotion changes resulting from internal feeding states. The flow of the paper would be better if the authors focused on the effect of optogenetic activation of IPCs under different feeding states and their impact on fly locomotion. If the experiments on optogenetic activation of OANs were meant to validate the experimental approach, the data on OAN neurons are better suited for the supplement, without the need for a subsection on the OANs in the results section.  

      We agree with the reviewer’s suggestion and switched the order of the figure panels and text to aid the flow of the manuscript. We now show and discuss the IPC activation data first (Figure 2C-H) and OAN activation afterwards (Figure 2I-K). We did keep the OAN data in the main document, though, since that facilitates comparisons between the small effects of IPC activation and the large, well-established effects of OAN activation.

      (4) Figure 2F shows that optogenetic activation of IPCs in fed flies does not influence their locomotor output. In the text, the conclusion linked to Figure 2F-H states that IPC activation reduces starvation-induced hyperactivity which is a statement more suited to Figure 2I-K. 

      We edited the text accordingly.

      (5) The authors show activation of Dh44 neurons leads to hyperpolarisation of the IPCs. What is the functional link between non-PI Dh44 neurons and the IPCs? Do IPCs express DH44R or is DH44 required for this effect on IPCs? Investigating a potential synaptic or peptidergic link between DH44 neurons and IPCs and its effect on behavior would benefit the paper, as it is so far not well connected. 

      Although we have not performed any experiments dedicated to investigating the functional link between DH44Ns outside the PI and the IPCs in this study, there are two lines of evidence supporting that this connection is relatively direct. First, IPCs do express DH44R1 & R2, as we show in a parallel study in eLife (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). Second, we performed functional connectivity experiments using a Leucokinin (LK) driver line in that paper. This driver line labels two pairs of non-PI DH44Ns in the VNC, which are DH44 and LK positive (Zandawala et al 2018). Activating that line leads to inhibition of IPCs, similar to the effect we observed here for DH44N activation. These two lines of evidence suggest that there could be a direct peptidergic connection between DH44+ neurons and IPCs. We have added a paragraph mentioning these experiments to our discussion:

      ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44. A strong candidate for the inhibition are LK and DH44-positive neurons, which are labelled by the broad line(76). In a parallel study, we showed that LK-expressing neurons strongly inhibit IPCs(30), similar to the broad DH44 line used here. Furthermore, evidence from single-nucleus transcriptomic analysis shows that IPCs express DH44-R1 and DH44-R2 receptors(30). Therefore, it is possible that DH44Ns communicate with IPCs through a direct peptidergic connection. Notably, the inhibitory effect of non-PI DH44Ns on IPCs was very strong and fast, suggesting that a connection via classical synapses is more likely. Regardless, our results show that the glucose sensing DH44<sup>PI</sup>Ns and IPCs act independently of each other.’

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, Bisen et al. characterized the state-dependency of insulin-producing cells in the brain of *Drosophila melanogaster*. They successfully established that IPC activity is modulated by the nutritional state and age of the animal. Interestingly, they demonstrate that IPCs respond to the ingestion of glucose, rather than to perfusion with it, an observation reminiscent of the incretin effect in mammals. The study is well conducted and presented and the experimental data convincingly support the claims made. 

      Strengths: 

      The study makes great use of the tools available in *Drosophila* research, demonstrating the effect that starvation and subsequent refeeding have on the physiological activity of IPCs as well as on the behavior of flies to then establish causal links by making use of optogenetic tools. 

      It is particularly nice to see how the authors put their findings in the context of published research and use for example TDC2 neuron activation or DH44 activity to establish baselines to relate their data to. 

      Weaknesses: 

      I find the inability of SD to rescue the IPC starvation effect in Figure 1G&H surprising, given that the fully fed flies were raised and kept on that exact diet. Did the authors try to refeed flies with SD for longer than 24 hours? I understand that at some point the age effect would also kick in and counteract potential IPC activity rescue. I think the manuscript would benefit if the authors could indicate the exact age of the SD refed flies and expand a bit on the discussion of that point.  

      We have expanded the first paragraph of our discussion to tackle these questions, in particular the potential effect of aging, as suggested by the referee. We now also indicate the exact age of the flies. Moreover, we have conducted additional experiments in which we added either glucose or arabinose to our standard diet (Figure 1H). As we would have expected based on our hypothesis that the glucose concentration in our standard diet was too low to cause an increase in IPC activity after starvation, we find that feeding standard diet plus glucose increases IPC activity to the same level as glucose only, and that adding arabinose to the standard diet does not lead to increased IPC activity after starvation (Figure 1H).

      The incretin-like effect is exciting and it will be interesting in the future to find out what might be the signal mediating this effect. It is interesting that IPCs in explants seem to be responsive to glucose. I think it would help if the authors could briefly discuss possible sources for the different findings between these in fact very different preparations. Could the absence of the inhibitory DH44 feedback in the *ex-vivo* recordings for example play a role? 

      We thank the referee for this interesting point and expanded our discussion accordingly. We included that, in particular in brain explants without a VNC, the inhibitory connection we describe might be absent, as the referee suggested: ‘Previous ex vivo studies suggested that IPCs, like pancreatic beta cells, sense glucose cell-autonomously(23,24). Consistent with this, we observed an increase in IPC activity after the ingestion of glucose (Figure 2B). However, IPC activity did not increase during the perfusion of glucose directly over the brain. Importantly, the fly preparations were kept alive for several hours, allowing the glucose-rich saline to enter circulation and reach all body parts. Several factors may explain the difference between ex vivo and in vivo preparations. First, in ex vivo studies, certain regulatory feedback mechanisms present in vivo could be absent. For example, the strong inhibitory input from DH44Ns to IPCs that we found would likely be absent in brain explants without a VNC. A lack of inhibitory feedback might allow for more direct glucose sensing by IPCs ex vivo, whereas in vivo, the IPC response could be suppressed by more complex systemic feedback. Second, we attempted to use the intracellular saline formulation employed in a previous ex vivo study(44). However, we observed that IPCs depolarized quickly using this saline, leading to unstable recordings that did not meet our quality standards for in vivo experiments. Another possible explanation for the lack of an effect of glucose might have been that the dominant circulating sugar in flies is trehalose(70,71), which is derived from glucose. When we extended our experiments, we found that trehalose perfusion did not affect IPC activity either, strengthening the idea that IPCs do not directly sense changes in hemolymph sugar levels.
      Therefore, our findings suggest that, similar to mammals, IPC activity, and hence insulin release, is not simply modulated by hemolymph sugar concentration in Drosophila.’ 

      The incretin-like effect the authors observed seems to start only after 5h which seems longer than in mammals where, as far as I know, insulin peaks around 1h. Do the authors have ideas on how this timescale relates to ingestion and glucose dynamics in flies? 

      We have now included the following section in the discussion to explicitly address the question of different activity dynamics in flies and mammals, but also the limitations of our electrophysiological approach in this regard: ‘We observed that IPC activity increased over a timescale of hours, which is longer compared to the fast insulin response in mammals, where insulin typically peaks within an hour of feeding(97). In flies, insulin levels rise within minutes of refeeding, followed by a drop after 30 min(20). Our experimental techniques limit our ability to capture these fast initial dynamics, since the preparation for intracellular recordings requires tens of minutes, so that we typically recorded IPC activity at least 20 min after the last food ingestion. Notably, studies in fasted mammals have shown that insulin peaks within minutes of refeeding, followed by a rapid decline, with levels stabilizing as feeding continues(98,99). We speculate a similar dynamic could be present in flies, but with our approach, we capture the steady-state reached tens of minutes after food ingestion rather than a potential initial peak.’ 

      The authors mention "a decrease in the FV of IPC-activated starved flies even before the first optogenetic stimulation (Figure 2I),". Could this be addressed by running an experiment in darkness, only using the IR illumination of their behavioral assay? 

      We thank the referee for pointing out this unexpected result. We discuss this in more detail in the new version of our manuscript and expand on the reasons for not performing these optogenetic activation experiments in the dark. First, the red LED required to activate CsChrimson triggers strong startle responses in dark-adapted flies, which mask other behavioral effects, in particular subtle ones such as those observed for IPCs. The startle response is much reduced when performing experiments under low background light conditions. Second, flies, at least in our hands, do not exhibit robust foraging behavior or starvation-induced hyperactivity in the dark, which is critical for our behavioral experiments. However, we also explain in our discussion that we believe the effect of background illumination is relatively small, since flies expressing CsChrimson in OANs or DH44Ns show activity levels comparable to controls. Hence, part of this effect is likely attributable to leak currents induced by CsChrimson expression. We would, however, like to point out that we are careful in our description of the IPC effect on behavior, and focus on the fact that it is considerably smaller than the effects of other modulatory neurons (DH44Ns and OANs).

      The authors show an inhibitory effect of DH44 neuron activation on IPC activity. They further demonstrate that DH44PI neurons are not the ones driving this and thus conclude that "...IPCs are inhibited by DH44Ns outside the PI.". Given the broad expression of the DH44-Gal4 line, which the authors mention, can they be sure that the cells labeled outside the PI are actually DH44+? If so, they should state this more clearly; if not, they should adapt the discussion accordingly.

      We have substantially added to our discussion of this point, according to the referee’s great suggestion. In short, the broad line includes neurons that are DH44 positive and neurons that are not: ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44.’

      Reviewer #3 (Public Review): 

      Although insulin release is essential in the control of metabolism, adjusted to nutritional state, and plays major roles in normal brain function as well as in aging and disease, our knowledge about the activity of insulin-producing (and releasing) cells (IPCs) in vivo is limited. 

      In this technically demanding study, IPC activity is studied in the Drosophila model system by fine in vivo patch clamp recordings with parallel behavioral analyses and optogenetic manipulation. 

      The data indicate that IPC activity is increased with a slow time course after feeding a high-glucose diet. By contrast, IPC activity is not directly affected by increasing blood glucose levels. This is reminiscent of the incretin effect known from vertebrates and points to a conserved mechanism in insulin production and release upon sugar feeding. 

      Moreover, the data confirm earlier studies showing that nutritional state strongly affects locomotion. Surprisingly, IPC activity makes only a negligible contribution to this. Instead, other modulatory neurons that are directly sensitive to blood glucose levels strongly modulate locomotion. Together, these data indicate a network of multiple parallel and interacting neuronal layers that orchestrates the physiological, metabolic, and behavioral responses to nutritional state. Combined with the data from a previous study, this work sets the stage to dissect the architecture and function of this network. 

      Strengths: 

      State-of-the-art current clamp in situ patch clamp recordings in behaving animals are a demanding but powerful method to provide novel insight into the interplay of nutritional state, IPC activity, and locomotion. The patch clamp recordings and the parallel behavioral analyses are of high quality, as are the optogenetic manipulations. The data showing that starvation silences IPC activity in young flies (younger than 1 week) are compelling. The evidence for the claim that locomotor activity is not increased upon IPC activity but upon the activity of other blood glucose-sensitive modulatory neurons (Dh44) is strong. The study provides a great system to experimentally dissect the interplay of insulin production and release with metabolism, physiology, and behavior. 

      Weaknesses: 

      Neither the mechanisms underlying the incretin effect nor the network that orchestrates physiological, metabolic, and behavioral responses to nutritional state have been fully uncovered. Without additional controls, some of the conclusions would require significant toning down. Controls are required to exclude the possibility that IPCs sense blood sugars other than glucose. The claim that IPC activity is controlled by the nutritional state would require that starvation-induced IPC silencing in young animals can be reversed by feeding a normal diet. At present, firing in starvation-silenced IPCs can only be induced by feeding a high-glucose diet that lacks other important ingredients and reduces vitality. Therefore, feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than by nutritional changes in normal ranges. The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity. The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect. The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated. 

      We thank the referee for the thoughtful and constructive criticism of our experiments and conclusions. Below, we lay out how we tackled the individual points raised by the referee.

      (1) ‘Controls are required to exclude the possibility that IPCs sense other blood sugars than glucose.’  

      To address this point, we conducted experiments in which we perfused trehalose (Figure 3B), the main circulating hemolymph sugar in Drosophila and other insects. Our results clearly show that trehalose does not affect IPC activity upon perfusion, confirming our statements that IPCs do not sense key blood sugars directly.

      (2) ‘Feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than nutritional changes in normal ranges’. 

      We agree with the referee that this point was not completely fleshed out in our first submission. We have now performed additional experiments in which we added glucose (and fructose) to our standard diet (Figure 1H). Flies feeding on this diet received all necessary nutrients but still experienced high concentrations of sugars. The effects of high glucose in a standard diet background were indistinguishable from those of high glucose in agarose, confirming that the IPCs respond to sugar rather than stress. Another important observation in this context is that IPCs in flies kept on a high protein diet exhibited much lower spike rates than those in flies kept on the high glucose diet, even though they had a much shorter lifespan and therefore, presumably, experienced much higher stress levels (Figure 1H, Figure S1). These observations underline that stress is not the primary factor here.

      (3) ‘The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity.’

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      (4) ‘The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect.’

      We followed the referee’s excellent suggestion and determined the time course of the starvation effect in three timesteps, similar to the experiments we did for refeeding (Figure 1G). In addition, we now also quantify the number of active IPCs (i.e., IPCs that fired at least one action potential during our five-minute analysis window), which further illustrates the dynamics of the starvation and refeeding effects. We find that the starvation effect is graded, and that IPC activity decreases with increasing starvation duration.

      (5) ‘The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated.’

      To address the referee’s comment, we have added 14 new IPC recordings from flies in the 6–26-day range, such that we now have recordings from 9-14 IPCs for each age range (Figure S2B). They confirmed our previous analysis and strengthened the finding that IPC activity dramatically decreases after 8 days (on our standard diet). The total number of IPCs in this supplementary dataset was thus increased from 34 to 48.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) Do IPCs respond to glucose specifically after ingestion or generally to any other nutritive sugars? To tackle this question the IPC responses in starved flies can be recorded after refeeding flies with other nutritive sugars (fructose, sucrose). 

      To address this important question, we have performed additional experiments in which we refed starved flies with fructose, as a nutritive sugar, and arabinose, as a non-nutritive sugar. As expected, IPCs responded to fructose but not to arabinose, indicating that they respond to nutritive sugars in general. We describe and discuss these key results in the new version of our manuscript.

      (2) In Figure 2, the x and y axes are not annotated on all subfigures, which might help improve clarity. 

      We have annotated the subfigures as requested.

      (3) In the discussion on page 9 ("...we observed an increase in IPC activity after the ingestion of glucose (Figure 2B)."), the authors refer to Figure 2B instead of 3C.

      We have fixed this oversight.

      Reviewer #2 (Recommendations For The Authors): 

      Introduction 

      I think it could be helpful for the reader if you would briefly state the number of IPCs and whether you are targeting all of them with Dilp2-Gal4. 

      We included the numbers according to the suggestion. 14 IPCs are labeled in the driver line, and this is the number of IPCs commonly assumed to be present in the PI.

      Figures 

      In some figures (for example 1D & E), the authors state the number of IPCs recorded (N) but not the number of animals used (n). This should be stated, as data from within one animal are not independent and might give insights into IPC heterogeneity. 

      We have compiled tables for the supplementary material (Tables S5 & S6) in which we state the number of IPCs and DH44<sup>PI</sup>Ns recorded and the number of different flies for each figure panel. We have recorded an average of 1.4 IPCs per fly (217 IPCs from 160 flies). We therefore expect the bias introduced by individual flies to be rather small. However, in our parallel study, we specifically investigate the heterogeneity of IPCs by maximizing the number of IPCs recorded per fly (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). In the case of DH44<sup>PI</sup>Ns, we recorded 24 neurons in 21 flies – 1.1 neurons per fly.

      - Figure 3D: There is some white visible among the cell bodies in the overlay. I assume this comes from projecting across layers rather than indicating DH44 - IPC overlap? It would help to explicitly state that. 

      We have added a statement to the results section, in which we explain that most of the white is due to overlap in the z-projection rather than overlap in the driver lines. However, there are a few cases (typically one to two cells per brain) in which neurons labeled by the DH44 line also stain positive for Dilp2, indicating that they express both neuropeptides. We have added this information to the manuscript:  

      Results: ‘DH44<sup>PI</sup>Ns are anatomically similar to IPCs, and their cell bodies are located directly adjacent to those of IPCs in the PI, making them an ideal positive control for our experiments (Figure 3D). A small subset of DH44<sup>PI</sup>Ns also expresses Dilp2(75), and our immunostainings confirmed colocalization of Dilp2 and DH44 in a single neuron (Figure 3D, white arrow).’

      In figure caption: ‘UAS-myr-GFP was expressed under a DH44-GAL4 driver to label DH44 neurons. GFP was enhanced with anti-GFP (green), brain neuropils were stained with anti-nc82 (cyan), and IPCs were labelled using a Dilp2 antibody (magenta). White arrow indicates Dilp2 and DH44-GAL4 positive neuron. The other white regions in the image result from an overlap in z-projections between the two channels, rather than from antibody colocalization.’

      - Figure 4I: One might get the impression that the fast onset peak of activity precedes the stimulation onset, using a thinner line width might help avoid that. 

      This effect is due to a combination of using relatively heavy lines for clear visibility of the data and a gentle smoothing step (a 2 s median filter, which corresponds to less than 1% of the 300 s stimulation window) in our analysis of the behavioral data. However, inspection of the raw data clearly shows increases in velocity after the onset of the optogenetic activation. We clarified this in the figure caption: ‘Average FV across all DH44N activation trials based on two independent replications of the experiment in I. Note that the peak in average FV lies within the first frame of the stimulation window.’

      - S3 panel letters do not match references in the text.

      We fixed this oversight.

      Formatting 

      - Page 10: The paragraphs on the bottom of the page got switched around.

      This has been fixed.

      - Page 14: The first paragraph after the header "Free-walking assay" seems to be coming from elsewhere. 

      We apologize for this slightly embarrassing mistake. We used our related bioRxiv preprint (Held et al.) as a template for formatting this paper, and accidentally left this part of the methods section in the manuscript. We have fixed this error in our resubmission.

      Reviewer #3 (Recommendations For The Authors): 

      Major suggestions: 

      (1) The data show convincingly that IPC activity is decreased by starvation during the first week of adult life (Figures 1C and D). However, the conclusion that IPC activity is controlled by the nutritional state requires additional care. First, refeeding starved adult animals with a normal diet does not bring back normal IPC firing rates (Figure 1H). Therefore, IPC activity does not strictly follow changes in nutritional state; rather, IPCs are silenced by starvation. Second, from the second week of adult life on, IPCs are silent anyway, and thus unlikely to be responsive to changes in the nutritional state anymore (which might be different on a different standard diet?). The only effect of feeding on IPC activity is observed upon feeding starved, young animals with high glucose for 12-24 hrs (Figure 1G). However, it is not clear whether increased IPC firing is caused by the effects of high glucose on the nutritional state in a normal range, or by diet-induced stress (the diet also severely shortens lifespan, Figure S1). Does high glucose also increase IPC firing rate in young, fed animals? These would have strongly increased glucose concentrations but not suffer the stress of not getting any other nutrients. Such experiments would be required to make the statement that glucose feeding increases IPC firing rate. 

      We have performed several experiments to address this criticism. First, we performed a time course analysis of the starvation effect. We show that the IPC activity reduction is graded, and that IPC activity declines already after two hours of starvation, a timepoint at which stress levels should still be relatively low (Figure 1G). Second, we refed flies with high glucose concentrations added to the standard diet (Figure 1H). This minimized any potential stress responses due to a lack of nutrients. Third, we now show that IPCs specifically respond to nutritive (glucose and fructose), but not to non-nutritive sugars (arabinose, Figure 1H). We believe that these data sets, in addition to the graded refeeding effect, make a strong case for the nutritional state-dependent modulation of IPCs. 

      (2) The testing of locomotor activity is well done, nicely recapitulates starvation-induced increases in locomotion, and adds interesting novel findings on refeeding with high glucose versus high protein diet. However, the statement that locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity does not reflect the data presented. Refeeding starved flies with a standard diet had no effect on IPC activity (Figure 1H) but a strong effect on locomotor activity of starved flies (a strong reduction, even stronger than high glucose diet, Figure 2B). 

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      Related to points 1 and 2, a key statement that the results establish that IPC activity is controlled by the nutritional state requires care. What the data convincingly show is that IPC activity is near zero upon starvation. 

      As described above, we have added several extensive data sets (fructose feeding, arabinose feeding, trehalose perfusion, starvation time course) to show that we indeed observe a nutritional state dependent modulation of IPCs and describe these new results in the results and discussion.

      (3) The time course of nutritional state-dependent changes of IPC activity is claimed to be slow, several hours to days. Unless I have missed a figure, the underlying data are not presented (only for high glucose diet). It would be great if this could also be shown for a standard diet with higher glucose concentrations than the one used, so that it rescues starvation-induced IPC silencing without shortening lifespan (if this is feasible?). The data showing starvation-induced IPC silencing are convincing, but, unless I have missed it, the time course has not been determined. It would be very nice to actually show this. Have different starvation times been tested in relation to IPC firing rate, and if yes, with what time resolution? Does IPC activity change already after 0.5 or 1 or a few hours of starvation? If starvation can silence IPCs faster than assumed, the near-zero IPC activity in animals older than a week could very well be caused by longer time intervals between meals. 

      We have performed experiments to address both important points raised by the referee here. 1) We have added high glucose concentrations to our standard diet, and show that it has the same effect – a significant increase in IPC activity – as the high glucose diet (Figure 1H). 2) We have analyzed the time course of IPC activity reduction in response to starvation (Figure 1G). Indeed, we find that a few hours of starvation start reducing IPC activity. We discuss the possibility that reduced IPC activity in older flies could be due to reduced food intake: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(86). Therefore, reduced IPC activity in 10–11-day-old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      (4) The data on the proposed incretin effect are of high importance in potentially highlighting a highly conserved link between glucose ingestion and insulin release. An important control would be to test different sugars, such as trehalose, an important blood sugar of flies. If glucose is converted into trehalose and this is what IPCs sense, then perfusion of glucose would have no effect. The fact that the fantastic experiments show that DH44 neurons are sensitive to glucose perfusion does not rule out that IPCs sense a different sugar. This would be very different from the incretin effect, which requires additional hormones. In addition, as mentioned above, controls are required to show that high glucose affects IPCs as a nutrient and not as a stressor (see point 1), for example refeeding with a standard diet that contains a higher glucose concentration but does not reduce lifespan. Another great control to solidify the exciting claim on the incretin effect would be to knock out candidate Drosophila incretin hormones and test whether a high glucose diet stops increasing the IPC firing rate (although simpler controls might also do the job). 

      We have performed the two key experiments suggested by the referee. 1) We perfused trehalose as the primary blood sugar of flies and showed that IPCs do not respond to trehalose perfusion (Figure 3B & C). This further strengthens the finding that IPC activity in flies shows an incretin-like effect. 2) We have added high concentrations of glucose to our standard diet to provide flies with a full diet that contains high glucose concentrations. IPC activity in these flies was indistinguishable from the activity in flies which consumed pure glucose diets. In contrast, IPC activity in flies kept on a high protein diet, which dramatically reduced lifespan, was very low. These results clearly show that higher IPC activity is not due to increased stress levels, but a function of nutritive sugar ingestion. We further validated this hypothesis by refeeding flies with fructose as a nutritive sugar, which increased IPC activity, and arabinose as a non-nutritive sugar, which did not affect IPC activity (Figure 1H).

      Another point that might be relevant to this discussion is that IPC activity is almost entirely shut down during flight in Drosophila (which we showed in Liessem et al. 2023, Current Biology 33 (3), 449-463. e5). Several ‘stress hormones’ are released during flight, including octopamine. The fact that IPC activity is low in flying flies, starved flies, and flies kept on a pure protein diet (which all experience high stress levels), to us, very clearly suggests that stress is not the predominant factor here. We would also like to point out that, while the lifespan was reduced in flies kept on pure glucose diets, survival rates were at 100% until day 14, and we carried out our experiments on day 2 after starvation. Hence, these flies might not (yet) experience particularly high stress levels.

      (5) The discussion relates the absence of IPC firing in animals older than 1 week to aging. However, given that the flies fed on a normal diet show the typical lifespan for Drosophila, a 10-day-old fly is still in its youth. Maybe flies at 10 days simply eat less and thus IPC spiking goes down as in starved flies, especially because the standard diet used contains low glucose. Do IPCs also become silent after a week if the animals are fed with a standard diet that contains a higher glucose concentration? Without additional controls, this part of the discussion is pretty speculative and should be revised. 

      We agree with the reviewer that it is not clear whether reduced IPC activity is a direct result of physiological changes that occur with aging, or an indirect effect of reduced food intake, which occurs during aging. In both cases, in our view, it would be an age-related effect. Since this is a minor point of our manuscript, we decided not to perform additional experiments, other than significantly increasing the sample size for the aging data set already presented to shore up our findings (Figure S2B). We have, however, revised the discussion of this point according to the referee’s suggestion: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(85). Therefore, reduced IPC activity in 10–11-day-old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      Other suggestions: 

      (6) For the mixed effects of octopamine and tyramine on larval locomotion that are referred to, it might be interesting to also look at Schützler et al. 2019, PNAS, because it shows that starvation activates TBH so that the octopamine to tyramine ratio is increased. 

      We refer to Schützler et al. in the following paragraph of our discussion: ‘This intermittent locomotor arrest has been previously described in adult flies and is thought to be mediated by ventral unpaired median OANs, which have been suggested to suppress long-distance foraging behavior(69). Since these are not the only neurons we activate in the TDC2 line, we speculate that the stopping phenotype could also result from concerted effects of octopamine and tyramine modulating muscle contractions(65-67) and motor neuron excitability(68), as previously described in Drosophila larvae, or from OANs interfering with pattern generating networks in the ventral nerve cord (VNC) during longer activation(69).’  

      (7) The reference list requires care. For example, reference 43 is identical to 67, and reference 66 gives no information on incretin-like hormones in Drosophila as stated in the text.

      We carefully double-checked our reference list and corrected the mistakes mentioned.

    1. eLife Assessment

      The manuscript presents valuable findings of bone remodeling under chronic unpredictable mild stress (CUMS). This is an interesting work on mental stress on bone health and osteoporosis, and the authors offer solid evidence of decreased bone mass mediated by miR-335-3p/Fos signaling in osteoclasts that are involved in the induction of bone loss caused by CUMS. This revised version provided new data that improved the quality of the manuscript and addressed the reviewers' concerns.

    2. Reviewer #1 (Public review):

      I have reviewed the manuscript "Psychological stress disturbs bone metabolism via miR-335-3p/Fos signaling in osteoclast" with interest. The described findings are relevant and useful for daily practice in periodontology. The paper is concise, professionally written, and easy to read. In this study, Jiayao et al. revealed the role of miR-335-3p in psychological stress-induced osteoporosis. CUMS mice were constructed to observe the femur phenotype, osteoclasts were identified as the main research object, and miRNA-seq was used to find the key miRNAs linking the brain and peripheral tissues. This study showed that miR-335-3p expression was simultaneously reduced in murine NAC, serum, and bone under psychological stress. The miR-335-3p/Fos/NFATC1 signaling pathway was validated in osteoclasts to reveal the potential mechanism of enhanced osteoclast activity under psychological stress. This study, from a new perspective of miRNAs, indicates a possible cause of disturbed bone metabolism due to psychological stress and may suggest a new approach to treating osteoporosis.

    3. Reviewer #2 (Public review):

      Zhang et al. established chronic unpredictable mild stress (CUMS) mouse model, which displayed osteoporosis phenotype, suggesting a potential correlation between psychological stress and bone metabolism. They found that miRNA candidate miR-335-3p is downregulated in the long bone of CUMS mice through microRNA sequencing experiments and qRT-PCR. They further demonstrated that miR-335-3p attenuates osteoclast activity via inhibiting Fos signaling, which can induce NFATC1 expression and regulate osteoclast activity.

      My concerns have been addressed. And the quality of the manuscript is improved significantly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      I have reviewed, with interest, the manuscript "Psychological stress disturbs bone metabolism via miR-335-3p/Fos signaling in osteoclast". The described findings are relevant and useful for daily practice in periodontology. The paper is concise, professionally written, and easy to read. In this study, Jiayao et al. revealed the role of miR-335-3p in psychological stress-induced osteoporosis. CUMS mice were constructed to observe the femur phenotype, osteoclasts were identified as the primary research object, and miRNA-seq was used to find the key miRNAs linking the brain and peripheral tissues. This study showed that the expression of miR-335-3p was simultaneously reduced in mice's NAC, serum, and bone under psychological stress. The miR-335-3p/Fos/NFATC1 signaling pathway was validated in osteoclasts to reveal the potential mechanism of enhanced osteoclast activity under psychological stress. From a new perspective of miRNAs, this study indicates a possible cause of disturbed bone metabolism due to psychological stress and may suggest a new approach to treating osteoporosis.

      We thank this reviewer for the instructive suggestions and encouragement.

      Reviewer #2 (Public Review):

      Zhang et al. established chronic unpredictable mild stress (CUMS) mouse model, which displayed osteoporosis phenotype, suggesting a potential correlation between psychological stress and bone metabolism. They found that miRNA candidate miR-335-3p is downregulated in the long bone of CUMS mice through microRNA sequencing and qRT-PCR experiments. They further demonstrated that miR-335-3p attenuates osteoclast activity via inhibiting Fos signaling, which can induce NFATC1 expression and regulate osteoclast activity.

      Strengths:

      The authors established a CUMS mouse model and confirmed the osteoporosis phenotype through careful characterization of bone and analysis of osteoclast activity. They performed microRNA sequencing to identify the miRNA candidate regulating bone loss in the CUMS mouse model. They also validated the expression of miR-335-3p and interfered with the function of miR-335-3p in an in vitro assay. Overall, the findings from this study provide important insights into the correlation between psychological stress and bone metabolism.

      We thank this reviewer for the comprehensive summary and positive comment on our work.

      Weakness:

      The data provided by the authors are preliminary, especially the mechanistic insight, which needs to be strengthened. The authors have shown that miR-335-3p expression was altered in the CUMS mouse model and that changes in its expression regulated osteoclast activity. This should be validated in vivo, and the underlying mechanism should be investigated further.

      We thank the reviewer for the important insight on the need for further in vivo validation of the role of miR-335-3p. We therefore designed and produced Antagomir-335-3p (antagonist) and Agomir-335-3p (agonist), injected them via the tail vein over approximately 2 months, and examined the bone phenotype in each group of mice. The results suggested that decreasing miR-335-3p in vivo could lead to bone loss, which was consistent with our in vitro validation results (Figure 5H-I).

      Reviewing Editor:

      Method

      (1) Bone histomorphometric analysis following ASBMR guidelines: For bone histomorphometric analysis of bone formation and bone resorption, the authors should follow the ASBMR guidelines for bone histomorphometry (PMCID: PMC3672237 and PMID: 3455637) to perform standard histomorphometric analyses, rather than analyzing selected areas. They should also clearly describe the software used and define the areas analyzed.

      We carefully re-analyzed bone histomorphometry according to the ASBMR guidelines, combined with our own understanding. We also improved the description of the micro-CT and histological analyses in the Methods. If any aspect still lacks standardization, we would be grateful for constructive suggestions to improve it.

      (2) Osteoclast cultures require nuclear staining to demonstrate multinucleated Trap-positive cells.

      We used RAW264.7, a mouse macrophage-like cell line, for in vitro culture and induced its differentiation into osteoclasts. Successfully induced osteoclasts showed enlarged cytoplasm and multinucleated fusion. Tartrate-resistant acid phosphatase (Trap) is the signature enzyme of osteoclasts. Its activity can be visualized as a mauve color by coupling to a chromogen, based on the principle of azo-coupling histochemistry. Meanwhile, the small, rounded fused nuclei appear lighter in color (author response image 1, yellow arrows). On this basis, we attempted to stain the nuclei with hematoxylin. However, the contours of the nuclei could not be clearly distinguished because their color was similar to that of the Trap-positive signal. In addition, many other studies have assessed osteoclast activity in vitro based solely on the results of Trap staining (area and number) (Cheng et al., 2022; Li et al., 2019; Ma et al., 2021; Zhong et al., 2023). Nevertheless, in the immunofluorescence staining of osteoclasts, the nuclei were labeled with Hoechst to reflect the multinucleated fusion of osteoclasts (Figure 5G).  

      (3) Osteoclast pit assays should be carried out to demonstrate the change in osteoclast resorption ability caused by miR-335-3p.

      We added osteoclast pit assays to validate the role of miR-335-3p on osteoclast resorptive capacity (Figure 5D-E).

      (4) Serum ELISA assays should be performed to examine global changes in bone remodeling in the CUMS mice and to assess bone formation and bone resorption, which would support their claim.

      We additionally measured serum concentrations of bone γ-carboxyglutamic acid protein (BGP, osteocalcin), TRAP, cathepsin K (CTSK), parathyroid hormone (PTH), calcium (Ca), and phosphate (P) in control and CUMS mice, which better reflect the global change in bone remodeling in the CUMS mice (Figure 3 – figure supplement 1).

      (5) miRNA-seq: A labeled volcano plot should replace the present one to show significant changes in differential gene expression.

      We appreciate this great suggestion. We replaced the volcano plot with a labeled version showing significant changes in differential gene expression (Figure 4B). We also uploaded the raw data to the GEO database (GSE253504), making the results clearer and more accessible.

      Discussion

      The authors should discuss previous work on the influence of brain-derived hormones on chronic stress-induced bone loss and how these influences relate to their findings.

      A discussion of how both hormones and miR-335-3p regulate bone metabolism under psychological stress has been added to the second and fifth paragraphs of the Discussion. In summary, on the one hand, brain-derived and blood-transported miR-335-3p regulates bone metabolism synergistically with hormones; on the other hand, it exerts a more direct influence on bone under psychological stress.

      Language

      The language of the manuscript should be improved.

      The manuscript has been carefully edited by a professional proofreader.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1F: The exact meaning of the Waveform Graph shown at left needs to be clarified for the not-so-experienced reader.

      We added a more detailed explanation of the Waveform Graph to the figure legend (Figure 1F legend).

      (2) Is the concomitant increase in osteogenic and osteoblastic activity in this study consistent with that seen in similar disease studies? This could be added to the discussion.

      In the fifth paragraph of the Discussion, we present alterations in osteogenic and osteoblastic activity observed in other studies similar to ours and discuss these observations in detail.

      (3) Figure 6A: Please highlight the key information to visualize the potential linkage among miR-335-3p, Fos, and osteoclast.

      We highlighted the crucial linkage among miR-335-3p, Fos, and osteoclast with red arrows (Figure 6A).

      (4) Figure 6E: The specific area selected for comparison needs to be clarified. Please add white dotted lines and the labels T (trabecular bone) and GP (growth plate) for the not-so-experienced reader. This will provide some orientation.

      We used white dotted lines as well as letters to label the tissue in immunofluorescence staining images (Figure 6E).

      (5) Line 350: "NAC derived and blood-trans, Ported miR-335-3p". There is a grammatical error. Please conduct general proofreading of the text and writing style.

      Thank you for pointing this out. We have corrected this grammatical error and checked the full text for similar errors.

      Reviewer #2 (Recommendations For The Authors):

      (1) miR-335-3p was downregulated in the femur in the CUMS mice. The possible mechanism for this outcome should be discussed further. In Figure 4B, the volcano plot showed that only a few miRNAs were differentially expressed between the control and CUMS mice. How do the authors explain this?

      The chronic unpredictable mild stress (CUMS) model was constructed using normal mice. As the name of the model suggests, the stimulus is mild and does not cause developmental damage or teratogenic effects in mice. Rather, CUMS tends to result in chronic pathological conditions. In addition, in miRNA sequencing of other tissues from models similar to ours, the number of differentially expressed miRNAs is also on the order of a few dozen (Ma et al., 2019).

      (2) The authors have demonstrated that miR-335-3p inhibits osteoclast differentiation based on an in vitro assay in Figure 5; however, an in vivo experiment is required to provide more solid evidence.

      We strongly agree that in vivo experimental validation would make this study more convincing. Therefore, we designed and produced Antagomir-335-3p (antagonist) and Agomir-335-3p (agonist), which were injected into mice via the tail vein every five days. Samples were collected one and two months after the start of injection. We found that two months of sustained antagomir injection significantly induced bone loss in mice (Figure 5H-I), which is consistent with our in vitro validation results.

      However, the Agomir-miR-335-3p group did not exhibit a notable increase in bone mass. This may be because the 11-week-old healthy mice used in this study were young adults without strong osteoclastic activity in vivo; therefore, the inhibitory effect of Agomir-335-3p on osteoclasts could not be demonstrated.

      In addition, no significant difference was seen one month after the start of injection. The main reason may be that this period was too short. On the one hand, the injected agents were RNA preparations, whose limited stability results in poor delivery efficiency, so they take time to take effect. On the other hand, bone remodeling is itself a time-consuming process.

      (3) FOS and NFATC1 should be expressed in the nuclei of the cells, therefore, the quality of the images needs to be improved.

      NFATC1 (nuclear factor of activated T cells, cytoplasmic 1) is a transcription factor that, once in the nucleus, regulates the transcription of a variety of osteoclast-related genes, including ACP5 and MMP9. FOS can bind and interact with NFATC1, resulting in nuclear translocation and transcriptional activation, which promotes the differentiation and maturation of osteoclasts. Both proteins are synthesized and processed in the cytoplasm and eventually enter the nucleus to perform their functions. Therefore, they are detected in both the nucleus and the cytoplasm (Deng et al., 2022; Hounoki et al., 2008; Li et al., 2022).

      In Figure 5G, we labeled cell nuclei with Hoechst (blue fluorescence). More co-localized signals of nuclei (blue), FOS (red), and NFATC1 (green) were seen in the Inhibitor-miR-335-3p group, whereas the opposite result was observed in the Mimic-miR-335-3p group. These results indicate that inhibition of miR-335-3p promotes osteoclast differentiation in vitro.

      (4) The expression of FOS was elevated in the CUMS group in Figure 6E; however, its mRNA level was unchanged, as shown in the Figure 6 supplement. What is the explanation for this? How can the authors claim that FOS is the downstream target if its mRNA expression is not affected by CUMS?

      The results indicate that the binding of miR-335-3p to Fos mRNA does not result in mRNA degradation. Instead, this binding interferes with protein translation, which ultimately reduces FOS protein levels.

      (5) What would be the bone phenotype if a FOS inhibitor was injected into the control and CUMS mice? It is important to examine FOS function through an in vivo context.

      The regulatory role of FOS in osteoclasts has been validated in numerous articles, both in vivo and in vitro (Aikawa et al., 2008; Cao et al., 2023; Cheng et al., 2022). For example, Aikawa et al. designed a small-molecule inhibitor of c-Fos/activator protein-1 (AP-1) using three-dimensional (3D) pharmacophore modeling, which helped verify the effect of FOS on osteoclasts in vivo (Aikawa et al., 2008).

      We also strongly agree that in vivo injection of FOS inhibitors, especially in CUMS mice, could further substantiate the role of miR-335-3p in osteoclasts under psychological stress. However, the study was constrained by the lack of commercially available, efficacious small-molecule inhibitors of FOS. In the future, we plan to design more precise therapeutic targets for psychological stress-induced osteoporosis based on existing research ideas.

      Reference

      Aikawa, Y., Morimoto, K., Yamamoto, T., Chaki, H., Hashiramoto, A., Narita, H., Hirono, S., & Shiozawa, S. (2008). Treatment of arthritis with a selective inhibitor of c-Fos/activator protein-1. Nature Biotechnology, 26(7), 817-823. https://doi.org/10.1038/nbt1412

      Cao, Z., Niu, X. B., Wang, M. H., Yu, S. W., Wang, M. K., Mu, S. L., Liu, C., & Wang, Y. X. (2023). Anemoside B4 attenuates RANKL-induced osteoclastogenesis by upregulating Nrf2 and dampens ovariectomy-induced bone loss. Biomedicine & Pharmacotherapy, 167, 115454. https://doi.org/10.1016/j.biopha.2023.115454

      Cheng, X., Yin, C., Deng, Y., & Li, Z. (2022). Exogenous adenosine activates A2A adenosine receptor to inhibit RANKL-induced osteoclastogenesis via AP-1 pathway to facilitate bone repair. Molecular Biology Reports, 49(3), 2003-2014. https://doi.org/10.1007/s11033-021-07017-1

      Deng, W., Ding, Z., Wang, Y., Zou, B., Zheng, J., Tan, Y., Yang, Q., Ke, M., Chen, Y., Wang, S., & Li, X. (2022). Dendrobine attenuates osteoclast differentiation through modulating ROS/NFATc1/MMP9 pathway and prevents inflammatory bone destruction. Phytomedicine: International Journal of Phytotherapy and Phytopharmacology, 96, 153838. https://doi.org/10.1016/j.phymed.2021.153838

      Hounoki, H., Sugiyama, E., Mohamed, S. G.-K., Shinoda, K., Taki, H., Abdel-Aziz, H. O., Maruyama, M., Kobayashi, M., & Miyahara, T. (2008). Activation of peroxisome proliferator-activated receptor gamma inhibits TNF-alpha-mediated osteoclast differentiation in human peripheral monocytes in part via suppression of monocyte chemoattractant protein-1 expression. Bone, 42(4), 765-774. https://doi.org/10.1016/j.bone.2007.11.016

      Li, Y., Yang, C., Jia, K., Wang, J., Wang, J., Ming, R., Xu, T., Su, X., Jing, Y., Miao, Y., Liu, C., & Lin, N. (2022). Fengshi Qutong capsule ameliorates bone destruction of experimental rheumatoid arthritis by inhibiting osteoclastogenesis. Journal of Ethnopharmacology, 282, 114602. https://doi.org/10.1016/j.jep.2021.114602

      Li, Z., Huang, J., Wang, F., Li, W., Wu, X., Zhao, C., Zhao, J., Wei, H., Wu, Z., Qian, M., Sun, P., He, L., Jin, Y., Tang, J., Qiu, W., Siwko, S., Liu, M., Luo, J., & Xiao, J. (2019). Dual Targeting of Bile Acid Receptor-1 (TGR5) and Farnesoid X Receptor (FXR) Prevents Estrogen-Dependent Bone Loss in Mice. Journal of Bone and Mineral Research: the Official Journal of the American Society for Bone and Mineral Research, 34(4), 765-776. https://doi.org/10.1002/jbmr.3652

      Ma, K., Zhang, H., Wei, G., Dong, Z., Zhao, H., Han, X., Song, X., Zhang, H., Zong, X., Baloch, Z., & Wang, S. (2019). Identification of key genes, pathways, and miRNA/mRNA regulatory networks of CUMS-induced depression in nucleus accumbens by integrated bioinformatics analysis. Neuropsychiatric Disease and Treatment, 15, 685-700. https://doi.org/10.2147/NDT.S200264

      Ma, Q., Liang, M., Wu, Y., Luo, F., Ma, Z., Dong, S., Xu, J., & Dou, C. (2021). Osteoclast-derived apoptotic bodies couple bone resorption and formation in bone remodeling. Bone Research, 9(1), 5. https://doi.org/10.1038/s41413-020-00121-1

      Zhong, L., Lu, J., Fang, J., Yao, L., Yu, W., Gui, T., Duffy, M., Holdreith, N., Bautista, C. A., Huang, X., Bandyopadhyay, S., Tan, K., Chen, C., Choi, Y., Jiang, J. X., Yang, S., Tong, W., Dyment, N., & Qin, L. (2023). Csf1 from marrow adipogenic precursors is required for osteoclast formation and hematopoiesis in bone. eLife, 12. https://doi.org/10.7554/eLife.82112

    1. eLife Assessment

      This study presents an advance in efforts to use histone post-translational modification (PTM) data to model gene expression and predict epigenetic editing activity. Such models are broadly useful to the research community, especially ones that can model epigenetic editing activity, which is novel; additionally, the authors have nicely integrated datasets across cell types into their model. The work is mostly solid, but it would be strengthened by rigorous comparisons to existing methods that predict gene expression from PTM data and by additional model validation beyond dCas9-p300-based perturbations.

    2. Reviewer #1 (Public review):

      Batra, Cabrera, Spence et al. present a model that integrates histone post-translational modification (PTM) data across cell models to predict gene expression, with the goal of using this model to better understand epigenetic editing. This gene expression prediction approach is useful if a) it predicts gene expression in specific cell lines, b) it predicts expression values rather than a rank or bin, c) it helps us to better understand the biology of gene expression, or d) it helps us to understand epigenome editing activity. Problematically for points a) and b), it is easier to directly measure gene expression than to measure multiple PTMs, so the real usefulness of this approach mostly relates to c) and d).

      Other approaches have been published that use histone PTMs to predict expression (e.g. PMID 27587684, 36588793). Is this model better in some way? No comparisons are made, although the authors state that direct comparisons are difficult. I appreciate that the authors have not used the histone PTM data to predict gene expression levels of an "average cell" but rather that they are predicting expression within specific cell types or for unseen cell types. Approaches that predict expression levels are much more useful, whereas some previous approaches have only predicted expressed or not expressed, a rank order, or a bin-based ranking. The paper does not seem to offer substantial novel insights into the biology of gene expression.

      The approach of using this model to predict epigenetic editor activity on transcription is interesting and, to my knowledge, novel, although it is only examined in the context of a p300 editor. As the authors point out, the interpretation of the epigenetic editing data is complicated by factors such as sgRNA activity scoring, and fully understanding the results would likely require histone PTM profiling and perhaps dCas9 ChIP-seq for each sgRNA, which would be a substantial amount of work.

      Furthermore, from the model evaluation of H3K9me3, it seems the model performs only modestly for other forms of epigenetic or transcriptional editing. For example, we know that for the best-studied transcriptional editor, CRISPRi (dCas9-KRAB), recruitment to a locus is associated with robust gene repression across the genome and with H3K9me3 deposition through recruitment of KAP1/HP1/SETDB1 (PMID: 35688146, 31980609, 27980086, 26501517).

      One overall concern with this approach is that dCas9-p300 has been observed to induce sgRNA-independent off-target H3K27ac (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349887/ see Figure S5D), which could confound interpretation of this type of experiment for the model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors build a gene expression model based on histone post-translational modifications, and find that H3K27ac is correlated with gene expression. They proceed to perturb H3K27ac at 13 gene promoters in two cell types, and measure gene expression changes to test their model.

      Strengths:

      The combination of multiple methods to model expression, along with utilizing 6 histone datasets in 13 cell types allowed the authors to build a model that correlates between 0.7-0.79 with gene expression. They use dCas9-p300 fusions to perturb H3K27ac and monitor gene expression to test their model. Ranked correlations of the HEK293 data showed some support for the predictions after perturbation of H3K27ac.

      Weaknesses:

      The perturbation of 5 genes in K562 with perturb-seq data shows a modest correlation of ~0.5 and isn't included in the main figures. The authors are then left to speculate reasons why the outcome of epigenome editing doesn't fit their predictions, which highlights the limited value in the current version of this method.

      As mentioned before, the fact that the genes most activated by dCas9-p300 were not expressed at baseline may weaken the correlations compared with testing a broad range of expression levels, like those on which the original model was trained.

      If the authors want this method to be used to predict the outcomes of epigenome editing, expanding to dCas9-KRAB and other CRISPRa methods (SAM and VPR) would be useful. Those datasets are published and could be analyzed for this manuscript.

      The authors don't compare their method to other prediction methods.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Batra, Cabrera, Spence et al. present a model which integrates histone posttranslational modification (PTM) data across cell models to predict gene expression with the goal of using this model to better understand epigenetic editing. This gene expression prediction model approach is useful if a) it predicts gene expression in specific cell lines b) it predicts expression values rather than a rank or bin, c) it helps us to better understand the biology of gene expression, or d) it helps us to understand epigenome editing activity. Problematically for points a) and b) it is easier to directly measure gene expression than to measure multiple PTMs and so the real usefulness of this approach mostly relates to c) and d).

      We thank the reviewer for their comment, and we agree that directly measuring gene expression (e.g., by performing RNA-seq) is easier than profiling multiple PTMs in a new cell line. We designed our approach keeping in mind that the primary use case is to understand how epigenome editing would affect gene expression.

      Other approaches have been published that use histone PTMs to predict expression (e.g. PMID 27587684, 36588793). Is this model better in some way? No comparisons are made. The paper does not seem to offer substantial novel insights into the biology of gene expression. The approach of using this model to predict epigenetic editor activity on transcription is interesting and, to my knowledge, novel, but given the variability of the predictions (Figures 6 and S7&8), I doubt that many people will be interested in using it in a practical sense. As the authors point out, the interpretation of the epigenetic editing data is complicated by factors such as sgRNA activity scoring, and fully understanding the results would likely require histone PTM profiling and perhaps dCas9 ChIP-seq for each sgRNA, which would be a substantial amount of work.

      We thank the reviewer for this insightful comment. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods performs classification of gene expression rather than predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read depth data rather than p-values obtained from the ENCODE pipelines, making direct comparisons difficult.

      We note in the Discussion that creating a comprehensive dataset of epigenome editing outcomes, including quantification of histone PTMs before and after in situ perturbations, will improve our understanding of the effects of dCas9-p300 on gene expression and assist in the design of gRNAs for achieving fine-tuned control over gene expression levels. 

      Furthermore, from the model evaluation of H3K9me3, it seems the model does not perform well for epigenetic or transcriptional editing. For example, we know that for the best-studied transcriptional editor, CRISPRi (dCas9-KRAB), recruitment to a locus is associated with robust gene repression across the genome and with H3K9me3 deposition through recruitment of KAP1/HP1/SETDB1 (PMID: 35688146, 31980609, 27980086, 26501517). However, Figures 2&4 suggest that the model would not be able to evaluate or predict this.

      We thank the reviewer for their comment. We have included a supplementary figure, Figure 4 – figure supplement 1, that quantifies how sensitive the trained gene expression model is to perturbations in H3K9me3. Indeed, our data suggest that the model predictions are sensitive to perturbations in H3K9me3. For instance, there is a clear decrease followed by a gradual increase as the position of the perturbation moves from upstream to downstream of the TSS. Additionally, the magnitude of the predicted fold-change is a function of how strongly H3K9me3 is perturbed, so the predicted change would be even larger for a larger perturbation. However, the precise magnitude is hard to estimate in the absence of experimental perturbation data for H3K9me3.

      The model seems to predict gene expression for endogenous genes quite well, although the authors sometimes use expression and sometimes use rank (e.g. Figure 6); being clearer about how the model predicts expression, rather than using rank or fold change, would be very useful.

      We thank the reviewer for this important suggestion. We have added text in the revised manuscript to clarify that the model predicts gene expression values, which can be interpreted as rank or fold change, depending on the use case.

      One concern overall with this approach is that dCas9-p300 has been observed to induce sgRNA-independent off-target H3K27Ac (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349887/ see Figure S5D) which could convolute interpretation of this type of experiment for the model.

      This is an excellent point. Indeed, we and others have observed that dCas9-p300 can cause off-target changes in H3K27ac levels (both increases and decreases) across the genome. However, p300 is one of the few known proteins that can catalyze H3K27ac in the human genome, and H3K27ac remains a proxy for active genomic regulatory elements. Nevertheless, dCas9-p300 off-target activity could certainly confound our approach, and we have added language addressing this caveat to our Discussion. Interestingly, even though dCas9-p300 (and other epigenome editing enzymes) can lead to off-target chromatin modifications, these effects often occur without coincident disruptions to the transcriptome. This suggests that many chromatin modifications, while "supportive" or "instructive" of/for transcription, may be insufficient (either alone or in the context of dCas9-based fusions) for transcriptional effects.

      Figure 2

      From the authors' description, this figure seems to present known rather than novel findings. Please comment on whether there are any new findings in this figure. Please also comment on differences in the patterns of repressive and activating histone PTMs between cell lines (e.g. in H1-hESC, H3K27me3 is more enriched in the green 25-50% group than in the red 0-25% group).

      Thank you for pointing out this issue. We have revised the text in both the Results and Discussion sections to better articulate that the goal of this figure is to validate the hypothesis that there are consistent patterns of histone PTMs with respect to gene expression across different human cell types.

      In Figure 2, which illustrates the raw histone mark data, the non-monotonic behavior of H3K27me3 in H1-hESC cells reflects a real biological phenomenon. This interpretation is supported by the relatively low Pearson correlation for the H3K27me3 mark in these cells, as documented in Figure 1b of another study: https://www.biorxiv.org/content/10.1101/2024.03.29.587323v1.

      Figure 3&4

      There are a number of approaches, including DeepChrome and TransferChrome, that predict endogenous gene expression from histone PTMs. I appreciate that the authors have not used the histone PTM data to predict gene expression levels of an "average cell" but rather that they are predicting expression within specific cell types or for unseen cell types. But from what is presented, it isn't clear that the authors' model is better than, or enables more than, other approaches. The authors should show that their model is better than other approaches or make clear why this is a significant advance that will be enabling for the field. For example, is it that this approach actually predicts expression levels, whereas previous approaches have only predicted expressed or not expressed, a rank order, or a bin-based ranking?

      We thank the reviewer for this comment. We have added text to clarify the difference between our approach and existing approaches. There are two key differences between our model and other approaches. First, the gene expression model that we have trained here predicts gene expression values instead of classifying gene expression levels as either high or low. Second, we have trained our models on ENCODE p-value data instead of read depths obtained from the Roadmap Epigenomics Consortium.

      Figure 5

      From the Methods, it seems gene activation is measured by qPCR in HEK293 cells transfected with individual sgRNAs and dCas9-p300. The cells aren't selected or sorted before qPCR, so how can we be sure that some of the variability isn't due to variable DNA quality or variable transfection efficiency?

      This is a good question. All DNA preps were generated using high-quality reagents and consistent protocols. In addition, the only variable that changed with respect to transfection efficiency was the gRNA-encoding vector used in the qPCR assays. We have added new data demonstrating that transfection efficiency is comparable across experiments (Figure 5 – figure supplement 1). We have also added experimental data and a computational analysis of a new dCas9-p300-based Perturb-seq dataset to the manuscript (Figure 6 – figure supplement 1); these use lentiviral transduction and RNA-seq readouts and are thus buffered against the sources of variance mentioned by the Reviewer.

      Figure 6

      The use of rank in 6D and 6E is confusing. In 6D a higher rank is associated with higher expression while in 6E a higher rank seems to mean a lower fold change e.g. CYP17A1 has a low predicted fold-change rank and qPCR fold-change rank but in Figure 5 a very high qPCR fold change. Labeling this more clearly or explaining it in the text further would be useful.

      We thank the reviewer for their suggestion. We have made relevant changes to the caption of Figure 6 to clarify this.

      Reviewer #2 (Public Review):

      Summary:

      The authors build a gene expression model based on histone post-translational modifications and find that H3K27ac is correlated with gene expression. They proceed to perturb H3K27ac at 8 gene promoters, and measure gene expression changes to test their model.

      Strengths:

      The combination of multiple methods to model expression, along with utilizing 6 histone datasets in 13 cell types allowed the authors to build a model that correlates between 0.7-0.79 with gene expression. This group also utilized a tool they are experts in, dCas9-p300 fusions to perturb H3K27ac and monitor gene expression to test their model. Ranked correlations showed some support for the predictions after the perturbation of H3K27ac.

      Weaknesses:

      The perturbation of only 8 genes, with qPCR-based gene expression as the only readout (as opposed to also measuring H3K27ac), weakened their validation of the computational model. Likewise, the fact that the six genes most activated by dCas9-p300 were not expressed at baseline might weaken the correlations compared with examining a broad range of gene expression levels, like those on which the original model was trained.

      We thank the reviewer for their comments. We have added additional experimental data as well as computational analysis analyzing a new dCas9-p300 based Perturb-seq dataset to the manuscript. We observe that the models we have developed are able to predict the fold-change rank across genes reasonably well (Figure 6 – figure supplement 1), similar to what we observe in Figure 6E.

      Reviewer #1 (Recommendations For The Authors):

      The authors should comment on how their model is different from or better than other models that use histone PTM data to predict gene expression.

      We thank the reviewer for this insightful suggestion. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods performs classification of gene expression rather than predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read depth data as opposed to p-values obtained from the ENCODE pipelines, making it difficult to make direct comparisons.

      The authors need to make clear whether their model will apply to other common epigenetic or transcriptional editors such as CRISPRi/H3K9me3 which is widely used.

      In this study, we focus on the histone changes induced by p300. However, future studies may use the framework described in our manuscript and apply it to other transcriptional editors as well.

      The authors need to be clearer about where they are predicting expression and where they are using rank. Ideally, show both.

      We thank the reviewer for this important suggestion. We have added text in the revised manuscript to clarify that the model predicts gene expression values, which can be interpreted as rank or fold change, depending on the use case.
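      To make the rank convention discussed here concrete, the following is a minimal Python sketch, not the authors' pipeline: it ranks predicted and qPCR-measured fold changes with one explicit convention (rank 1 = largest value, ties share the average rank) and compares them with Spearman's rho. The function names and the example values are hypothetical.

```python
def rank_descending(values):
    """Rank values so that rank 1 is the largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank_descending(x), rank_descending(y))


# Hypothetical fold changes for four genes (not data from the study).
predicted_fc = [2.1, 0.4, 5.3, 1.0]
qpcr_fc = [3.0, 0.9, 12.0, 1.5]
print(rank_descending(predicted_fc))  # [2.0, 4.0, 1.0, 3.0]
print(spearman(predicted_fc, qpcr_fc))
```

      Because the same convention is applied to both vectors, rho does not depend on whether rank 1 means the largest or the smallest value; what matters for a figure is stating the chosen direction in the axis label or caption.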

      The authors should ideally show a case where they use the model to make a prediction of genes that can and can not be activated by dCas9-p300 or other epigenetic editors and then prove this with experiments.

      Thank you for the excellent suggestion. While it is indeed relevant, exploring this would extend beyond the scope of our current study. We consider it a valuable topic for future research.

      Reviewer #2 (Recommendations For The Authors):

      The y-axis in 5C needs to be labeled. The authors state it is "relative mRNA", but these numbers correlate with the fold changes shown in Table S2.

      We have clarified the definition of the Y-axis in the caption for Figure 5C.

    1. eLife Assessment

      This valuable study presents a meta-analysis of the literature, confirming the relationship between the coupling of slow oscillations and fast spindles in memory formation, although the reported effects are weak and should be more clearly justified. Furthermore, while the evidence is convincing overall, the manuscript provides an incomplete description of the methods, which may challenge comprehension for readers unfamiliar with advanced statistical techniques. This study will be of interest to neuroscientists focusing on sleep and memory.

    2. Reviewer #1 (Public review):

      In this meta-analysis, Ng and colleagues review the association between slow-oscillation spindle coupling during sleep and overnight memory consolidation. The coupling of these oscillations (and also of hippocampal sharp-wave ripples) has been central to theories and mechanistic models of active systems consolidation, which posit that the coupling between ripples, spindles, and slow oscillations (SOs) drives the coordinated reactivation of memories in hippocampus and cortex, facilitating cross-regional information transfer and ultimately memory strengthening and stabilisation.

      Given the importance that these coupling mechanisms have been given in theory, this is a timely and important contribution to the literature in terms of determining whether these theoretical assumptions hold true in human data. The results show that the timing of sleep spindles relative to the SO phase, and the consistency of that timing, predicted overnight memory consolidation in meta-analytic models. The overall amount of coupling events did not show as strong a relationship. The coupling phase in particular was moderated by a number of variables including spindle type (fast, slow), channel location (frontal, central, posterior), age, and memory type. The main takeaway is that fast spindles that consistently couple close to the peak of the SO in frontal channel locations are optimal for memory consolidation, in line with theoretical predictions.

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis of the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory-promoting fast spindles show a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling [phase, consistency, occurrence] × 2 spindle types [fast, slow]). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples)? Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

    3. Reviewer #2 (Public review):

      Summary:

      This article reviews the studies on the relationship between slow oscillation (SO)-spindle (SP) coupling and memory consolidation. It innovatively employs non-normal circular-linear correlations through a Bayesian meta-analysis. A systematic analysis of the retrieved studies highlighted that the coupling of SOs with fast SPs, in both phase and amplitude, at frontal sites better predicts memory consolidation performance. I have only a few comments, which I recommend addressing.

      Major Comments:

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

    4. Reviewer #3 (Public review):

      This manuscript presents a meta-analysis of 23 studies, which report 297 effect sizes, on the effect of SO-spindle coupling on memory performance. The analysis has been done with great care, and the results are described in great detail. In particular, there are separate analyses for coupling phase, spindle amplitude, coupling strength (e.g., measured by vector length or modulation index), and coupling percentage (i.e., the percentage of SPs coupled with SOs). The authors conclude that the precision and strength of coupling showed significant correlations with memory retention.

      There are two main points where I do not agree with the authors.

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect" (0.10).

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent. This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information). However, most of the methods are not described in sufficient detail for the reader to understand the procedures. It might be evident to an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains", is surely something only rocket scientists understand.

      Whether this was done correctly in the present study cannot be ascertained, because it is only mentioned in the methods and no corresponding results are provided. This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      On one point, the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r², because there can be no negative values) into "Z_r" seems partially plausible and might be correct under H0. However, Figure 12.3 seems to show that under the alternative hypothesis H1, the assumptions are not accurate (peak Z_r ≈ 0.70 for r = 0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r = -1 represents the null hypothesis and Z_r = 1 the alternative hypothesis can be misinterpreted, since Z_r = 0 also represents the null hypothesis and is not halfway between H0 and H1.
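      The reviewer's concern about treating circular-linear r like a signed Pearson r can be illustrated with a short simulation. The Python sketch below is not the authors' Z_r transformation, which is not specified in this exchange; it computes the standard Mardia/Johnson-Wehrly circular-linear correlation, applies the ordinary Fisher z as an assumed point of comparison, and estimates the average r when angle and value are truly independent. Because r cannot be negative, its null distribution sits above zero, so a z-type transform of it is not centred on zero under H0 either. All function names are illustrative.

```python
import math
import random


def _corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def circ_linear_r(theta, x):
    """Circular-linear correlation between angles `theta` (radians) and values `x`.

    Standard Mardia / Johnson-Wehrly form; the result lies in [0, 1],
    so it carries no sign.
    """
    c = [math.cos(t) for t in theta]
    s = [math.sin(t) for t in theta]
    rxc, rxs, rcs = _corr(x, c), _corr(x, s), _corr(s, c)
    r2 = (rxc**2 + rxs**2 - 2 * rxc * rxs * rcs) / (1 - rcs**2)
    return math.sqrt(min(max(r2, 0.0), 1.0))  # clamp tiny rounding excursions


def fisher_z(r):
    """Fisher's z = atanh(r), the usual variance-stabilising transform for Pearson r."""
    return math.atanh(r)


def null_mean_r(n, n_sims=2000, seed=0):
    """Average circular-linear r when angle and value are independent (H0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        theta = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += circ_linear_r(theta, x)
    return total / n_sims


# Under H0 with n = 20, the mean r is clearly above 0, and so is fisher_z(r).
r0 = null_mean_r(20)
print(round(r0, 3), round(fisher_z(r0), 3))
```

      A permutation test (shuffling the linear variable relative to the angles) is one common way to obtain a null distribution for this statistic without relying on a normal approximation.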

    5. Author response:

      Reviewer #1 (Public review):

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis of the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      Indeed, spindle amplitude refers to all spindle events rather than only coupled spindles. This choice was made because we recognized the challenge of obtaining relevant data from each study—only 4 out of the 23 included studies performed their analyses after separating coupled and uncoupled spindles. This inconsistency strengthens the urgency and importance of this meta-analysis to standardize the methods and measures used for future analysis on SO-SP coupling and beyond. We agree that focusing on the amplitude of coupled spindles would better reveal their relations with coupling, and we will discuss this limitation in the manuscript.

      Nevertheless, we believe including spindle amplitude in our study remains valuable, as it served several purposes. First, SO-SP coupling involves the modulation of spindle amplitude by slow oscillation phase. Different studies have reported conflicting conclusions regarding how spindle amplitude is related to coupling: some found significant correlations (e.g., Baena et al., 2023), while others did not (e.g., Roebber et al., 2022). This discrepancy points to an indirect but potentially crucial role of spindle amplitude in coupling dynamics. Second, in studies related to SO-SP coupling, spindle amplitude is one of the most frequently reported measures, along with other coupling measures that significantly correlated with overnight memory improvements (e.g., Kurz et al., 2023; Ladenbauer et al., 2021; Niknazar et al., 2015), so including this measure allows a more comprehensive review of the existing literature on SO-SP coupling. Third, incorporating spindle amplitude allows for a direct comparison between coupling measures and individual oscillation events in their contribution to memory consolidation, a question that has been extensively explored in recent research (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023). Finally, spindle amplitude was identified as a key moderator for memory consolidation in Kumral et al.'s (2023) meta-analysis. By including it in our analysis, we sought to replicate their findings within a broader framework and to connect our work conceptually with existing reviews. Therefore, although we were not able to selectively include coupled spindles, spindle amplitude still has a unique relation to SO-SP coupling that other spindle measures do not have.

      Originally, we also intended to include coupling density or counts in the analysis, which seems more relevant to the coupling metrics. However, the lack of uniformity in methods used to measure coupling density posed a significant limitation. We hope that our study will encourage consistent reporting of all relevant parameters in future research, enabling future meta-analyses to incorporate these measures comprehensively. We will add this discussion to the manuscript in the revised version to further clarify these points.

      References:

      Roebber, J. K., Lewis, P. A., Crunelli, V., Navarrete, M. & Hamandi, K. Effects of anti-seizure medication on sleep spindles and slow waves in drug-resistant epilepsy. Brain Sci. 12, 1288 (2022). https://doi.org/10.3390/brainsci12101288

      All other citations were referenced in the manuscript.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      While the primary coupling measures, including coupling phase and strength, showed strong evidence for their associations with memory consolidation, spindle measures, including spindle amplitude, exhibited only limited evidence (or a “non-significant” effect) for their association with consolidation. These results are consistent with multiple empirical studies using different techniques (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023), which reported that coupling metrics are more robust predictors of consolidation and synaptic plasticity than spindle or slow oscillation metrics alone. However, we agree with the reviewer that we did not directly separate the effects of coupled and uncoupled spindles, and that a more precise framing would contrast the “coupling of oscillation events” with “individual oscillation events” rather than coupling versus isolated events.

      We recognize that Kumral and colleagues’ meta-analysis reported a moderate association between spindle measures and memory consolidation (e.g., for the spindle amplitude-memory association they reported an effect size of approximately r = 0.30). However, one advantage of our study is that we actively cooperated with the authors of the original studies to obtain a large number of unreported and non-significant results relevant to our analysis, and to separate data that were originally reported under mixed conditions. This approach decreases the risk of false positives and selective reporting, making the effect size more likely to approach the true value. In contrast, we found only a weak effect size of r = 0.07, with minimal evidence for a spindle amplitude-memory relation. However, we agree with the reviewer that a more conservative term would be a better choice in this context, since we did not measure all relevant spindle metrics, including density.

      To improve clarity in our manuscript, we will revise the statement to: “Together with other studies included in the review, our results suggest a crucial role of coupling but did not support the role of spindle events alone in memory consolidation,” and provide relevant references. We believe this can more accurately reflect our findings and the existing literature to address the reviewer’s concern.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory-promoting fast spindles show a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      Regarding the topography of fast spindles and their relationship to memory consolidation, we agree this is an intriguing issue, and we have already made significant progress on this topic in our ongoing work. We share a few relevant observations. First, there are significant discrepancies in the definition of “slow spindle” in the field. Some studies define slow spindles as 9-12 Hz (e.g., Mölle et al., 2011; Kurz et al., 2021), while others perform event detection within a range of 11-13/14 Hz (e.g., Barakat et al., 2011; D'Atri et al., 2018). Compounding this issue, individual differences in spindle frequency are often overlooked, making it difficult to reliably distinguish between slow and fast spindles. Some studies have reported difficulty in clearly separating the two types of spindles altogether (e.g., Hahn et al., 2020). Moreover, a critical factor often ignored in past research is the traveling nature of both slow oscillations and spindles across the cortex, where spindles are coupled with significantly different phases of slow oscillations (see Figure 5). We believe that understanding coupling in the context of the movement of these waves will help explain the observed frontal relationship with consolidation. We will address this in our revised manuscript.

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling [phase, consistency, occurrence] × 2 spindle types [fast, slow]). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples)? Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

      There are indeed multiple methods for correcting for multiple comparisons in EEG data with spatiotemporal structure, including cluster-based and other non-parametric approaches. In addition, encouraging the reporting of all tested but non-significant results, at least in supplementary materials, is an important practice that helps readers interpret the findings with reduced bias. We agree with the reviewer’s suggestions and will add more information to section 3.5 advocating a standardized “template” for analyzing and reporting effect sizes in future research.
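      For the family of tests the reviewer describes (3 coupling metrics × 2 spindle types), one correction a reporting template could specify is a false-discovery-rate procedure. The Python sketch below implements the Benjamini-Hochberg step-up procedure; it is an illustrative choice rather than one the authors commit to (the cluster-based and non-parametric methods they mention mainly address the electrode dimension), and the p-values are invented.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns True where H0 is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank  # largest rank whose p-value clears its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject


# Hypothetical p-values: 3 coupling metrics x 2 spindle types = 6 tests.
tests = [
    ("phase, fast SP", 0.004),
    ("phase, slow SP", 0.30),
    ("consistency, fast SP", 0.012),
    ("consistency, slow SP", 0.45),
    ("occurrence, fast SP", 0.03),
    ("occurrence, slow SP", 0.20),
]
decisions = benjamini_hochberg([p for _, p in tests])
for (name, p), rejected in zip(tests, decisions):
    print(f"{name:22s} p={p:.3f} reject={rejected}")
```

      In this invented example, the occurrence result at p = 0.03 would pass an uncorrected 0.05 threshold but does not survive the FDR correction, which is the kind of outcome that reporting all tested metrics makes visible.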

      We advocate standardized reporting of all three coupling metrics: phase, consistency, and occurrence. Each coupling metric captures distinct properties of the coupling process, and the metrics may interact with one another (Weiner et al., 2023). Therefore, we believe it is essential to report all three metrics to comprehensively explore their different roles in the “how, what, and where” of long-distance communication and memory consolidation. As we advance toward a deeper understanding of the relationship between memory and sleep, we hope this work promotes standardization, transparency, and replication in relevant studies.
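      A minimal sketch of how the three metrics could be computed from detected events, assuming (hypothetically) that each coupled spindle has already been assigned the SO phase at its peak, in radians, with 0 rad at the SO positive peak; phase conventions and the definition of a "coupled" spindle differ across studies. Here coupling phase is the circular mean, consistency is the mean resultant vector length, and occurrence is the percentage of all detected spindles that are coupled. The function name and inputs are illustrative, not taken from the manuscript.

```python
import math


def coupling_metrics(so_phases_at_spindle_peak, n_spindles_total):
    """Return (phase, consistency, occurrence) for one recording.

    so_phases_at_spindle_peak: SO phase (radians) at the peak of each spindle
        detected inside an SO event, i.e. each coupled spindle.
    n_spindles_total: all detected spindles, coupled or not.
    """
    n = len(so_phases_at_spindle_peak)
    if n == 0 or n_spindles_total == 0:
        raise ValueError("need at least one coupled spindle")
    mean_cos = sum(math.cos(p) for p in so_phases_at_spindle_peak) / n
    mean_sin = sum(math.sin(p) for p in so_phases_at_spindle_peak) / n
    phase = math.atan2(mean_sin, mean_cos)       # preferred (circular mean) phase
    consistency = math.hypot(mean_sin, mean_cos)  # resultant vector length, 0..1
    occurrence = 100.0 * n / n_spindles_total     # % of spindles coupled to an SO
    return phase, consistency, occurrence


# Hypothetical recording: 5 of 10 spindles coupled, all near the SO peak.
print(coupling_metrics([0.1, -0.1, 0.0, 0.05, -0.05], 10))
```

      Reporting all three values per spindle type and channel, as advocated above, costs little once the phases are available, since they come from the same event list.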

      Reviewer #2 (Public review):

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

      We agree that there is an age gap between younger and older adults, as current studies often focus on contrasting newly matured and fully aged populations to amplify the effect, while neglecting the gradual changes in memory consolidation mechanisms across the aging spectrum. We suggest that a non-linear analysis of age effects would be highly valuable, particularly when additional child and older adult data become available.

      In response to the reviewer’s suggestion, we re-tested the moderation effect of age after excluding effect sizes from older adults. The results revealed a decrease in the strength of evidence for the phase-memory association due to increased variability, but were consistent for all other coupling parameters. The mean estimates also remained consistent (coupling phase-memory relation: -0.005 [-0.013, 0.004], BF10 = 5.51, strength of evidence reduced from strong to moderate; coupling strength-memory relation: -0.005 [-0.015, 0.008], BF10 = 4.05, strength of evidence remained moderate). These findings align with prior research, which typically observed a weak coupling-memory relationship in older adults during aging (Ladenbauer et al., 2021; Weiner et al., 2023) but not during development (Hahn et al., 2020; Kurz et al., 2021; Kurz et al., 2023). Therefore, this result is not surprising to us, and moderate patterns are still observable in the data. We will report these additional results in the revised manuscript and interpret them as “the moderator effect of age becomes less pronounced during development after excluding the older adult data”. We believe the original findings including the older adult group remain meaningful when interpreted cautiously, given that the older adult data were derived from multiple studies and different groups.
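      The verbal labels used in this paragraph ("strong", "moderate") follow from where a Bayes factor falls on a conventional scale. The small Python sketch below assumes a widely used Jeffreys-style set of cut-offs (1-3 anecdotal, 3-10 moderate, 10-30 strong, 30-100 very strong, above 100 extreme); the response does not state which scale the manuscript used, so the thresholds are an assumption.

```python
# Jeffreys-style cut-offs; assumed here, not taken from the manuscript.
_CUTOFFS = [
    (100.0, "extreme"),
    (30.0, "very strong"),
    (10.0, "strong"),
    (3.0, "moderate"),
    (1.0, "anecdotal"),
]


def bf_evidence_label(bf10):
    """Translate a Bayes factor BF10 into a verbal evidence category."""
    if bf10 <= 0:
        raise ValueError("Bayes factors must be positive")
    if bf10 == 1:
        return "no evidence either way"
    # BF10 < 1 favours H0; its reciprocal BF01 is read on the same scale.
    favoured, bf = ("H1", bf10) if bf10 > 1 else ("H0", 1.0 / bf10)
    for threshold, label in _CUTOFFS:
        if bf > threshold:
            return f"{label} evidence for {favoured}"


for bf in (5.51, 4.05, 0.2):
    print(bf, "->", bf_evidence_label(bf))
```

      Under these cut-offs, both BF10 = 5.51 and BF10 = 4.05 fall in the 3-10 band, consistent with the "moderate" labels reported above.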

      Reviewer #3 (Public review):

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect" (0.10)

      While we acknowledge the concern about the small effect sizes reported in our study, it is important to contextualize these findings within the field of neuroscience, particularly memory research. Even in individual studies, small effect sizes are not uncommon due to the inherent complexity of the mechanisms involved and the multitude of confounding variables. This is an important factor to be considered in meta-analyses where we synthesize data from diverse populations and experimental conditions. For example, the relationship between SO-slow SP coupling and memory consolidation in older adults is expected to be insignificant.

      As Funder and Ozer (2019) concluded in their highly cited paper, an effect size of r = 0.3 in psychological and related fields should be considered large, with r = 0.4 or greater likely representing an overestimation that is rarely found in a large sample or in a replication. Therefore, we believe r = 0.1 should not be considered the lower bound of a small effect. Bakker et al. (2019) also advocate a contextual interpretation of effect sizes. This is particularly important in meta-analyses, where the results are less prone to overestimation than individual studies, and we cooperated with all authors to include a large number of unreported and non-significant results. In this context, small correlations may still carry substantial, meaningful information. Although we agree that the effect sizes reported in our study are indeed small at the overall level, they reflect a rigorous analysis that incorporates robust evidence across different levels of moderators. Our moderator analyses underscore the dynamic nature of coupling-memory relationships, with certain subgroups demonstrating much stronger and more meaningful effects, especially after excluding slow spindles and older adults. For example, both the coupling phase and strength of frontal fast spindles with slow oscillations exhibited "moderate-to-large" correlations with the consolidation of different types of memory, especially in young adults, with r values ranging from 0.18 to 0.32 (see Table S9.1-9.4). We will add more discussion about the influence of moderators on the dynamics of coupling-memory associations. In addition, we will update the conclusion to be “SO-fast SP coupling should be considered as a general physiological mechanism for memory consolidation”.

      Reference:

      Funder, D. C. & Ozer, D. J. Evaluating effect size in psychological research: sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2, 156–168 (2019). https://doi.org/10.1177/2515245919847202.

      Bakker, A. et al. Beyond small, medium, or large: Points of consideration when interpreting effect sizes. Educ. Stud. Math. 102, 1–8 (2019). https://doi.org/10.1007/s10649-019-09908-4

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent.

      This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information).

      We appreciate the reviewer for sharing this viewpoint and we value the opportunity to clarify some key points. To address the concern about clarity, we will include a sub-section in the methods section explaining how to interpret Bayesian statistics including priors, posteriors, and Bayes factors, making our results more accessible to those less familiar with this approach.

      On the use of Bayesian models, we believe there may have been a misunderstanding. Bayesian methods, far from being "opaque" or overly complex, are increasingly valued for their ability to provide nuanced, accurate, and transparent inferences (Sutton & Abrams, 2001; Hackenberger, 2020; van de Schoot et al., 2021; Smith et al., 1995; Kruschke & Liddell, 2018). As of 2020, they had been applied in more than 1,200 meta-analyses (Hackenberger, 2020). In our study, we used priors that assume no effect (mean set to 0, in line with the null) while allowing a wide range of variation to account for large uncertainties. This approach reduces the risk of overestimation or false positives and performs considerably better than traditional methods in handling variability (Williams et al., 2018; Kruschke & Liddell, 2018). Sensitivity analyses reported in the supplemental material (Table S9.1-9.4) confirmed that our results are robust to the choice of priors: they did not vary when different priors were set.
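      The logic of a prior sensitivity check can be seen in the simplest conjugate case. This Python sketch deliberately simplifies the model described (which was fitted by MCMC): it assumes a single pooled effect with a normal likelihood and known standard error, and a normal prior centred on 0. With a standard error of the size assumed here, changing the width of a weakly informative prior barely moves the posterior, whereas only an implausibly tight prior would pull the estimate toward zero. The estimate, standard error, and prior SDs are hypothetical.

```python
import math


def normal_posterior(estimate, se, prior_mean=0.0, prior_sd=1.0):
    """Conjugate update: normal prior x normal likelihood with known standard error."""
    prec_prior = 1.0 / prior_sd**2
    prec_data = 1.0 / se**2
    post_var = 1.0 / (prec_prior + prec_data)
    post_mean = post_var * (prec_prior * prior_mean + prec_data * estimate)
    return post_mean, math.sqrt(post_var)


def credible_interval_95(mean, sd):
    """Central 95% interval of a normal posterior."""
    return mean - 1.96 * sd, mean + 1.96 * sd


# Hypothetical pooled effect on the Fisher-z scale and its standard error.
estimate, se = 0.10, 0.03
for prior_sd in (0.5, 1.0, 2.0):  # all priors centred on 0 (no effect)
    m, s = normal_posterior(estimate, se, prior_sd=prior_sd)
    lo, hi = credible_interval_95(m, s)
    print(f"prior sd={prior_sd}: posterior mean={m:.4f}, 95% CrI=({lo:.3f}, {hi:.3f})")
```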

      As Kruschke and Liddell (2018) describe, “shrinkage (pulling extreme estimates closer to group averages) helps prevent false alarms caused by random conspiracies of rogue outlying data,” a well-known advantage of Bayesian over traditional approaches. This explains the observed differences between the posterior distributions and the grey dots (reported effect sizes) in the forest plots. Unlike p-values, which tend to overstate the evidence with large samples and understate it with small samples, Bayesian methods make their assumptions explicit, enabling others to challenge or refine them, an approach aligned with open-science principles (van de Schoot et al., 2021). For example, a 95% credible interval from a Bayesian model can be interpreted as “there is a 95% probability that the parameter lies within the interval,” whereas a 95% confidence interval from a frequentist model means “in repeated experiments, 95% of the confidence intervals will contain the true value.” We believe the former is more straightforward and convincing for readers to interpret. We will present our justification for using Bayesian models more clearly in the manuscript.
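      The shrinkage effect can be illustrated with the normal-normal hierarchical model commonly used for random-effects meta-analysis. Conditional on a pooled mean mu and a between-study SD tau (both fixed here for simplicity, whereas a full Bayesian model estimates them as well), each study's posterior mean is a precision-weighted average of its own reported effect and mu. The study data below are hypothetical.

```python
def shrunken_estimate(y, se, mu, tau):
    """Posterior mean of one study's true effect given pooled mean mu and
    between-study SD tau: a precision-weighted average of y and mu."""
    w_study = 1.0 / se ** 2   # precision of the study's own estimate
    w_pool = 1.0 / tau ** 2   # precision of the population distribution
    return (w_study * y + w_pool * mu) / (w_study + w_pool)


# Hypothetical (reported effect, standard error) pairs.
studies = [(0.80, 0.40), (0.25, 0.08), (-0.30, 0.35), (0.20, 0.10)]
MU, TAU = 0.20, 0.15

for y, se in studies:
    s = shrunken_estimate(y, se, MU, TAU)
    print(f"reported {y:+.2f} (SE {se:.2f}) -> shrunken {s:+.3f}")
```

      The imprecise studies (SE 0.40 and 0.35) are pulled most of the way toward 0.20, while the study with SE 0.08 moves only slightly, which is why the posterior distributions in a Bayesian forest plot need not be centred on the reported effect sizes.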

      We acknowledge that, even with these justifications, researchers may still differ in their preferences for Bayesian versus frequentist models. To further increase transparency, we have also reported traditional frequentist meta-analysis results in Supplemental Material 10 as a robustness check; these showed no significant differences between the Bayesian and frequentist models. In the next version of the manuscript, we will add clearer references directing readers to the figures that report the statistics from the traditional models.

      References:

      Hackenberger, B.K. Bayesian meta-analysis now—let's do it. Croat. Med. J. 61, 564–568 (2020). https://doi.org/10.3325/cmj.2020.61.564

      Sutton, A.J. & Abrams, K.R. Bayesian methods in meta-analysis and evidence synthesis. Stat. Methods Med. Res. 10, 277–303 (2001). https://doi.org/10.1177/096228020101000404

      Williams, D.R., Rast, P. & Bürkner, P.C. Bayesian meta-analysis with weakly informative prior distributions. PsyArXiv (2018). https://doi.org/10.31234/osf.io/9n4zp

      van de Schoot, R., Depaoli, S., King, R. et al. Bayesian statistics and modelling. Nat Rev Methods Primers 1, 1 (2021). https://doi.org/10.1038/s43586-020-00001-2

      Smith, T.C., Spiegelhalter, D.J. & Thomas, A. Bayesian approaches to random-effects meta-analysis: a comparative study. Stat. Med. 14, 2685–2699 (1995). https://doi.org/10.1002/sim.4780142408

      Kruschke, J.K. & Liddell, T.M. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychon. Bull. Rev. 25, 178–206 (2018). https://doi.org/10.3758/s13423-016-1221-4

      However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. 

      We appreciate the reviewer’s concerns about accessibility and potential complexity in our descriptions of Bayesian methods. Our decision to provide a detailed account serves to enhance transparency and guide readers interested in replicating our study. We acknowledge that some terms may initially seem overwhelming. These steps, such as checking the MCMC chain convergence and robustness checks, are standard practices in Bayesian research and are analogous to “linearity”, “normality” and “equal variance” checks in frequentist analysis. We have provided exemplary plots in the supplemental material and will add more details to explain the interpretation of these convergence checks. We hope this will help address any concerns about methodological rigor.
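      As an illustration of what the Gelman and Rubin diagnostic measures, the sketch below computes the classic (non-split) potential scale reduction factor, R-hat, from several chains: it compares between-chain with within-chain variance, and values close to 1 indicate that the chains have converged to the same distribution. Modern implementations (for example, in Stan) typically report a rank-normalised split R-hat instead, so this is a simplified version, and the chains here are simulated rather than taken from the study.

```python
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [statistics.fmean(c) for c in chains]
    w = statistics.fmean([statistics.variance(c) for c in chains])  # within-chain
    b = n * statistics.variance(chain_means)                        # between-chain
    pooled_var = (n - 1) / n * w + b / n
    return (pooled_var / w) ** 0.5


rng = random.Random(42)
# Four chains sampling the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# One chain stuck around a different value: R-hat well above 1.
stuck = mixed[:3] + [[rng.gauss(3.0, 1.0) for _ in range(2000)]]
print(f"mixed chains: R-hat = {gelman_rubin(mixed):.3f}")
print(f"stuck chain:  R-hat = {gelman_rubin(stuck):.3f}")
```

      A common rule of thumb is to require R-hat below about 1.01 (older guidance used 1.1) for every parameter before interpreting the posterior; trace plots show the same information visually as overlapping chains.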

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r" seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). Based on the presented evidence, I am therefore unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not halfway between H0 and H1.

      First, we realized that in the titles of Figures 12.2 and 12.3, “true r = 0.35” and “true r = 0.65” should read “true Z_r”. Our procedure was to first generate an underlying population with a null (0), moderate (0.35), or large (0.65) Z_r correlation, and then test whether the sampling distribution of samples drawn from that population was normal across a range of sample sizes. Nevertheless, the reviewer correctly noticed a discrepancy between the reported true Z_r and the peak of its sampling distribution. This discrepancy arises because, when generating a large population, it is unlikely to obtain a value exactly at a strong correlation such as Z_r = 0.65. We therefore regenerate the population data in a loop until its Z_r falls within a threshold. For moderate effect sizes (e.g., Z_r = 0.35), this is straightforward with a narrow range (0.345 < Z_r < 0.355). For larger effect sizes such as Z_r = 0.65, however, a wider range (0.6 < Z_r < 0.7) is required, so the population from which samples were drawn sometimes had a Z_r that deviated slightly from 0.65. This remains reasonable, because the purpose of this analysis is to verify that a large Z_r still has a normal sampling distribution, not to achieve exactly Z_r = 0.65.
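      The population-generation step described here amounts to rejection sampling: regenerate the population until its correlation falls in the target window, record the value actually obtained, then build the sampling distribution from repeated subsamples. Because the transformation from circular-linear r to Z_r is not specified in this response, the sketch below uses an ordinary Pearson r on bivariate-normal data as a stand-in; the population size, subsample size, and number of replications are arbitrary choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_population(target, low, high, size=5000, rng=None, max_tries=1000):
    """Regenerate a bivariate-normal population until its r lies in (low, high)."""
    rng = rng or random.Random()
    noise_sd = math.sqrt(1 - target ** 2)
    for _ in range(max_tries):
        x = [rng.gauss(0, 1) for _ in range(size)]
        y = [target * a + noise_sd * rng.gauss(0, 1) for a in x]
        r = pearson(x, y)
        if low < r < high:
            return x, y, r  # keep the accepted value, not just the target
    raise RuntimeError("no population fell inside the requested window")


def sampling_distribution(x, y, n, reps=500, rng=None):
    """Correlations of `reps` random subsamples of size n (without replacement)."""
    rng = rng or random.Random()
    rs = []
    for _ in range(reps):
        idx = rng.sample(range(len(x)), n)
        rs.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    return rs


rng = random.Random(0)
x, y, pop_r = make_population(0.35, 0.345, 0.355, rng=rng)
dist = sampling_distribution(x, y, n=50, reps=400, rng=rng)
print(f"accepted population r = {pop_r:.3f}; "
      f"mean of 400 sample r = {sum(dist) / len(dist):.3f}")
```

      Keeping the accepted pop_r, rather than the nominal target, gives the value that a reference line on each panel should mark; with the wide 0.6-0.7 window that value can differ noticeably from 0.65.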

      We acknowledge that the variable width of this range was not clearly explained and that it is not accurate to report “true Z_r = 0.65”. In the revised version, we will address this issue by adding a vertical line to each subplot indicating the Z_r of the population from which samples were drawn, making it easier to check whether it aligns with the peak of the sampling distribution. In addition, we will revise the title to “Sampling distributions of Z_r drawn from strong correlations (Z_r = 0.6-0.7)”. We confirmed that the population Z_r and the peak of its sampling distribution remain consistent under both H0 and H1 for all sample sizes with n > 25, and we hope this explanation fully resolves your concern.

      We agree with the reviewer that claiming Z_r = -1 represents the null hypothesis is not accurate. A circlin Z_r = 0 is a closer analogue of Pearson’s r = 0, since both represent the mean of the sampling distribution under the null hypothesis. In contrast, the mean of the raw circlin r under the null is positive, which is one of the main reasons for the transformation. To provide a more accurate interpretation, we will update Table 6 to describe the following levels of evidence: no effect (r < 0), null (r = 0), small (r = 0.1), moderate (r = 0.3), and large (r = 0.5).
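      Why the raw circular-linear r has a positive mean under the null can be checked by simulation. The sketch below uses the usual circular-linear correlation, i.e. the multiple correlation of a linear variable x with cos(theta) and sin(theta): r^2 = (r_xc^2 + r_xs^2 - 2 r_xc r_xs r_cs) / (1 - r_cs^2), where r_xc, r_xs and r_cs are Pearson correlations between x, cos(theta) and sin(theta). Because r is the square root of r^2 it cannot be negative, so even when x and theta are independent its average is above zero. The sample size and number of replications are arbitrary, and the subsequent transformation to Z_r is not reproduced here.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def circlin_r(theta, x):
    """Circular-linear correlation: multiple correlation of x with cos and sin of theta."""
    c = [math.cos(t) for t in theta]
    s = [math.sin(t) for t in theta]
    rxc, rxs, rcs = pearson(x, c), pearson(x, s), pearson(c, s)
    r2 = (rxc ** 2 + rxs ** 2 - 2 * rxc * rxs * rcs) / (1 - rcs ** 2)
    return math.sqrt(max(r2, 0.0))  # guard against tiny negative rounding error


rng = random.Random(1)
n, reps = 30, 2000
null_rs = []
for _ in range(reps):
    theta = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    x = [rng.gauss(0, 1) for _ in range(n)]  # independent of theta: H0 is true
    null_rs.append(circlin_r(theta, x))
print(f"mean circular-linear r under H0 (n={n}): {sum(null_rs) / reps:.3f}")
```

      With n = 30 the average comes out at roughly 0.2 rather than 0, which is why a raw r of 0 cannot serve as the null reference point and a transformed scale is needed for comparison with Pearson's r.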

    1. eLife Assessment

      This manuscript describes a fundamental investigation of how Cas9 functions, and in particular of how the xCas9 variant expands its DNA-targeting ability through increased flexibility. The authors provide compelling evidence to support their mechanistic models and the relevance of flexibility and entropy in recognition. This work can be of interest to a broad community of structural biophysicists, computational biologists, chemists, and biochemists.

    2. Joint Public Review:

      Summary:

      Hossain and coworkers investigate the mechanisms of recognition of xCas9, a variant of Cas9 with expanded targeting capability for DNA. They do so by using molecular simulations and combining different flavors of simulation techniques, ranging from long classical MD simulations, to enhanced sampling, to free energy calculations of affinity differences. Through this, the authors are able to develop a consistent model of expanded recognition based on the enhanced flexibility of the protein receptor.

      Strengths:

      The paper is solidly based on the ability of the authors to master molecular simulations of highly complex systems. In my opinion, this paper shows no major weaknesses. The simulations are carried out in a technically sound way. Comparative analyses of different systems provide valuable insights, even within the well-known limitations of MD. In addition, the authors investigate why xCas9 exhibits improved recognition of the TGG PAM sequence compared to SpCas9 via well-tempered metadynamics simulations focusing on the binding of R1335 to the G3 nucleobase and the DNA backbone in both SpCas9 and xCas9. In this context, the authors provide free-energy profiles that help support their final model.

      The implementation of FEP calculations to mimic directed evolution improvement of DNA binding is also interesting, original and well-conducted.

      Overall, my assessment of this paper is that it represents a strong manuscript, competently designed and conducted, and highly valuable from a technical point of view.

      Weaknesses:

      To make their impact even more general, the authors may consider expanding their discussion of entropic binding to other cases recently presented in the literature (e.g., the identification of small molecules for Abeta peptides, or of "fuzzy" binding mechanisms for the protein HMGB1). The point that flexibility helps adaptability and the expansion of functional properties is important, and should probably be supported by more evidence and linked more directly to a wider picture.

    1. eLife Assessment

      This study reports valuable findings that highlight the importance of data quality and data representation for ligand-based virtual screening experiments. The authors' claims are supported by solid evidence, although the conclusions have been inferred from only two datasets. The work would have much greater impact if additional datasets were used. The main findings will be of interest to cheminformaticians and medicinal chemists working in QSAR modeling, and possibly in other areas related to machine learning.

    2. Reviewer #1 (Public review):

      Summary:

      The work provides further evidence of the importance of data quality and representation for ligand-based virtual screening approaches. The authors applied different machine learning (ML) algorithms and data representations to a new dataset of BRAF ligands. First, the authors evaluate the ML algorithms and demonstrate that, regardless of the ML algorithm, predictive and robust models can be obtained for this BRAF dataset. Second, the authors investigate how the molecular representation can change the predictions of the ML algorithms. They found that, in this highly curated dataset, the different molecular representations are adequate for the ML algorithms, since almost all of them achieve high accuracy, with E-State fingerprints yielding the worst-performing predictive models and ECFP6 fingerprints the best classification models. Third, the authors evaluate model performance on subsets of the BRAF dataset of different composition and size. They found that, for a fixed number of active compounds, increasing the number of inactive compounds worsens recall and accuracy. Finally, the authors analyze whether the use of "less active" molecules affects the models' predictive performance, using "less active" molecules taken from the ChEMBL database or decoys from DUD-E. They found that model accuracy falls as the number of "less active" examples in the training dataset increases, whereas including decoys in the training set gives results as good as the original models, or even better in some cases. However, the use of decoys in the training set worsens predictive power on test sets that contain active and inactive molecules.

      Strengths:

      This is a highly relevant topic in medicinal chemistry and drug discovery. The manuscript is well-written, with a clear structure that facilitates easy reading, and it includes up-to-date references. The hypotheses are clearly presented and appropriately explored. The study provides valuable insights into the importance of deriving models from high-quality data, demonstrating that, when this condition is met, complex computational methods are not always necessary to achieve predictive models. Furthermore, the generated BRAF dataset offers a valuable resource for medicinal chemists working in ligand-based virtual screening.

      Weaknesses:

      While the work highlights the importance of using high-quality datasets to achieve better and more generalizable results, it does not present significant novelty, as the analysis of training data has been extensively studied in chemoinformatics and medicinal chemistry. Additionally, the inclusion of "AI" in the context of data-centric AI is somewhat unclear, given that the dataset curation is conducted manually, selecting active compounds based on IC50 values from ChEMBL and inactive compounds according to the authors' criteria.

      Moreover, the conclusions are based on the analysis of only two high-quality datasets. To generalize these findings, it would be beneficial to extend the analysis to additional high-quality datasets (at least 10 datasets for a robust benchmarking exercise).

      A key aspect that could be improved is the definition of an "inactive" compound, which remains unclear. In the manuscript, it is stated:

      • "The inactives were carefully selected based on the fact that they have no known pharmacological activity against BRAF."
        Does the lack of BRAF activity data necessarily imply that these compounds are inactive?
      • "We define a compound as 'inactive' if there are no known pharmacological assays for the said compound on our target, BRAF."

      However, in the authors' response, they mention:

      • "We selected certain compounds that we felt could not possibly be active against BRAF, such as ligands for neurotransmitter receptors, as inactives."

      Given that the definition of "inactive" is one of the most critical concepts in the study, I believe it should be clearly and consistently explained.

      Lastly, while statistical comparison is not always common in machine learning, it would greatly enhance the value of this work, especially when comparing models with small differences in accuracy.
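      One common way to compare two classifiers evaluated on the same test set, offered here only as one option among several, is an exact McNemar test on the discordant predictions: b is the number of compounds classified correctly by model A but not by model B, and c the reverse. Under the null hypothesis of equal accuracy, b follows Binomial(b + c, 0.5). The counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical: on a shared test set, model A alone is right on 10 compounds,
# model B alone on 2.
print(f"p = {mcnemar_exact(10, 2):.4f}")
```

      Because only the discordant compounds enter the test, two models whose accuracies differ by a fraction of a percentage point can still be compared formally; the result depends on how often they disagree rather than on the overall test-set size.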

    3. Reviewer #2 (Public review):

      Summary:

      The authors explored the importance of data quality and representation for ligand-based virtual screening approaches. I believe the results could be of potential benefit to the drug discovery community, especially to those scientists working in the field of machine learning applied to drug research. The in silico design is comprehensive and adequate for the proposed comparisons.

      This manuscript by Chong A. et al. shows that it is not necessary to resort to sophisticated deep learning algorithms for virtual screening, since, according to their results, conventional ML can perform exceptionally well if fed the right data and molecular representations.

      The article is interesting and well-written. The overview of the field and the warning about dataset composition are very well thought-out and should be of interest to a broad segment of the AI in drug discovery readership. This article further highlights some of the considerations that need to be taken into account when implementing data-centric AI for computer-aided drug design.

      Strengths:

      This study contributes significantly to the field of machine learning and data curation in drug discovery. The paper is, in general, well-written and structured. However, I have some suggestions regarding certain aspects of the data analyses.

      Weaknesses:

      The conclusions drawn in the study are based on the analysis of two datasets. The authors chose BRAF as an example in this study and expanded the analysis to a BACE-1 dataset; however, a benchmark with several targets would be suitable to evaluate the reproducibility or transferability of the method. One concern is the applicability of the method to other targets.

    4. Reviewer #3 (Public review):

      Summary:

      The authors presented a data-centric ML approach for virtual ligand screening. They used BRAF as an example to demonstrate the predictive power of their approach.

      Strengths:

      The performance of the predictive models in this study is superior (nearly perfect) to that of existing methods.

      Comments on revisions:

      In the revised manuscript, the presented approach has been robustly tested and can be very useful for ligand prediction.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the Editors and reviewers for their candid evaluation of our work. While it was suggested that we demonstrate the validity of our approach with perhaps 10 different datasets, we felt that this would place an undue burden on our resources. It generally takes us about 4 to 6 months to build a dataset, not including the time taken to train and test our AI models, so providing 10 different datasets would take another 3 to 5 years to complete this research project. Publishing research on a single dataset is not unheard of: for example, Subramanian et al. (2016) published their widely cited benchmark dataset for BACE1 inhibitors alone. However, we hope that our additional work, in which we improved the benchmark dataset for BACE1 inhibitors and achieved the same high level of predictive performance on it, will convince readers (and reviewers) of the reproducibility of our approach. Furthermore, we showed that our approach is robust and does not rely on a large volume of data to achieve this near-perfect accuracy. As can be seen in the Supplemental section, even our AI models trained on ONLY 250 BRAF actives and 250 inactives could achieve 96.3% accuracy! Logically, if the model is robust, we would expect it to be reproducible. As such, we do not feel it is necessary to test our approach on 10 different datasets.

      It was also suggested that we expand this study to other types of molecular representations to give a better idea of generalizability. We would like to point out that we tested, in total, 55 single fingerprints and paired combinations. Our goal was to create an approach that could give superior performance for virtual screening, and we believe that we have achieved this. Based on the results of our study, we are of the opinion that molecular representations do not, in general, have an outsized effect on AI virtual screening. Although certain molecular representations may give SLIGHTLY better performance, with the exception of the 79-bit E-State fingerprint (which could still achieve an impressive 85% accuracy for the SVM model), nearly all molecular fingerprints and paired combinations that we used achieved an accuracy above 97%. Therefore, we do not share the reviewers' concern that our approach may not be useful when applied with other types of molecular representations.

      It is true that our work involved manual curation of the datasets, but the goal of this paper is to lay down some ground rules for the future development of a data-centric AI approach. Although manual curation is routine practice in AI/ML, it should be recognised that there is good manual curation and bad manual curation, and rules need to be established to ensure good manual curation. Without these rules, we would also not be able to establish and train a data-centric AI. All manual curation involves a level of subjectivity, but that subjectivity comes from one's experience and domain knowledge of the field in which the AI is being applied. For example, in this study we relied on our knowledge and understanding of pharmacology to determine whether a compound is pharmacologically inactive or active. This may seem somewhat arbitrary to the uninitiated, but it is anything but arbitrary: we chose these compounds for training the AI through careful thought and assessment of each chemical compound. Unfortunately, this sort of subjective assessment cannot be easily or completely explained, but we do show where current practices have failed when building a dataset for training an AI for virtual screening.

    1. eLife Assessment

      This important study used an automated system to collect eggs laid over the course of multiple days by individual female Drosophila to successfully reveal a robust yet noisy circadian rhythm of egg-laying. Their results show that the neural control of this rhythm is entirely different from the one that controls locomotor activity rhythmicity. Preliminary connectome-based analyses provide evidence for connections between the relevant clock neurons and neurons involved in oviposition. The evidence provided is solid, although using an independent tool for targeted knockdown of clock genes and including the time series of representative individuals for all genotypes tested would help interpret the results.

    2. Joint Public Review:

      Riva et al. uncovered the neural substrate underlying the oviposition rhythm in Drosophila melanogaster using a novel device that automates egg collection from individual mated females over the course of multiple days. By systematically knocking down the clock gene period in specific clock neurons, the authors show that three cryptochrome (cry)-positive dorso-lateral neurons (LNds) present in each hemisphere of the fly brain are critical to generating a female, sex-specific rhythm in oviposition. Interestingly, these neurons are not essential for free-running locomotor activity. By contrast, the LNvs (lateral ventral neurons), which are essential for free-running locomotor activity rhythmicity, were not involved in controlling the circadian rhythmicity of oviposition. Thus, this work has identified the first truly sex-specific circadian circuit in Drosophila. Using available Drosophila hemibrain connectome data, they identify bidirectional connections between cry-expressing LNds and oviposition-related neurons.

      Strengths:

      This paper established a new semi-automatic device to register egg-laying activity in Drosophila and found a specific role for a subset of clock neurons in the control of a female-specific circadian behavior. They also lay the groundwork for understanding how these neurons are connected to the neurons that control egg laying.

      Weaknesses:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc.); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artefacts introduced by the 24h moving average used.

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      (5) Although the device reduces the interference caused by manually measuring egg laying, it does not improve the signal quality enough for a sufficient number of individually rhythmic flies to be included in the analysis. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period of the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies among them. This assumption has not yet been tested rigorously, and the evidence suggests much more inter-fly variability in the period of the egg-laying rhythm.

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe for their Canton-S and YW wild-type controls, whose egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable, but the difference is reflected in the different proportions of rhythmic flies observed. Unfortunately, when discussing their clock circuit manipulations using perRNAi, the authors do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1. These profiles could have been included in Supplementary figures, where they would have helped readers decide for themselves what might have caused the loss of power in the LS periodogram for some of these experimental lines.

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in Figures 1G and 1I might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.
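      Points (5) and (9) both concern the Lomb-Scargle (LS) periodogram used to score rhythmicity within a 16-32 h band. For readers unfamiliar with it, the sketch below computes a classic LS periodogram with a normalisation in which 1 corresponds to a perfect sinusoidal fit; whether this matches the LSpower scale used in the paper, and the 0.5 h period grid, are assumptions. The test series is a synthetic 24 h sinusoid sampled every 4 h for 5 days, not real egg-laying data.

```python
import math


def lomb_scargle(t, y, periods):
    """Classic Lomb-Scargle periodogram, normalised so that 1 = perfect sinusoidal fit."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    yy = sum(v * v for v in yc)
    powers = []
    for p in periods:
        w = 2 * math.pi / p
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * tj) for tj in t),
                         sum(math.cos(2 * w * tj) for tj in t)) / (2 * w)
        cos_t = [math.cos(w * (tj - tau)) for tj in t]
        sin_t = [math.sin(w * (tj - tau)) for tj in t]
        yc_cos = sum(a * b for a, b in zip(yc, cos_t))
        yc_sin = sum(a * b for a, b in zip(yc, sin_t))
        cc = sum(v * v for v in cos_t)
        ss = sum(v * v for v in sin_t)
        powers.append((yc_cos ** 2 / cc + yc_sin ** 2 / ss) / yy)
    return powers


times = [4.0 * k for k in range(30)]                    # every 4 h for 5 days
series = [math.sin(2 * math.pi * tj / 24) for tj in times]
band = [16 + 0.5 * i for i in range(33)]                # 16 h to 32 h
power = lomb_scargle(times, series, band)
best_power, best_period = max(zip(power, band))
print(f"peak at {best_period} h with power {best_power:.3f}")
```

      Applied to an average of several flies, the same computation can only recover a common period and phase; flies that are rhythmic but out of phase partly cancel in the average, which is the information loss raised in point (5).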

      Wider context:

      The study of the neural basis of oviposition rhythms in Drosophila melanogaster can serve as a model for the analogous mechanisms in other animals. In particular, research in this area can have wider implications for the management of insects with societal impact such as pests, disease vectors, and pollinators. One key aspect of D. melanogaster oviposition that is not addressed here is its strong social modulation (see Bailly et al., Curr Biol 33:2865-2877.e4. doi:10.1016/j.cub.2023.05.074). It is plausible that most natural oviposition events involve groups of flies rather than isolated individuals. As oviposition is encouraged by aggregation pheromones (e.g., Dumenil et al., J Chem Ecol 2016 https://link.springer.com/article/10.1007/s10886-016-0681-3), its propensity changes with the pre-conditioning of the oviposition substrates, which complicates assays of oviposition rhythms that periodically move the flies to fresh substrate.

    3. Author response:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc.); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      We agree with this objection, and in the revised version we plan to provide assessments of the egg-laying rhythms of the missing GAL4 controls, as recommended, for the genotypes in Figure 3 only.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have recently acquired flies carrying a dominant-negative cycle transgene (UAS-cycDN, Tanoue et al. 2004), and we plan to repeat our experiments with these flies to confirm our results.

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artefacts introduced by the 24h moving average used.

      In the revised version we will show that the detrending approach used does not introduce any artefacts. The analysis of numerical simulations with an aperiodic stochastic signal superimposed on a decaying signal shows that the detrending method used does not produce a spurious periodic signal. Furthermore, we can show that when the underlying signal is rhythmic, the correct period is obtained even when the moving-average window is a few hours longer or shorter than 24 h.
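      The moving-average detrending discussed in point (3) can be sketched as follows, assuming a centred 24 h moving average that is subtracted from the raw counts. With 4 h sampling a 24 h window spans an even number of samples (6), so a centred version needs a "2x6" average with weights 1/12, 1/6, 1/6, 1/6, 1/6, 1/6, 1/12; this specific filter, and subtraction rather than division, are assumptions rather than the authors' documented procedure. The sketch checks one property that makes such a filter suitable: it removes a linear trend while leaving a 24 h sinusoid untouched.

```python
import math

# 2x6 centred moving average: spans 24 h at 4 h sampling.
WEIGHTS = [1 / 12] + [1 / 6] * 5 + [1 / 12]
HALF = len(WEIGHTS) // 2  # 3 samples on each side


def moving_average_24h(y):
    """Centred 24 h moving average; None where the window does not fit."""
    out = [None] * len(y)
    for i in range(HALF, len(y) - HALF):
        out[i] = sum(w * y[i - HALF + k] for k, w in enumerate(WEIGHTS))
    return out


def detrend(y):
    """Subtract the 24 h moving average from each sample."""
    return [None if m is None else v - m for v, m in zip(y, moving_average_24h(y))]


times = [4.0 * k for k in range(30)]
rhythm = [2.0 * math.sin(2 * math.pi * tj / 24) for tj in times]
trend = [10.0 - 0.05 * tj for tj in times]             # declining egg output
residual = detrend([r + d for r, d in zip(rhythm, trend)])
print([round(v, 3) for v in residual[HALF:HALF + 6]])
```

      With a real decaying (for example, exponential) trend the residual is only approximately trend-free, because a moving average follows curvature only approximately; the simulations described in this response are the appropriate way to quantify that effect.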

      (4) According to the authors, the oviposition device cannot sample at a resolution finer than 4 hours, which compels any experimenter to record egg laying for longer durations to obtain a time series long enough for circadian analyses.

      We apologize for not being clear enough. The device can in principle sample at any desired resolution. Note, however, that the variable we are analyzing (the number of eggs laid by a single female) takes only a few possible values, which is one of the features that make the assessment of rhythmicity particularly difficult. If egg laying is sampled more often (say, at 2 h intervals), more time points will be available, but the count at each time point will be much smaller. We will show an example comparing both sampling intervals (2 h and 4 h). Even though 2 h sampling reveals the rhythmicity of the time series, the peaks obtained are less significant than with 4 h sampling. We have found that 4 h sampling seems to provide the best compromise between sampling frequency and the discreteness of the variable.
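      The trade-off between sampling interval and count size can be made concrete with a small sketch. It assumes, purely for illustration, that eggs are recorded in 2 h bins and that counts per bin behave roughly like Poisson variables, for which the relative fluctuation of a count with mean λ is 1/sqrt(λ); neither assumption comes from the source, and the helper names are hypothetical. Merging adjacent 2 h bins into 4 h bins doubles the expected count and so reduces that relative fluctuation by a factor of sqrt(2), at the cost of halving the number of time points.

```python
import math


def rebin(counts, factor=2):
    """Sum consecutive groups of `factor` bins (e.g. 2 h bins -> 4 h bins).

    Trailing bins that do not fill a complete group are discarded.
    """
    usable = len(counts) - len(counts) % factor
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]


def poisson_relative_noise(mean_count):
    """Relative fluctuation (standard deviation / mean) of a Poisson count."""
    return 1.0 / math.sqrt(mean_count)


counts_2h = [1, 0, 2, 1, 0, 0, 3, 1]  # hypothetical single-female record
counts_4h = rebin(counts_2h)          # [1, 3, 0, 4]
```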

      On the other hand, it is important to stress that the required recording duration is largely independent of the sampling frequency (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). It has been shown that the best way to obtain accurate estimates of the period of a rhythmic signal is to have a series spanning many cycles, irrespective of the sampling frequency. In other words, it is not true that 2 h sampling would allow shorter series to be analyzed than 4 h sampling. Unfortunately, egg-laying records are usually fewer than 5 cycles long, which is one of the reasons why their rhythmicity is difficult to assess.

      (5) Although the device reduces the interference caused by manually measuring egg laying, it does not improve the signal quality enough to yield a sufficient number of individually rhythmic flies for the analysis methods used. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period of the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies among them. This assumption has not yet been tested rigorously, and the evidence suggests considerably more inter-fly variability in the period of the egg-laying rhythm.

      The assumption is difficult to test rigorously, since the individual records seem too noisy for any information to be extracted from them. As shown in the paper, it is very difficult even to assess the presence of rhythmicity at the individual level. We take the appearance of a rhythm after averaging several records as evidence that this rhythm is present at the individual level. It could be argued, however, that the rhythmicity of the average record might be due to only a few (or even a single) rhythmic individuals. To show that this is probably not the case, in the revised version we will show that, when the rhythmic individuals are left out, the average of the remaining flies still shows a rhythm (albeit a weaker one, as expected).

      Regarding our assumption that all flies have the “same” period, the results in Fig. 1F cannot really rule out this possibility, because with so few cycles the period cannot be determined very accurately (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). In our case, the error in the period is related to the width of the corresponding peak in the periodogram, which is typically 4 h. In any case, in the revised version we will try to show, using numerical simulations, that when the individual periods are not identical but are distributed approximately as in Fig. 1F, the average series is still rhythmic with the correct period.
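      The numerical check proposed above could look roughly like the following sketch. Every ingredient is an assumption made for illustration rather than the authors' setup: individual periods drawn from a normal distribution with mean 24 h and SD 2 h (standing in for the distribution in Fig. 1F, which is not reproduced here), a common phase, Gaussian noise, 4 h sampling over five cycles, and a Lomb-Scargle periodogram with one common normalization (power between 0 and 1), which may differ from the one used in the paper. The function names are hypothetical.

```python
import math
import random


def lomb_scargle(times, values, periods):
    """Lomb-Scargle power at each trial period, normalized to lie in 0..1.

    The power is the squared norm of the projection of the mean-subtracted data
    onto a sine/cosine pair at that period, divided by the total sum of squares.
    """
    n = len(values)
    mean = sum(values) / n
    y = [v - mean for v in values]
    ss_total = sum(v * v for v in y)
    powers = []
    for period in periods:
        w = 2 * math.pi / period
        # The offset tau makes the sine and cosine regressors orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, c))
        ys = sum(a * b for a, b in zip(y, s))
        cc = sum(a * a for a in c)
        ss = sum(a * a for a in s)
        powers.append((yc * yc / cc + ys * ys / ss) / ss_total)
    return powers


def simulate_average(n_flies=20, n_cycles=5, dt=4.0, period_sd=2.0, seed=1):
    """Average of noisy, in-phase sinusoids whose periods vary between flies."""
    rng = random.Random(seed)
    times = [dt * k for k in range(int(n_cycles * 24 / dt))]
    total = [0.0] * len(times)
    for _ in range(n_flies):
        period = rng.gauss(24.0, period_sd)  # assumed spread of individual periods
        for k, t in enumerate(times):
            total[k] += math.sin(2 * math.pi * t / period) + rng.gauss(0.0, 1.0)
    return times, [v / n_flies for v in total]


times, avg = simulate_average()
trial_periods = [16.0 + 0.25 * k for k in range(65)]  # 16-32 h search window
power = lomb_scargle(times, avg, trial_periods)
best_period = trial_periods[power.index(max(power))]
```

      Setting `period_sd=0.0` reproduces the "same period" assumption; increasing it makes individual rhythms drift out of phase, so the averaged rhythm damps over the recording. Varying it is one way to explore how much inter-fly variability the averaging approach can tolerate.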

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe for their Canton-S and YW wild-type controls, whose egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are indistinguishable, but the difference is reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in the Supplementary figures, where they would have helped readers decide for themselves what might have caused the loss of power in the LS periodogram for some of these experimental lines.

      Even though we think that the individual records are in general too noisy to be really informative, we will provide all the individual egg-laying profiles in the Supplementary Material of the revised version, so that readers can check this for themselves.

      (7) Selecting 'the best egg layers' for inclusion in the oviposition analyses may introduce an inadvertent bias, and the results of the assays may not be representative of the whole population.

      We agree that this may introduce some bias in the results. In our opinion, however, this bias is very difficult to avoid, since for females that lay very few eggs, rhythmicity can be difficult even to define (some females can spend a whole day without laying a single egg). On the other hand, even if the results are not representative of the whole population, they are representative of the flies that lay most of the eggs in a population, which seems very relevant in ecological terms.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      The question of possible rhythmic outliers has been addressed above, in point (5), where we discuss why we think that such outliers are not “determinant for the observed level of rhythmicity”. As also mentioned above, even though we think that the individual profiles are too noisy to be informative, we plan to include all of them in the Supplementary Material.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in Figures 1G and 1I might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We assume that the features mentioned refer to the two small peaks that appear below the significance lines in the periodograms. We are aware that in studies of the rhythmicity of locomotor activity such features are usually interpreted as “complex rhythms”, i.e. as evidence of two different mechanisms producing two different rhythms in the same individual. In our case, however, at least two other possibilities should be taken into account. Since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two small peaks could correspond to the periods of two different subpopulations. Alternatively, such peaks could simply be artifacts of the method when analyzing time series that consist of very few cycles (as explained above) and few points per cycle. A cursory examination of the individual profiles, which will be provided in the new version, does not seem to support either of the first two possibilities mentioned. On the other hand, we will show evidence that the analysis of perfectly random series sometimes results in periodograms with some small peaks.

    1. eLife Assessment

      This important study demonstrates the ability for high-throughput recording and categorization of unconstrained and stimulus-based behaviors across a very large population of marmosets (n = 120 animals across 36 family units). The authors implement an analytical approach to identify "outlier" behavior that could be key in the development of next-generation precision psychiatry. While the strength of evidence appears solid overall, many key methodological details are incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate a fully unsupervised, high throughput (meaning very low human interaction required) approach to quantifying marmoset behavior in unconstrained environments.

      Strengths:

      The authors provide an approach that is scalable, easy to implement at face value, and highly robust. Currently, most behavioral quantification approaches do not work well on marmosets, and the published examples that do look promising do not scale to the high throughput demonstrated by the authors.

      While marmosets can certainly be a useful translational research model even without quantification of free behavior, the authors make a compelling point about how this approach can be useful in the study of treatments in emerging marmoset disease models.

      Overall this is a very exhaustive manuscript that overcomes significant shortcomings in previous work and speaks highly to the use of marmosets for unconstrained behavioral and neural assessment.

      Weaknesses:

      Recording marmoset behavior at a 60 Hz frame rate is a significant limitation of the approach, one that can hopefully be alleviated in the future through better cameras/reconstruction pipelines. In the reviewer's experience, marmosets have a lot of motion energy above the 30 Hz Nyquist limit imposed by this system and are agile to a degree that requires higher frame rates.
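      The practical meaning of the 30 Hz Nyquist limit can be illustrated with a short sketch that is not part of the manuscript: a motion component at frequency f sampled at f_s = 60 Hz is indistinguishable from one folded into the 0 to 30 Hz band, so fast movements are not merely missed but can be misread as slower ones. The helper `apparent_frequency` is a hypothetical name and assumes an ideal point sampler (real cameras also integrate over the exposure time).

```python
import math


def apparent_frequency(f_hz, fs_hz=60.0):
    """Frequency (0..fs/2) at which a pure tone at f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz               # aliasing is periodic in the sampling rate
    return min(folded, fs_hz - folded)  # reflect into the 0..Nyquist band


# A 40 Hz movement component sampled at 60 frames/s shows up at 20 Hz:
# the samples of sin(2*pi*40*t) coincide with those of -sin(2*pi*20*t).
fs = 60.0
samples_40 = [math.sin(2 * math.pi * 40.0 * n / fs) for n in range(12)]
samples_20 = [-math.sin(2 * math.pi * 20.0 * n / fs) for n in range(12)]
```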

      The manuscript neglects recent approaches to non-human primate behavioral quantification from other groups that should be included. Simians are simians after all.

      As a minor weakness, this reviewer would have liked to see code shared for the reviewers to evaluate, especially pertaining to the high throughput and robustness of the approach.

    3. Reviewer #2 (Public review):

      In this manuscript, Menegas et al. classify the "control" behavior of captive marmosets. They combine behavioral screening from video recordings with audio and neural recordings (from the striatum) to better define what can be considered a typical behavioral repertoire for captive marmoset monkeys. A range of analyses is presented, investigating various aspects of behavior, such as social interactions and the detection of atypical individuals.

      The manuscript is compelling in many respects, especially due to the richness of the dataset and the breadth of analyses presented. However, a significant issue with the manuscript lies in its writing: the results are conveyed in an overly succinct and superficial manner, and the "Methods" section is nearly absent. Key concepts are often undefined, and the mathematical details underlying the figures are not explained, leaving readers to guess the authors' approach.

      Another issue is the vague use of the term "natural behavior." All data presented here appear to have been collected in small cages with limited climbing opportunities and enrichment. Thus, the authors should refrain from using "natural" to describe these conditions.

      Below, we elaborate further on the lack of methodological detail. Based on these issues, we believe the manuscript, in its current form, does not meet the scientific standards necessary for proper review. We strongly encourage the authors to undertake an extensive revision.

      Major Revision Points:

      The methods and results require significantly more detail. A scientific publication should provide readers with enough information to reproduce the study. Here, the detail level is far too low to fully understand, or reproduce, the study, and in many instances, readers are left to guess how the figure panels were produced. Below is a non-exhaustive list of examples illustrating these issues:

      (1) "we temporarily placed horizontal cage dividers to reduce the total cage size during data collection": What were the resulting (and initial) cage dimensions?

      (2) "After training the network, we hierarchically clustered the latent space": What is the latent space? Based on Figure 2a, it appears related to the network's recurrent layer, but this is not clarified in the text.

      (3) Alpha and perplexity parameters: Please define these terms. Since these concepts appear fundamental, readers should not have to consult external references.

      (4) "We then traced cluster identities across hierarchical levels": What are hierarchical levels?

      (5) "To understand how the input time series data was weighed in the bottleneck layer of the model": What is the bottleneck layer?

      (6) "we measured the average attention allocation to previous time points": The authors should define "attention allocation."

      (7) "we compared each neuron's firing rate distribution to shuffled data based on the overall frequency of each behavior during the session": This description is insufficient to understand the analysis.

      (8) "we hierarchically clustered neurons according to their firing rate enrichment maps": No mathematical explanation is provided for neuron clustering, nor is the concept of a "firing rate enrichment map" clarified.

      (9) "Cluster 4 showed higher activity when neurons were 'alone' or 'active'": This is vague and uses unclear jargon (e.g., "neurons alone"). Additionally, no mathematical explanation is provided for assigning neuronal activity to behavioral states.

      (10) Figure 3f, right-side panels: The analysis seems to involve cage mate positioning, yet no description is provided.

      (11) "we used motion watches to measure activity across all hours": Are these motion-sensitive watches physically attached to the animals? The methodology should be described, including data analysis details.

      This list could continue, but we trust the authors understand the point. There is a wealth of analyses and information in this study, but the descriptions are too superficial. We understand that fully describing each analysis may require significant rewriting, including supplementary figures, and will likely make the manuscript longer. This is entirely acceptable, as the ideas presented here are worth the added rigor.

      "Natural behavior": Typically, the term "natural" suggests that the dataset reflects the range of behaviors exhibited by animals in the wild. Here, however, recordings were made in a small cage with limited climbing opportunities and enrichment. Under these conditions, it's hard to justify describing the behavior as "natural". In a project aimed at classifying the behavioral repertoire of marmoset monkeys and making this dataset accessible to other laboratories, it would be helpful to include more detailed information about the animals' housing conditions. This might include cage sizes, temperature, humidity, and details on food quantities, quality, and feeding times.

      Correlation versus causation: In the section titled "Large-scale data collection reveals variability across days and correlation between cagemates," the authors conclude: "Overall, these results indicate that measurements of animals' behavioral traits depend heavily on their social environment." This interpretation seems incorrect. We know that animal behavior varies throughout the day, with activity peaks typically occurring in the morning and afternoon. Such factors, or other external influences, could induce correlations between animals that are not caused by social interactions.

      Figure 4g: What are we intended to conclude from this analysis?

      Figure 5: Please specify the type of calls analyzed. For example, did you analyze only long-distance calls (aka 'loud phees' or 'shrills')? In "We split the audio data into 5-minute (non-continuous) segments and found that the average call rate in these segments varied from 0 calls per minute to 60 calls per minute (Fig. 5d-e)," does the call rate refer to individual animals or the entire cage?

      "This implies that a high rate of calls in a room can interrupt animals during social resting states and cause them to preferentially exhibit more active/attentive states." Does it? This could simply indicate that more active animals produce more calls.

      "We recorded neural activity in the striatum because it is known to contain diverse signals related to movement and social interactions." While I understand that the authors intend to publish neural data separately, a brief discussion of the striatum's role here would be helpful.

    4. Author response:

      We would like to thank the editors and reviewers for taking the time to help improve our manuscript. We appreciate the feedback and will definitely increase the level of methodological detail in a revised submission.

      Here is a brief summary of our plan to address the points raised by the reviewers. We will respond to the comments in a point-by-point manner when we resubmit a revised manuscript.

      Reviewer 1

      This reviewer raised a question about the 60 Hz frame rate for recording. We agree that increasing the number of cameras and frame rate would improve the tracking quality, but this would come at the cost of scalability. In the current study (and other concurrent studies in the lab), we recorded from 10-20 families simultaneously to try to sample the distribution of behavioral responses to stimuli observed in animals in our colony. This was only possible logistically because of the lightweight equipment design allowing us to record data from animals without large disruptions to their home-cage environment.

      One strategy for acquiring higher-resolution data is to build a small number of enclosures that are fully surrounded by cameras, and to cycle animals through these enclosures (1). However, this strategy limits throughput by reducing the number of animals per day that can be studied. If the size and cost of cameras and computers decrease in the future, then this recording strategy will be scalable to the whole-colony level. For our current study and analysis, we are limited by the resolution of our dataset. We do believe that our data (although not a perfect 3d reconstruction or an extremely high frame rate) is sufficient to label behavioral states with high accuracy. We will add a figure to more clearly show that behavioral state data can be accurately inferred from this imperfect data, which has also been recently highlighted by other groups (2).

      Additionally, with recent progress in the application of deep learning to animal pose tracking, new models can infer 3d pose dynamics from 2d data (3) and leverage spatiotemporal structure to clean up noisy data (4). We believe that other groups will be able to use these types of approaches to extract much more value from this dataset. So, in summary, we do understand the concern related to reconstruction quality and will 1) more clearly define the usefulness of our current models, 2) release our data and code so that others can build upon it or repurpose it, and 3) plan future experiments with higher camera count and frame rate as permitted by logistical constraints. 

      Reviewer 2

      This reviewer asked for an increased level of methodological detail. We will try to address this in a few ways:

      (1) Code and data sharing. We believe that many of the questions related to the methodology will be best answered by sharing the data and code directly. Because there is a large amount of code associated with this manuscript, it is impractical to list every step and every parameter in the paper. Along with our revised manuscript, we will make our data and code publicly available. That said, we will improve our description of key parameters in the paper as the reviewer suggested.

      (2) More detailed Methods section. The reviewer asked us to provide more methodological detail. We understand that this is currently a weakness of our manuscript, and we will focus on addressing it. For instance, the reviewer rightly points out that we did not describe the motion watches used to generate the data in Figure S7. We will address this.

      (3) Simplify the manuscript. The paper currently has 22 figures, and further analysis could be done based on the results shown in any of them. For instance, this reviewer asked us to add a comparison across females and males (similar to our comparison of juveniles and adults). While we plan to add that analysis, we recognize that there are several figures/panels that are not closely related to our intended goal of describing the patterns we found in our large dataset. We will simplify the manuscript by removing some excess figures/panels and focus on describing the parts of the analysis that are crucial to our conclusions in greater detail.

      (4) More careful language. This reviewer pointed out that there were some inaccuracies with our descriptive language. For instance, we used the term "natural" behavior to describe the behavior of animals in captivity, which may more accurately be described as their home-cage behavior. We will be more careful to align our language to the standard for the field. For instance, several studies refer to unrestrained behavior in a laboratory setting as "spontaneous" behavior rather than "natural" behavior (5). In our case, the data consists of both spontaneously occurring behavior and responses to a set of stimuli. We will make sure that the descriptions are more precise in the revised manuscript.

      (1) Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat Commun 11, (2020).

      (2) Weinreb, C. et al. Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. bioRxiv (2023) doi:10.1101/2023.03.16.532307.

      (3) Gosztolai, A. et al. LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nat Methods 18, 975–981 (2021).

      (4) Wu, A. et al. Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking. Adv Neural Inf Process Syst 33, 6040–6052 (2020).

      (5) Levy, D. R. et al. Mouse spontaneous behavior reflects individual variation rather than estrous state. Curr Biol 33, 1358-1364.e4 (2023).

    1. eLife Assessment

      This useful work identifies a key role for Tachykinin-1 parasubthalamic neurons in avoidance learning. At present, the evidence for the conclusions regarding fiber photometry, viral transfection, reporting of behavioral outcomes, and pathway-specificity is incomplete. This work will be of interest to neuroscientists studying neural mechanisms for avoidance and aversion.

    2. Reviewer #1 (Public review):

      This study is focused on a population of neurons in the mouse parasubthalamic nucleus (pSTN) that express Tachykinin1 (Tac1). This gene has been used before to target pSTN for functional circuit studies because it is fairly selective for pSTN in this region, though it targets only a subset of pSTN neurons. Prior work has shown that activity in these neurons can impact motivated behaviors, including feeding and drinking behaviors, and that their activity is associated with aversion or avoidance behaviors. While not breaking much new ground, this study adds to that work by making use of a 2-way active avoidance assay, in which a CS predicts a US (footshock) that the mice can escape. Using fiber photometry, the authors show convincing evidence that Tac1 neurons in pSTN increase their activity in response to a US footshock, and that after some pairings the neurons start responding to the CS too, though to a lesser extent than to the US. Their most important data show that either ablation or optogenetic inhibition of these cells can strongly block the active avoidance (escape) behavior, suggesting that these neurons are key for the performance of this task, which the authors interpret as key for learning the task (but see more below). They show that optogenetic stimulation is aversive in a real-time place assay and, when paired with footshock, can enhance active avoidance behavior. Finally, they show that Tac1 pSTN axons in PVT recapitulate these effects, while axons in CeA or PBN may only recapitulate some of these effects (more below). Overall, I think the data are solid and show that the activity of Tac1 pSTN neurons in the 2-way active avoidance task is causally related to avoidance behavior in the direction that would be predicted by recent literature. However, I think the authors overstate the conclusions in the title, abstract, and text. I do not think the data make a strong case for a role for these cells in learning, at least in any classical sense, as used in the title and abstract and elsewhere. Also, the statement in the abstract that the pSTN mediates its effects 'differentially' through its downstream targets is not convincingly supported by data.

      Major concerns:

      (1) The authors infer that the activity in the Tac1 pSTN neurons is necessary for aversive or avoidance 'learning'. But this is not well defined: what exactly does that mean, and what types of evidence would support or falsify such a hypothesis? Moreover, the authors show convincingly, and in line with prior reports, that these cells are activated by aversive stimuli (here footshock), and that activation of these cells is sufficient to induce avoidance behavior. Because manipulation of these cells can serve as a primary negative reinforcer, it becomes even more challenging and important to explain how experiments that manipulate these cells while measuring behavior/performance can discriminate between changes in: (1) primary aversion, (2) motivation to avoid, (3) associative learning, or (4) memory/retrieval. The authors seem to favor #3, but they neither make a clear case for this view nor define what they mean by 'avoidance learning'. In my opinion, the data do not discriminate well between possibilities 1 through 3. The authors should clarify their logic and temper their conclusions throughout.

      (2) Abstract line 37 is not well supported. The authors focus mostly on pSTN projections to PVT and show that measurements or manipulations of these axons recapitulate the effects seen with pSTN cell bodies. The authors perform fewer studies of axons in CeA and PBN; they find that opsin inhibition of these axons recapitulates the effects, but they detect no effects with opsin stimulation. However, the lack of effect with opsin stimulation in Figure S7a-e proves very little on its own. It could be technical, due to inadequate expression or functional efficacy, and no histological or functional evidence is provided that the manipulation was effective. Overall, I can only conclude that the projections to these regions might be very similar (based on the inhibition data), or might be a little different. The data are thus inadequate to support the authors' claim that the pSTN mediates learning differentially through its downstream targets.

      Other concerns:

      (3) Line 93 is not adequately supported by data in Figure 1b. Additional data is needed that shows expression across cases, including any spread that may be visible when zooming out from pSTN. Additional methods are needed to indicate what exclusion criteria were applied and how many mice were excluded. These data could help support the statement on line 93 that expression was largely restricted within pSTN.

      (4) From the results and methods, it is not clear where the GFP signal would come from in the mice expressing Casp3 for the ablation studies. It is therefore not clear if the absence of GFP should be taken as evidence of cell loss. For example, it is not clear if multiple vectors were used, if volumes and titers were carefully matched between control groups, or if competition/occlusion between AAVs could be ruled out. It is also not clear how this was quantified, that is, how many sections/subjects were used and how counting was done. Finally, it is not clear how much time elapsed between AAV infusion, behavior, and euthanasia, which is perhaps especially important for the ablation performed after avoidance learning had occurred.

      (5) The authors should consider showing individual measurements and not just mean/sem wherever feasible, for example, to support the statement on line 141 that 'all ablated mice showed...'.

      (6) S3 is an important control for interpreting data in Figure 2d-i. Something similar is needed to support the inferences made in 2j-u. The very strong effect showing a lack of active avoidance in response to CS or the US when pSTN Tac1 neurons are inhibited during CS or during US suggests that something gross may be going on, such as a gross motor or sensory response that supersedes the effect of footshock. The authors do not comment on whether there are any gross behavioral responses to the inhibition, but an experiment as in S3 is needed, for example, to show that behavior is intact during pSTN inhibition if delivered after the mice already learned to associate CS with US.

      (7) The authors use 100 shocks of 0.8 mA for 7 days. I think this is quite strong and in the pSTN inhibition experiments it seems to be functionally 'inescapable' and could thus produce behaviors similar to 'learned helplessness'. Can the authors consider whether this might contribute to the striking findings they observed in their opsin inhibition assays?

      (8) The description of the experiment in S5 is inadequate. What are the adjacent areas? Where do the authors see spread? The use of the word 'case' in Figure S5 implies an individual case, but the legend says 5 mice were used for 'case 1' and 3 mice were used for 'case 2'. The use of the word 'off-target' in the figure implies that the expression missed the intended target. But the text of the results and methods implies that adjacent regions, which are neither named nor shown, were targeted intentionally. This should be clarified.

      (9) The authors suggest that the CPA study diverges from Serra et al. 2023. Though I think this could be due to how the conditioning was done, it would be helpful for the authors to include less processed data, which would aid in interpreting any divergences across studies. Can the authors include raw data (time spent, in seconds) in each compartment for each group across baseline and test days?

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Hu et al. presents a clearly designed examination of the role of tachykinin1-expressing neurons in the parasubthalamic nucleus of the lateral posterior hypothalamus (PTSN) in active avoidance learning. These glutamatergic neurons have previously been implicated in responding to negative stimuli. This manuscript expands the current understanding of PTSNTac1 neurons in learned responses to threats by showing their role in encoding and mediating the active avoidance response. The authors first use bulk fiber photometry imaging to show the encoding of the active avoidance procedure, followed by cell-type specific manipulations of PTSNTac1 neurons during active avoidance. Finally, they show that encoding and mediation of active avoidance in a downstream target of PTSNTac1 neurons, the PVT/intermediodorsal nuclei of the dorsal thalamus (IMD), mirror the effects found at the cell bodies. This contrasts with other output regions of the PTSN, such as the PBN and CeA, which were not found to promote active avoidance learning. The experiments presented were well designed to support the conclusions of the authors; however, the manuscript is missing several key control experiments and supplemental information to support their main findings.

      Strengths:

      The manuscript provides information on a brain region and downstream target that mediate active avoidance learning. It provides valuable necessity and sufficiency experiments showing the role of the population of interest (PSTN Tac1+ neurons) in active avoidance learning. The authors also performed most behavioral experiments in both male and female mice, with adequate power to address potential sex differences in the control of active avoidance by PSTN Tac1+ neurons. Finally, the manuscript provides valuable information about the specificity of the downstream targets of PSTN Tac1+ neurons in regulating active avoidance learning, identifying the PVT/intermediodorsal nuclei of the dorsal thalamus as the key target and ruling out the PBN and CeA.

      Weaknesses:

      However, several main conclusions of the paper must be interpreted carefully due to missing or inadequate control experiments and histological verification.

      (1) Inadequate presentation of viral localization. The authors state that expression was "largely restricted within PSTN"; however, there is no quantification of viral expression beyond the target region. Given that Tac1 is expressed in neighboring regions, it is critical to show viral expression and fiber implant locations for all animals included in the figures. Furthermore, the criteria for inclusion and exclusion based on mistargeting should be delineated. This should also be clearly outlined for the experiments in Figure S5, where "behavioral effects of activation of sparsely Tac1-expressing neurons in two adjacent areas of PSTN" were tested but the location of viral expression in those cases is unclear.

      (2) Lack of motion-artifact correction with an isosbestic signal for GCaMP recordings. It is appreciated that the authors included a separate EGFP-expressing group for comparison with the GCaMP-expressing group; however, additional explanation is required for the methods used to analyze the raw fluorescent signal. Namely, were fluorescent signals isosbestic-corrected prior to calculating ΔF/F? If no isosbestic signal was used to correct motion artifacts within a recording session, the authors should explain how this was addressed. The absence of motion artifacts in the EGFP signal of a separate cohort does not resolve this caveat, because motion artifacts arise within each animal.
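      For reference, one common form of the within-session correction described above fits the isosbestic (405 nm) trace to the calcium-dependent (465 nm) trace by least squares and uses the fitted trace as the baseline for ΔF/F. The Python sketch below assumes simultaneously sampled traces from the same fiber in the same animal; the function name `isosbestic_dff` is hypothetical, and this is not the authors' pipeline. Practical pipelines often add low-pass filtering or a robust fit, which are omitted here.

```python
import numpy as np


def isosbestic_dff(signal_465, control_405):
    """Return motion-corrected dF/F for one recording session.

    Assumes the calcium-dependent (465 nm) and isosbestic (405 nm) traces
    were sampled simultaneously from the same fiber in the same animal.
    """
    signal = np.asarray(signal_465, dtype=float)
    control = np.asarray(control_405, dtype=float)
    # Least-squares fit of the isosbestic trace onto the GCaMP trace:
    # signal ~ slope * control + intercept.
    slope, intercept = np.polyfit(control, signal, 1)
    fitted_control = slope * control + intercept
    # Fluctuations shared by both channels (e.g. motion) are captured by the
    # fitted control and cancel in the ratio below.
    return (signal - fitted_control) / fitted_control


# Synthetic example: a shared motion artifact plus one calcium transient.
rng = np.random.default_rng(0)
motion = rng.normal(0.0, 5.0, 1000)
control = 100.0 + motion
signal = 2.0 * control
signal[500:510] += 50.0  # transient present only in the 465 nm channel
dff = isosbestic_dff(signal, control)
```

      Because the motion term appears in both channels, it is absorbed by the fitted control, and the corrected trace stays near zero except at the transient. Computing ΔF/F from the 465 nm channel alone would let the same artifact pass through, which is why a separate EGFP cohort cannot substitute for this within-animal step.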

      (3) Missing control experiment demonstrating intact locomotor performance in caspase ablation experiments. The authors use caspase ablation of PSTN Tac1+ neurons prior to active avoidance learning to assess the necessity of this cell population. However, no control experiment showing intact locomotor ability in ablated mice was performed.

      (4) Missing control experiment demonstrating [lack of] valence with PSTN silencing manipulations. The authors performed real-time and conditioned place preference experiments in ChR2-expressing mice (Fig 3M) and found that stimulation was negatively valenced and generated an aversive memory, respectively. Without the corresponding control experiment with silencing, an alternative explanation remains possible: optogenetic silencing via GtACR2 may have created nonspecific location preferences in the active avoidance apparatus, confounding the interpretation of those results.

      (5) Incomplete analysis of sex differences. Data from female mice are conspicuously missing from the inhibition experiments. The rationale for their exclusion from this dataset would be useful for interpreting the other noted sex differences.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Hu et al. examined the role of tachykinin1 (Tac1)-expressing neurons in the parasubthalamic nucleus (PSTH) in active avoidance of electric shocks. Bulk recording of PSTH Tac1 neurons, or of their axons in the PVT, showed activation by a shock-predicting tone and by the shock itself. Ablation or optogenetic manipulation of these neurons, or of their projection to the PVT, suggests a causal role for this pathway in the learning of active avoidance.

      Strengths:

      This work identifies an understudied pathway potentially important for active avoidance of electric shocks. The experiments were thoroughly done and the presentation is clear. The extent of the discussion and references is appropriate.

      Weaknesses:

      Critical control experiments are missing for most experiments, and statistical tests are not clear or not appropriate in most parts. Details are shown below.

      (1) There are some control experiments missing. Notably, optogenetic manipulation is not verified in any experiments. It is important to verify whether neural activation with optogenetic activation is at the physiological level or supra-physiological level, and whether optogenetic inhibition does not cause unwanted activity patterns such as rebound activation at the critical time window.

      (2) Neural ablation with caspase was confirmed by GFP expression. However, from the present description, a different virus to express EITHER caspase or GFP was injected, and then the numbers of GFP-expressing neurons were compared. It is not clear how this can detect ablation.

      (3) In many places, the statistical approaches are not clear from the present figures, figure legends, and Methods. It seems that most statistics were performed by pooling trials, but this is not described, or multiple "n" values are given. For example, Figure 4H explicitly states "n = 3 mice, n = 213 avoidance trials and n = 87 failure trials". The authors should not pool trials but should perform across-animal tests in this and other figures, and the "n" used for each statistical test should be clearly stated in each plot.

      (4) It is also unclear how the test types were selected. For example, in Figure 1K and O with similar datasets, one is examined by a paired test and the other is by an unpaired test. Since each animal has both early vs late trials, and avoidance vs failure trials, paired tests across animals should be performed for both.
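      The across-animal analysis requested in points (3) and (4) can be sketched as follows: trial-level values are first averaged within each animal and condition, and a paired t statistic is then computed on the per-animal differences, so that n equals the number of animals. This is a minimal Python illustration using only the standard library; the function names and the (animal, condition, value) input layout are hypothetical and do not reflect the authors' data format or analysis code.

```python
from math import sqrt
from statistics import mean, stdev


def per_animal_means(trials):
    """Collapse trial-level values to one mean per animal and condition.

    trials: iterable of (animal_id, condition, value) tuples.
    Returns {animal_id: {condition: mean value over that animal's trials}}.
    """
    grouped = {}
    for animal, condition, value in trials:
        grouped.setdefault(animal, {}).setdefault(condition, []).append(value)
    return {
        animal: {cond: mean(values) for cond, values in conds.items()}
        for animal, conds in grouped.items()
    }


def paired_t_across_animals(trials, cond_a, cond_b):
    """Paired t statistic on per-animal means (n = animals, df = n - 1).

    Animals lacking either condition are dropped. Needs at least two animals
    with both conditions, and is undefined if all differences are identical.
    """
    means = per_animal_means(trials)
    diffs = [m[cond_b] - m[cond_a] for m in means.values()
             if cond_a in m and cond_b in m]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Toy example: three mice with unequal numbers of trials per condition.
trials = [
    ("m1", "early", 1.0), ("m1", "early", 1.0), ("m1", "late", 3.0), ("m1", "late", 3.0),
    ("m2", "early", 2.0), ("m2", "late", 5.0),
    ("m3", "early", 0.0), ("m3", "early", 2.0), ("m3", "late", 2.0), ("m3", "late", 4.0),
]
t_stat, df = paired_t_across_animals(trials, "early", "late")
```

      In this toy example the per-animal differences are 2, 3, and 2, giving t = 7 (up to floating-point rounding) with df = 2. With real data, the p-value would come from a t distribution with df degrees of freedom, for example by passing the per-animal means to `scipy.stats.ttest_rel`. Averaging first means trials from the same mouse are not treated as independent observations, and each mouse carries equal weight regardless of how many avoidance or failure trials it contributed.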

      (5) It is also strange to show violin plots for only 6 animals. Instead, the authors should show one dot per animal, connected by lines, to show consistent increases in activity in late vs early trials and in avoidance vs failure trials.

      (6) To tell specificity in avoidance learning, it is better to show escape in the current trials with optogenetic manipulation.

      (7) For place aversion, % time decrease across days was tested. It is better to show the original number before normalization, as well.

      (8) For anatomical results in Figure S6, it is important to show images with lower magnification, too.

      (9) Inactivation of either the PSTH-to-PBN or the PSTH-to-CeA pathway also inhibits active avoidance, but the authors conclude that these effects are "partial" compared with inactivation of the PSTH-to-PVT pathway. It is not clear how the effects were compared, since the effects of PSTH-CeA inactivation are quite strong and, by eye, comparable to those of PSTH-PVT inactivation. The authors should quantify the effects before concluding that they differ.

      (10) Supplementary table 1: as mentioned above, n for statistical tests should be clearer.

    5. Author response:

      Reviewer #1 (Public review):

      This study is focused on a population of neurons in the mouse parasubthalamic nucleus (pSTN) that express Tachykinin1 (Tac1). This gene has been used before to target the pSTN for functional circuit studies because it is fairly selective for the pSTN in this region, though it targets only a subset of pSTN neurons. Prior work has shown that activity in these neurons can impact motivated behaviors, including feeding and drinking, and that their activity is associated with aversion or avoidance behaviors. While not breaking much new ground, this study adds to that work by using a 2-way active avoidance assay, in which a CS predicts a US (footshock) that the mice can escape. Using fiber photometry, the authors show convincing evidence that Tac1 neurons in the pSTN increase their activity in response to the US footshock, and that after some pairings the neurons also start responding to the CS, though to a lesser extent than to the US. Their most important data show that either ablation or optogenetic inhibition of these cells can strongly block the active avoidance (escape) behavior, suggesting these neurons are key for the performance of this task, which the authors interpret as key for learning the task (but see more below). They show that optogenetic stimulation is aversive in a real-time place assay and, when paired with footshock, can enhance active avoidance behavior. Finally, they show that Tac1 pSTN axons in the PVT recapitulate these effects, while axons in the CeA or PBN may recapitulate only some of these effects (more below). Overall, I think the data are solid and show that the activity of Tac1 pSTN neurons in the 2-way active avoidance task is causally related to avoidance behavior in the direction that recent literature would predict. However, I think the authors overstate the conclusions in the title, abstract, and text.
      I do not think the data make a strong case for a role for these cells in learning, at least in any classical sense of the term as used in the title, abstract, and elsewhere. Also, the statement in the abstract that the pSTN mediates its effects 'differentially' through its downstream targets is not convincingly supported by the data.

      We are very pleased that Reviewer 1 considers our data solid.

      Major concerns:

      (1) The authors infer that the activity in the Tac1 pSTN neurons is necessary for aversive or avoidance 'learning'. But this is not well defined: what exactly does it mean, and what types of evidence would support or falsify such a hypothesis? Moreover, the authors show convincingly, and in line with prior reports, that these cells are activated by aversive stimuli (here footshock), and that activation of these cells is sufficient to induce avoidance behavior. Because manipulation of these cells can serve as a primary negative reinforcer, it becomes even more challenging, and more important, to explain how experiments that manipulate these cells while measuring behavior/performance can discriminate between changes in: (1) primary aversion, (2) motivation to avoid, (3) associative learning, or (4) memory/retrieval. The authors seem to favor #3, but they don't make a clear case for this point of view or explain what they mean by 'avoidance learning'. In my opinion, the data do not discriminate well between possibilities 1 through 3. The authors should clarify their logic and temper their conclusions throughout.

      We thank Reviewer 1 for these insightful suggestions. Our fiber photometry data show a significant increase in CS-evoked calcium signals of PSTN Tac1+ neurons in late relative to early trials (Figure 1H-K). Together with our optogenetic inhibition experiments during the CS (Figure 2N-Q), these results indicate that the activity of PSTN Tac1+ neurons is modulated by learning and is required for active avoidance learning. Moreover, PSTN Tac1+ neurons are activated by footshock, and activation of these cells is sufficient to induce avoidance behavior. These findings demonstrate that PSTN Tac1+ neurons encode aversive information. Together, our current data support the view that PSTN Tac1+ neurons encode both the aversive event and the cue that predicts it. We will clarify our conclusions in the revised manuscript.

      (2) Abstract line 37 is not well supported. The authors focus mostly on pSTN projections to the PVT and show that measurement or manipulation of these axons recapitulates the effects seen with pSTN cell bodies. The authors perform fewer experiments on axons in the CeA and PBN; they find that opsin inhibition of these axons recapitulates the effects, but they detect no effects with opsin stimulation. However, the lack of effect with opsin stimulation in Figure S7a-e proves very little on its own. It could be technical, due to inadequate expression or functional efficacy, and there is no histological or functional evidence that the manipulation was effective. Overall, I can only conclude that the projections to these regions might be very similar (based on the inhibition data) or might be a little different. The data are thus inadequate to support the authors' claim that the pSTN mediates learning differentially through its downstream targets.

      In the revised manuscript, we will provide more histological and functional evidence for the PSTN-to-CeA and PSTN-to-PBN circuits to support our conclusions on the functional roles of these downstream targets. Consistent with our anterograde tracing showing that the PSTN densely projects to the CeA and PBN (Figure S6), our optogenetic activation and inhibition experiments revealed dense PSTN-derived axonal terminals in the CeA and PBN; these data will be included in the revised manuscript. In addition, we will further examine these circuits by investigating the functional roles of CeA-projecting and PBN-projecting PSTN neurons during the 2-way active avoidance task.

      Other concerns:

      (3) Line 93 is not adequately supported by data in Figure 1b. Additional data is needed that shows expression across cases, including any spread that may be visible when zooming out from pSTN. Additional methods are needed to indicate what exclusion criteria were applied and how many mice were excluded. These data could help support the statement on line 93 that expression was largely restricted within pSTN.

      In the revised manuscript, we will provide larger example images containing the pSTN and its adjacent areas to demonstrate that viral expression is well restricted to this brain area. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (4) From the Results and Methods, it is not clear where the GFP signal would come from in the mice expressing Casp3 for the ablation studies. It is therefore not clear whether the absence of GFP should be taken as evidence of cell loss. For example, it is not clear whether multiple vectors were used, whether volumes and titers were carefully matched between control groups, or whether competition/occlusion between AAVs could be ruled out. It is also not clear how this was quantified, that is, how many sections/subjects were used and how counting was done. Nor is it clear how long the authors waited between AAV infusion, behavior, and euthanasia, which is perhaps especially important for the ablation done after avoidance learning occurred.

      We fully agree with Reviewer 1’s concerns. We will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself, then measure the colocalization of GFP with Tachykinin-1 inside and outside the PSTN and the extent of Tachykinin-1 loss in Casp3-ablated mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (5) The authors should consider showing individual measurements and not just mean/sem wherever feasible, for example, to support the statement on line 141 that 'all ablated mice showed...'.

      Thank you Reviewer 1 for this suggestion. We will re-plot the data as individual measurements in the revised manuscript.

      (6) S3 is an important control for interpreting data in Figure 2d-i. Something similar is needed to support the inferences made in 2j-u. The very strong effect showing a lack of active avoidance in response to CS or the US when pSTN Tac1 neurons are inhibited during CS or during US suggests that something gross may be going on, such as a gross motor or sensory response that supersedes the effect of footshock. The authors do not comment on whether there are any gross behavioral responses to the inhibition, but an experiment as in S3 is needed, for example, to show that behavior is intact during pSTN inhibition if delivered after the mice already learned to associate CS with US.

      We thank Reviewer 1 for this insightful suggestion. During the review process, we performed this experiment as in Figure S3. We measured behavioral responses during pSTN optogenetic inhibition after the mice had already learned to associate the CS with the US, and found that avoidance was unaffected in most GtACR-expressing mice. These data will be included in the revised manuscript.

      (7) The authors use 100 shocks of 0.8 mA for 7 days. I think this is quite strong and in the pSTN inhibition experiments it seems to be functionally 'inescapable' and could thus produce behaviors similar to 'learned helplessness'. Can the authors consider whether this might contribute to the striking findings they observed in their opsin inhibition assays?

      We agree with Reviewer 1’s comment on the striking findings in the optogenetic inhibition results. Indeed, based on the results on days 1 and 2, optogenetic inhibition of PSTN Tac1+ neurons significantly impaired the behavioral performance of GtACR-expressing animals in the 2-way active avoidance task. To examine whether the effect of inhibiting these neurons might decline with prolonged training, we conducted an additional 5 days of training. We will discuss this point in the revised manuscript.

      (8) The description of the experiment in Figure S5 is inadequate. What are the adjacent areas? Where do the authors see spread? The word 'case' in Figure S5 implies an individual case, but the legend says 5 mice were used for 'case 1' and 3 mice for 'case 2'. The word 'off-target' in the figure implies that expression fell outside the intended target, but the Results and Methods imply intentional targeting of adjacent regions that are neither named nor shown. This should be clarified.

      We will add histological images and clarify these points in the revised manuscript. The purpose of this experiment was to show that even slight spread of ChR2 virus into Tac1+ neurons in areas adjacent to the PSTN did not produce behavioral changes, which indirectly supports the conclusion that the behavioral effects are mediated by PSTN Tac1+ neurons rather than by neurons in neighboring areas. Because Tac1+ neurons outside the PSTN are sparsely distributed, it is quite difficult to restrict viral expression completely to the PSTN along its entire anterior-posterior extent. We will therefore provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (9) The authors suggest the CPA study is divergent from Serra et al. (2023). Although I think this could be due to how the conditioning was done, it would be helpful for the authors to include less processed data, which would aid interpretation of any divergence across studies. Can the authors include raw data (time spent in each compartment, in seconds) for each group on baseline and test days?

      We will follow Reviewer 1’s suggestion to include raw data (in seconds of time spent) in each compartment for each group across baseline and test days in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Hu et al. presents a clearly designed examination of the role of tachykinin1-expressing neurons in the parasubthalamic nucleus of the lateral posterior hypothalamus (PSTN) in active avoidance learning. These glutamatergic neurons have previously been implicated in responding to negative stimuli. This manuscript expands the current understanding of PSTN Tac1+ neurons in learned responses to threats by showing their role in encoding and mediating the active avoidance response. The authors first use bulk fiber photometry to show how these neurons encode the active avoidance procedure, followed by cell-type-specific manipulations of PSTN Tac1+ neurons during active avoidance. Finally, they show that encoding and mediation of active avoidance in a downstream target of PSTN Tac1+ neurons, the PVT/intermediodorsal nuclei of the dorsal thalamus (IMD), recapitulate the findings at the cell bodies. This contrasts with other output regions of the PSTN, such as the PBN and CeA, which were not found to promote active avoidance learning. The experiments presented were well designed to support the authors' conclusions; however, the manuscript is missing several key control experiments and supplemental information needed to support its main findings.

      Strengths:

      The manuscript provides information on a brain region and downstream target that mediate active avoidance learning. It provides valuable necessity and sufficiency experiments showing the role of the population of interest (PSTN Tac1+ neurons) in active avoidance learning. The authors also performed most behavioral experiments in both male and female mice, with adequate power to address potential sex differences in the control of active avoidance by PSTN Tac1+ neurons. Finally, the manuscript provides valuable information about the specificity of the downstream targets of PSTN Tac1+ neurons in regulating active avoidance learning, identifying the PVT/intermediodorsal nuclei of the dorsal thalamus as the key target and ruling out the PBN and CeA.

      We greatly appreciate Reviewer 2’s view that our experiments were well designed to support our conclusions and provide valuable information in several respects.

      Weaknesses:

      However, several main conclusions of the paper must be interpreted carefully due to missing or inadequate control experiments and histological verification.

      (1) Inadequate presentation of viral localization. The authors state that expression was "largely restricted within PSTN"; however, there is no quantification of viral expression beyond the target region. Given that Tac1 is expressed in neighboring regions, it is critical to show viral expression and fiber implant locations for all animals included in the figures. Furthermore, the criteria for inclusion and exclusion based on mistargeting should be delineated. This should also be clearly outlined for the experiments in Figure S5, where "behavioral effects of activation of sparsely Tac1-expressing neurons in two adjacent areas of PSTN" were tested but the location of viral expression in those cases is unclear.

      This point is similar to questions 3 and 8 of Reviewer 1. In the revised manuscript, we will provide viral expression and fiber implant locations for all animals included in the figures, as well as histological images for Figure S5. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (2) Lack of motion-artifact correction with an isosbestic signal for GCaMP recordings. It is appreciated that the authors included a separate EGFP-expressing group for comparison with the GCaMP-expressing group; however, additional explanation is required for the methods used to analyze the raw fluorescent signal. Namely, were fluorescent signals isosbestic-corrected prior to calculating ΔF/F? If no isosbestic signal was used to correct motion artifacts within a recording session, the authors should explain how this was addressed. The absence of motion artifacts in the EGFP signal of a separate cohort does not resolve this caveat, because motion artifacts arise within each animal.

      We will follow Reviewer 2’s suggestion and perform isosbestic-correction for fluorescent signals prior to calculating ΔF/F. We will re-plot related figures and add this information in the revised manuscript.

      (3) Missing control experiment demonstrating intact locomotor performance in caspase ablation experiments. The authors use caspase ablation of PSTN Tac1+ neurons prior to active avoidance learning to assess the necessity of this cell population. However, no control experiment showing intact locomotor ability in ablated mice was performed.

      We will follow Reviewer 2’s suggestion to perform a control experiment showing intact locomotor ability in caspase 3-ablated mice and will include this data in the revised manuscript.

      (4) Missing control experiment demonstrating [lack of] valence with PSTN silencing manipulations. The authors performed real-time and conditioned place preference experiments in ChR2-expressing mice (Fig 3M) and found that stimulation was negatively valenced and generated an aversive memory, respectively. Without the corresponding control experiment with silencing, an alternative explanation remains possible: optogenetic silencing via GtACR2 may have created nonspecific location preferences in the active avoidance apparatus, confounding the interpretation of those results.

      We thank Reviewer 2 for this useful suggestion. We will examine the valence of PSTN silencing using a real-time place preference (RTPP) test and add these data to the revised manuscript.

      (5) Incomplete analysis of sex differences. Data from female mice are conspicuously missing from the inhibition experiments. The rationale for their exclusion from this dataset would be useful for interpreting the other noted sex differences.

      We thank Reviewer 2 for this useful suggestion. During the review process, we performed ablation and inhibition experiments in females, which showed behavioral effects similar to those in males. We will add these data to the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study by Hu et al. examined the role of tachykinin1 (Tac1)-expressing neurons in the parasubthalamic nucleus (PSTH) in active avoidance of electric shocks. Bulk recording of PSTH Tac1 neurons, or of their axons in the PVT, showed activation by a shock-predicting tone and by the shock itself. Ablation or optogenetic manipulation of these neurons, or of their projection to the PVT, suggests a causal role for this pathway in the learning of active avoidance.

      Strengths:

      This work identifies an understudied pathway potentially important for active avoidance of electric shocks. The experiments were thoroughly done and the presentation is clear. The extent of the discussion and references is appropriate.

      We are very pleased to have Reviewer 3’s positive comments on the manuscript.

      Weaknesses:

      Critical control experiments are missing for most experiments, and statistical tests are not clear or not appropriate in most parts. Details are shown below.

      (1) There are some control experiments missing. Notably, optogenetic manipulation is not verified in any experiments. It is important to verify whether neural activation with optogenetic activation is at the physiological level or supra-physiological level, and whether optogenetic inhibition does not cause unwanted activity patterns such as rebound activation at the critical time window.

      Thank you Reviewer 3 for this useful suggestion. We will perform in vitro slice recording experiments to verify optogenetic manipulations and add this line of evidence in the revised manuscript.

      (2) Neural ablation with caspase was confirmed by GFP expression. However, from the present description, a different virus to express EITHER caspase or GFP was injected, and then the numbers of GFP-expressing neurons were compared. It is not clear how this can detect ablation.

      This point is similar to question 4 of Reviewer 1. We will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself, then measure the colocalization of GFP with Tachykinin-1 inside and outside the PSTN and the extent of Tachykinin-1 loss in Casp3-ablated mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (3) In many places, the statistical approaches are not clear from the present figures, figure legends, and Methods. It seems that most statistics were performed by pooling trials, but this is not described, or multiple "n" values are given. For example, Figure 4H explicitly states "n = 3 mice, n = 213 avoidance trials and n = 87 failure trials". The authors should not pool trials but should perform across-animal tests in this and other figures, and the "n" used for each statistical test should be clearly stated in each plot.

      We have provided all statistical information in Supplementary Table 1. In the revised manuscript, we will perform across-animal tests, re-plot the figures, and provide clear statistical information.

      (4) It is also unclear how the test types were selected. For example, in Figure 1K and O with similar datasets, one is examined by a paired test and the other is by an unpaired test. Since each animal has both early vs late trials, and avoidance vs failure trials, paired tests across animals should be performed for both.

      Following Reviewer 3’s suggestion, we will perform across-animal tests. In the first version of our manuscript, for the fiber photometry experiments, we pooled trial data from each animal and performed statistical tests across trials. Because avoidance and failure trials were distinct sets of trials, we selected an unpaired test for this dataset.

      (5) It is also strange to show violin plots for only 6 animals. Instead, the authors should show one dot per animal, connected by lines, to show consistent increases in activity in late vs early trials and in avoidance vs failure trials.

      As explained in our response to question 4 of Reviewer 3, we pooled trial data from each animal and performed statistical tests across trials. We will instead perform across-animal tests and re-plot the figures with lines connecting each animal’s values to show consistent increases in activity in late vs early trials and avoidance vs failure trials.

      (6) To tell specificity in avoidance learning, it is better to show escape in the current trials with optogenetic manipulation.

      Thank you Reviewer 3 for this useful suggestion. We will follow this suggestion and add this analysis in the revised manuscript.

      (7) For place aversion, % time decrease across days was tested. It is better to show the original number before normalization, as well.

      As in our response to question 9 of Reviewer 1, we will show the original values before normalization in the revised manuscript.

      (8) For anatomical results in Figure S6, it is important to show images with lower magnification, too.

      We will follow this suggestion and provide histological images with lower magnification in the revised manuscript.

      (9) Inactivation of either the PSTH-to-PBN or the PSTH-to-CeA pathway also inhibits active avoidance, but the authors conclude that these effects are "partial" compared with inactivation of the PSTH-to-PVT pathway. It is not clear how the effects were compared, since the effects of PSTH-CeA inactivation are quite strong and, by eye, comparable to those of PSTH-PVT inactivation. The authors should quantify the effects before concluding that they differ.

      We will quantify the effects of different downstream targets of the PSTN to make a precise conclusion.

      (10) Supplementary table 1: as mentioned above, n for statistical tests should be clearer.

      As mentioned above, we will perform across-animal tests and provide clear statistical information in the figure legends and Supplementary Table 1.

    1. eLife Assessment

      This study uses a large dataset from both recent isolates and genomes in databases to provide an important analysis of the population structure of the pathogen Salmonella Gallinarum. The authors present convincing results regarding regional adaptation and the evolutionary trajectory of the resistome and mobilome, although some aspects of the genomic analysis could be improved. This work will interest microbiologists and researchers working on genomics, evolution, and antimicrobial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      The investigators in this study analyzed a dataset assembled from 540 Salmonella isolates, together with 45 recent isolates from Zhejiang University, China. Analysis and comparison of the resistome and mobilome of these isolates identified a significantly higher rate of cross-region dissemination than of localized propagation. This study highlights the key role of the resistome in driving the transition and evolutionary history of S. Gallinarum.

      Strengths:

      The isolates included in this study were from 16 countries in the past century (1920 to 2023). While the study uses S. Gallinarum as the prototype, the conclusion from this work will likely apply to other Salmonella serotypes and other pathogens.

      Weaknesses:

      While the isolates came from 16 countries, most strains in this study were originally from China.

      Comments on revisions:

      This reviewer is happy with the detailed responses from the authors regarding revising this manuscript. I do not have further comments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sequence 45 new samples of S. Gallinarum, a commensal Salmonella found in chickens, which can sometimes cause disease. They combine these sequences with around 500 from public databases, determine the population structure of the pathogen, and coarse relationships of lineages with geography. The authors further investigate known anti-microbial genes found in these genomes, how they associate with each other, whether they have been horizontally transferred, and date the emergence of clades.

      Strengths:

      - It doesn't seem that much is known about this serovar, so publicly available new sequences from a high burden region are a valuable addition to the literature.
      - Combining these sequences with publicly available sequences is a good way to better contextualise any findings.
      - The genomic analyses have been greatly improved since the first version of the manuscript, and appropriately analyse the population and date emergence of clades.
      - The SNP thresholds are contextualised in terms of evolutionary time.
      - The importance and context of the findings are fairly well described.

      Weaknesses:

      - There are still a few issues with the genomic analyses, although they no longer undermine the main conclusions:

      (1) Although the SNP distance is now considered in terms of time, the 5 SNP distance presented still represents ~7 years of evolution, so it is unlikely to be a transmission event, as described. It would be better to use a much lower threshold or describe the interpretation of these clusters more clearly. Bringing in epidemiological evidence or external references on the likely time interval between transmissions would be helpful.

      (2) The HGT definition has not fundamentally been changed and therefore still has some issues, mainly that vertical evolution is still not systematically controlled for. Using a 5 kb window is not sufficient, as LD may extend across the entire genome. As the authors have now run Gubbins correctly, they could use the results from this existing analysis to find recent HGT. To define mobilisation, perhaps a standard pipeline (e.g. https://github.com/EBI-Metagenomics/mobilome-annotation-pipeline) would be more convincing.

      (3) The invasiveness index is better described, but the authors still did not provide convincing evidence that the small difference is actually biologically meaningful (there was no statistical difference between the two strains provided in response Figure 6). What do other Salmonella papers using this approach find, and can their links be brought in? If there is still no good evidence, a better description of this difference would help make the conclusions better supported.

      In summary, the analysis is broadly well described and feels appropriate. Some of the conclusions are still not fully supported, although the main points and context of the paper now appear sound.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The investigators in this study analyzed an assembled dataset of 540 Salmonella isolates, together with 45 recent isolates from Zhejiang University, China. The analysis and comparison of the resistome and mobilome of these isolates identified a significantly higher rate of cross-region dissemination compared to localized propagation. This study highlights the key role of the resistome in driving the transition and evolutionary history of S. Gallinarum.

      Thank you for summarizing our work. We have carefully considered and responded to each of your comments and made corresponding revisions to the text. Additionally, to fully contextualize the background knowledge and clarify the major points in this study, we have added some references.

      Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). To avoid confusion and keep the typing system consistent, we have adjusted the lineage nomenclature throughout the revised manuscript to reflect the corrected order as follows:

      Author response table 1.

      To ensure consistency with previous studies, we have revised the nomenclature for the different lineages of bvSP.

      Strengths: 

      The isolates included in this study were from 16 countries in the past century (1920 to 2023). While the study uses S. Gallinarum as the prototype, the conclusion from this work will likely apply to other Salmonella serotypes and other pathogens.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Weaknesses: 

      While the isolates came from 16 countries, most strains in this study were originally from China. 

      We appreciate the reviewer's observation regarding the sampling distribution of isolates in this study. We acknowledge that, although the isolates were collected from 15 different countries, a significant proportion originated from China (Author response image 1). This focus is due to several reasons:

      Author response image 1.

      Geographic distribution of 580 S. Gallinarum strains. Different colors indicate the countries of origin for the 580 S. Gallinarum strains in the dataset. Darker shades represent higher numbers of strains.

      (1) Once a globally prevalent pathogen during the 20th century, S. Gallinarum was listed by the World Organization for Animal Health (WOAH) due to its economic importance. After 30 years of implementation of the National Poultry Improvement Plan in the US, it was almost eradicated in high-income countries; interestingly, it has become an endemic pathogen with sporadic outbreaks in most low- and middle-income countries, such as China and Brazil. Given China's vast land area and economic factors, implementing the same measures there remains challenging.

      (2) S. Gallinarum is an avian-specific pathogen, particularly affecting chickens, and its distribution is closely linked to chicken meat production in different countries. There are more frequent reports of fowl typhoid in some high chicken-producing developing countries. Data from the United States Department of Agriculture (USDA) on annual chicken meat production for 2023/2024 show that the global distribution of S. Gallinarum aligns closely with the overall chicken meat production of these countries (https://fas.usda.gov/data/production/commodity/0115000).

      Author response image 2.

      The United States Department of Agriculture (USDA) data on annual chicken meat production for 2023/2024 across different countries globally.

      (3) Our primary objective was to investigate the localized resistome adaptation of S. Gallinarum in specific regions. As a region with a significant disease burden, China has reported numerous outbreaks (Sci Data. 2022 Aug 13;9(1):495; Sci Data. 2024 Feb 27;11(1):244) and a high AMR prevalence of this serovar (Natl Sci Rev. 2023 Sep 2;10(10):nwad228; mSystems. 2023 Dec 21;8(6):e0088323), making it an excellent example for understanding localized resistance mechanisms.

      (4) As China is the primary country of origin for the strains in this study, it is necessary to ensure that the strains from China are consistent with the local geographic characteristics of the country. Therefore, we conducted a correlation analysis between the number of strains from different provinces in China and the total GDP/population size of those provinces (Author response image 3). The results show that most points fall within the 95% confidence interval of the regression line. Although some points show a relative imbalance in the number of S. Gallinarum strains, most data points for these regions have a small sample size (n < 15). Overall, we found that the prevalence of S. Gallinarum in different regions of China is consistent with the overall nationwide trend.

      Author response image 3.

      Correlation analysis between the number of S. Gallinarum collected from different provinces in China and the total GDP/population size. The figure depicts a series of points representing individual provinces. The x-axis indicates the number of S. Gallinarum included in the dataset, while the y-axis displays the values for total GDP and total population size, respectively.

      Nevertheless, a search of nearly a decade of literature on PubMed and a survey of the S. Gallinarum genomes available in public databases indicate that the dataset used here is the most complete available. Furthermore, focusing on a specific region within China allowed us to conduct a detailed and thorough analysis. However, we strongly agree that expanding the study to include more isolates from other countries would enhance the generalizability of our findings, and we are actively collecting additional S. Gallinarum genome data. In the revised manuscript, we have further emphasized the limitations as follows:

      Lines 427-429: “However, the current study has some limitations. Firstly, despite assembling the most comprehensive WGS database for S. Gallinarum from public and laboratory sources, there are still biases in the examined collection. The majority (438/580) of S. Gallinarum samples were collected from China, possibly because WGS is a technology that only became widely available in the 21st century. This made it impractical to sequence isolates on a large scale in the 20th century, when S. Gallinarum caused a global pandemic. We therefore suspect that human intervention in the course of this epidemic is the main reason most of the strains in the dataset originated in China. In future work, we aim to actively gather more data to minimize potential biases within our dataset, thereby improving the robustness and generalizability of our findings.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors sequence 45 new samples of S. Gallinarum, a commensal Salmonella found in chickens, which can sometimes cause disease. They combine these sequences with around 500 from public databases, determine the population structure of the pathogen, and coarse relationships of lineages with geography. The authors further investigate known anti-microbial genes found in these genomes, how they associate with each other, whether they have been horizontally transferred, and date the emergence of clades. 

      Thank you for your constructive suggestions, which are valuable and highly beneficial for improving our paper. We have carefully considered and responded to each of your comments and made corresponding revisions to the text. Furthermore, to fully contextualize the background knowledge and clarify the major points in this study, we have added some references to support our findings and policy implications.

      Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). To avoid confusion in the typing system, we have adjusted the lineage nomenclature in the revised manuscript to reflect the corrected order (see Author response table 1).

      Strengths: 

      (1) It doesn't seem that much is known about this serovar, so publicly available new sequences from a high-burden region are a valuable addition to the literature. 

      (2) Combining these sequences with publicly available sequences is a good way to better contextualise any findings. 

      Thank you so much for your thorough review and constructive comments on the manuscript.

      Weaknesses: 

      There are many issues with the genomic analysis that undermine the conclusions, the major ones I identified being: 

      (1) Recombination removal using gubbins was not presented fully anywhere. In this diversity of species, it is usually impossible to remove recombination in this way. A phylogeny with genetic scale and the gubbins results is needed. Critically, results on timing the emergence (fig2) depend on this, and cannot be trusted given the data presented. 

      We sincerely thank you for pointing out this issue. In the original manuscript, we aimed to present different lineages of S. Gallinarum within a single phylogenetic tree constructed using BEAST. In the revised manuscript, we have addressed this issue by following the approach recommended for Gubbins and removing recombination events separately for each lineage defined by fastbaps. Additionally, to better illustrate the removal of recombination regions in the genome, we have included a figure generated by Gubbins (New Supplementary Figure 12).

      Our results indicate that recombination events are relatively infrequent in Lineage 1, followed by Lineage 3, but occur more frequently in Lineage 2. In the revised manuscript, we have included additional descriptions in the Methods section to clarify this analysis. We hope these modifications adequately address the reviewer’s concerns and enhance the trustworthiness of our findings.
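      To make the recombination-removal step concrete, the sketch below masks recombinant blocks in a per-lineage alignment. It is a hypothetical illustration, not Gubbins itself: it assumes the recombinant blocks (start, end and affected isolates) have already been parsed from the Gubbins output, and `mask_recombination` is an illustrative name.

```python
def mask_recombination(alignment, blocks, mask_char="N"):
    """Mask recombinant blocks in a multiple sequence alignment.

    alignment: dict mapping isolate name -> aligned sequence (equal lengths).
    blocks: iterable of (start, end, isolates) with 1-based inclusive
    coordinates; a hypothetical, pre-parsed form of recombination predictions.
    """
    masked = {name: list(seq) for name, seq in alignment.items()}
    for start, end, isolates in blocks:
        for name in isolates:
            # Replace every site inside the block with the mask character,
            # only in the isolates predicted to carry the recombinant block.
            masked[name][start - 1:end] = [mask_char] * (end - start + 1)
    return {name: "".join(seq) for name, seq in masked.items()}


# Hypothetical two-isolate lineage alignment with one recombinant block.
lineage_aln = {"iso1": "ACGTACGT", "iso2": "ACGTTCGT"}
print(mask_recombination(lineage_aln, [(4, 6, ["iso2"])]))
```

      Masking only the affected isolates keeps the remaining sites available for clock-based dating, which is why recombination is handled per lineage rather than on the whole-species alignment.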

      (2) The use of BEAST was also only briefly presented, but is the basis of a major conclusion of the paper. Plot S3 (root-to-tip regression) is unconvincing as a basis of this data fitting a molecular clock model. We would need more information on this analysis, including convergence and credible intervals. 

      Thank you very much for raising this issue. We re-ran the BEAST analysis separately for each lineage, so that the evolutionary timescale is presented accurately on the basis of the improvements described above. The per-lineage BEAST analysis comprised the following steps:

      (1) Using R51 as the reference, a reference-mapped multiple core-genome SNP sequence alignment was created, and recombination regions were detected and removed as described above.

      (2) TreeTime was used to assess the temporal structure by regressing the root-to-tip branch distances in the maximum likelihood tree against sampling date (New Supplementary Figure 6). However, the root-to-tip regression analysis presented in New Supplementary Figure 6 was not intended as a basis for selecting the best molecular clock model; its purpose was to clean the dataset using appropriate measures.

      (3) To determine the optimal model for running BEAST, we tested a total of six combinations in the initial phase of our study, combining two clock models (strict and relaxed lognormal) with three population models (Bayesian SkyGrid, Bayesian Skyline, and Constant Size). Before conducting the complete BEAST analysis, we evaluated each combination using a Markov Chain Monte Carlo (MCMC) analysis with a total chain length of 100 million, sampling every 10,000 iterations. We then summarized the results using NSLogAnalyser and selected the optimal model based on the marginal likelihood value of each combination. The combination of the Bayesian Skyline and the relaxed lognormal clock yielded the highest marginal likelihood for our sample. We then performed a time-calibrated Bayesian phylogenetic inference analysis for each lineage with the following settings: the "GTR" substitution model, "4 gamma categories", the "Relaxed Clock Log Normal" model, the "Coalescent Bayesian Skyline" tree prior, and an MCMC chain length of 100 million, with sampling every 10,000 iterations.

      (4) Convergence was assessed using Tracer, with the effective sample sizes (ESS) of all parameters exceeding 200. Maximum clade credibility trees were generated using TreeAnnotator. Finally, key divergence time points (with 95% credible intervals) were estimated, and the tree was visualized using FigTree.

      For the key lineages, L2b and L3b (carrying the resistome, posing antimicrobial resistance (AMR) risks, and exhibiting intercontinental transmission events), we have redrawn Figure 2 based on the updated BEAST analysis results (New Figure 2). For L1, L2a, and L3c, we have added supplementary figures to provide a more detailed visualization of their respective BEAST analysis outcomes (New Supplementary Figures 3-5). The revised BEAST analysis indicates that the origin of L3b in China can be traced back to as early as 1683 (95% CI: 1608 to 1839). In contrast, the earliest possible origin of L2b in China dates back to 1880 (95% CI: 1838 to 1902). This indicates that the previous manuscript's assumption that L2b is an older lineage compared to L3b may be inaccurate. 

      Furthermore, in the revised manuscript, we specifically estimated the time points of the first intercontinental transmission events for the two major lineages, L2b and L3b. Our results indicate that L2b likely underwent two major intercontinental transmission events. The first occurred around 1893 (95% CI: 1870 to 1918), with transmission from China to South America. The second occurred around 1923 (95% CI: 1907 to 1940), involving spread from South America to Europe. In contrast, the transmission pattern of L3b appears more straightforward. Our findings show that L3b, an S. Gallinarum lineage originating in China, underwent only one intercontinental transmission event, from China to Europe, likely around 1790 (95% CI: 1661 to 1890) (New Supplementary Figure 7). Based on the more rigorous per-lineage BEAST analysis, we have revised the corresponding conclusions in the manuscript. We believe that the updated BEAST analysis, performed with a more accurate recombination removal approach, significantly enhances the rigor and credibility of our findings.

      (3) Using a distance of 100 SNPs for a transmission is completely arbitrary. This would at least need to be justified in terms of the evolutionary rate and serial interval. 

      Using single nucleotide polymorphism (SNP) distance to trace pathogen transmission is a common approach (J Infect Dis. 2015 Apr 1;211(7):1154-63), and one we have used in our previous studies (hLife 2024; 2(5):246-256; mLife 2024; 3(1):156-160). When the SNP distance within a cluster falls below a set threshold, the strains in that cluster are considered to have a potential direct transmission link. In general, the lower the threshold, the more stringent the screening. However, there is little agreement in the literature on what such a threshold should be, and the appropriate SNP cut-off for inferring transmission likely depends critically on context (Mol Biol Evol. 2019 Mar 1;36(3):587-603).

      In this study, we compared various thresholds (SNPs = 5, 10, 20, 25, 30, 35, 40, 50, 100) to choose an appropriate clustering threshold. First, we summarized the tracing results under each threshold (Author response image 4), which demonstrated that, regardless of the threshold used, all strains associated with transmission events originated from the same location (New Figure 3a).

      Author response image 4.

      Clustering results of 45 newly isolated S. Gallinarum strains using different SNP thresholds of 5, 10, 15, 20, 25, 28, 30, 50, and 100 SNPs. The nine subplots represent the clustering results under each threshold. Each point corresponds to an individual strain, and lines connect strains with potential transmission relationships.
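      Threshold-based clustering of this kind can be illustrated with a short single-linkage sketch: two isolates are linked when their pairwise SNP distance is at or below the threshold, and linked isolates are merged into clusters with a union-find structure. The function name, the inclusive cut-off (at or below, rather than strictly below) and the toy distance matrix are assumptions for illustration, not the exact procedure used to draw the figure.

```python
def snp_clusters(names, dist, threshold):
    """Single-linkage clustering of isolates from a pairwise SNP distance matrix.

    Returns the list of linked pairs and the clusters with more than one member.
    """
    parent = list(range(len(names)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    links = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= threshold:  # assumed inclusive cut-off
                links.append((names[i], names[j]))
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return links, [sorted(c) for c in clusters.values() if len(c) > 1]


# Hypothetical SNP distances between four isolates.
names = ["s1", "s2", "s3", "s4"]
dist = [
    [0, 2, 6, 50],
    [2, 0, 4, 50],
    [6, 4, 0, 50],
    [50, 50, 50, 0],
]
print(snp_clusters(names, dist, 5))
```

      Because linkage is transitive, s1 and s3 end up in the same cluster at a 5-SNP threshold even though their own distance is 6; the pairwise links, rather than cluster membership, are what indicate a potential direct transmission.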

      In response to your comments regarding the evolutionary rate, we estimated the overall evolutionary rate of the S. Gallinarum using BEAST. We applied the methodology described by Arthur W. Pightling et al. (Front Microbiol. 2022 Jun 16; 13:797997). The numbers of SNPs per year were determined by multiplying the evolutionary rates estimated with BEAST by the number of core SNP sites identified in the alignments. We hypothesize that a slower evolutionary rate in bacteria typically requires a lower SNP threshold when tracing transmission events using SNP distance analysis. Pightling et al.'s previous research found an average evolutionary rate of 1.97 SNPs per year (95% HPD, 0.48 to 4.61) across 22 different Salmonella serotypes. Our updated BEAST estimation for the evolutionary rate of S. Gallinarum suggests it is approximately 0.74 SNPs per year (95% HPD, 0.42 to 1.06). Based on these findings, and our previous experience with similar studies (mBio. 2023 Oct 31;14(5):e0133323.), we set a threshold of 5 SNPs in the revised manuscript.
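      The conversion from a BEAST clock rate to SNPs per year, and from an SNP distance back to elapsed time, is simple arithmetic. The sketch below assumes the rate is expressed per site of the core SNP alignment, as in the Pightling et al. approach described above; the function names and the rate and site count in the first example are hypothetical, while 0.74 SNPs per year is the estimate quoted above. For two isolates sampled at about the same time, a pairwise distance accumulates along both branches, so the time back to their common ancestor is roughly half the total evolutionary time.

```python
def snps_per_year(rate_per_site_per_year, n_core_snp_sites):
    """SNPs accumulated per year across the whole core SNP alignment."""
    return rate_per_site_per_year * n_core_snp_sites


def years_for_distance(snp_distance, snps_year):
    """Total evolutionary time (summed over both branches) implied by a distance."""
    return snp_distance / snps_year


def years_to_common_ancestor(snp_distance, snps_year):
    """Approximate time to the MRCA of two contemporaneously sampled isolates."""
    return years_for_distance(snp_distance, snps_year) / 2


# Hypothetical rate and alignment size chosen to give 0.74 SNPs/year.
rate = snps_per_year(1e-4, 7400)
print(rate, years_for_distance(5, 0.74), years_to_common_ancestor(5, 0.74))
```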

      Then, we adopted the newly established SNP distance threshold (5 SNPs) to update Figure 3a and New Supplementary Figure 8. The heatmap on the far right of New Figure 3a illustrates the SNP distances among the 45 newly isolated S. Gallinarum strains from two locations in Zhejiang Province (Taishun and Yueqing). New Supplementary Figure 8 simulates potential transmission events between the bvSP strains isolated from Zhejiang Province (n=95) and those from China with available provincial information (n=435). Together, these analyses demonstrate the localized transmission pattern of bvSP within China. With the new threshold, the 45 strains isolated from Taishun and Yueqing exhibit a highly localized transmission pattern: pairs of strains with potential transmission events below the threshold occur exclusively within a single location. We then conducted the SNP distance-based tracing analysis for the 95 strains from Zhejiang Province and those from China with available provincial information (n=435) (New Supplementary Figure 8, New Supplementary Table S8). Under the 5-SNP threshold, we identified a total of 91 potential transmission events, all of which occurred exclusively within Zhejiang Province; no inter-provincial transmission events were detected. Based on these findings, we revised the methods and conclusions in the manuscript accordingly. We believe that the updated version addresses your concerns.
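      The within- versus inter-provincial tally reported above can be expressed as follows. This minimal sketch assumes a list of (isolate, isolate, SNP distance) pairs and a mapping from isolate to province; the function name and the toy data are hypothetical, and a pair is counted as a potential transmission event when its distance is at or below the threshold.

```python
def count_transmission_pairs(pairs, province_of, threshold=5):
    """Count below-threshold isolate pairs within and between provinces."""
    within = between = 0
    for a, b, distance in pairs:
        if distance > threshold:
            continue  # too divergent to count as a potential transmission event
        if province_of[a] == province_of[b]:
            within += 1
        else:
            between += 1
    return within, between


# Hypothetical isolates and pairwise SNP distances.
province_of = {"zj1": "Zhejiang", "zj2": "Zhejiang", "sd1": "Shandong"}
pairs = [("zj1", "zj2", 3), ("zj1", "sd1", 40), ("zj2", "sd1", 38)]
print(count_transmission_pairs(pairs, province_of))
```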

      Nevertheless, the final revised and updated results do not change the conclusions presented in our original manuscript. Instead, applying a more stringent SNP distance threshold allows us to provide solid evidence supporting the localized transmission pattern of S. Gallinarum in China. 

      (4) The HGT definition is non-standard, and phylogeny (vertical inheritance) is not controlled for.  

      The cited method: 

      'In this study, potentially recently transferred ARGs were defined as those with perfect identity (more than 99% nucleotide identity and 100% coverage) in distinct plasmids in distinct host bacteria using BLASTn (E-value ≤10⁻⁵)'

      This clearly does not apply here, as the application of distinct hosts and plasmids cannot be used. Subsequent analysis using this method is likely invalid, and some of it (e.g. Figure 6c) is statistically very poor. 

      Thank you for raising this important question. In our study, horizontal gene transfer (HGT) is defined as the transfer of genetic information between different organisms, a process that facilitates the spread of antibiotic resistance genes (ARGs) among bacteria. This definition of HGT is consistent with that used in previous studies (Evol Med Public Health. 2015; 2015(1):193–194; ISME J. 2024 Jan 8;18(1):wrad032). In Salmonella, the transfer of antimicrobial resistance genes via HGT is not solely dependent on plasmids; other mobile genetic elements (MGEs), such as transposons, integrons, and prophages, also play significant roles, as documented in our previous work (mSystems. 2023 Dec 21;8(6):e0088323). Given the involvement of various MGEs in the horizontal transfer of ARGs, we propose that the criteria for evaluating horizontal transfer via plasmids can also be applied to ARGs mediated by other MGEs.

      In this study, we adopted stricter criteria than those used by Xiaolong Wang et al. Specifically, we defined two ARGs as identical only if they exhibited 100% nucleotide identity and 100% coverage. To address concerns regarding the potential influence of vertical inheritance in our analysis, we have made the following improvements. In the revised manuscript, we provide a more detailed table that includes the co-localization analysis of each ARG with mobile genetic elements (New Supplementary Table 9). For prophages and plasmids, we required that ARGs be located directly within these elements. In contrast, for transposons and integrons, we considered ARGs to be associated if they were located within a 5 kb region upstream or downstream of these elements (Nucleic Acids Res. 2022 Jul 5;50(W1):W768-W773). 
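      The co-localization rule described above can be written as a small predicate. This is a sketch under stated assumptions: features are given as contig coordinates (1-based, inclusive), plasmids and prophages must fully contain the ARG, and transposons and integrons count if the ARG overlaps the element extended by 5 kb on each side. The `Feature` type and `arg_on_mge` function are hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass

WINDOW = 5_000  # flanking window (bp) for transposons and integrons


@dataclass
class Feature:
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "plasmid", "prophage", "transposon", "integron" or "ARG"


def arg_on_mge(arg, mges, window=WINDOW):
    """Return True if the ARG is associated with any MGE under the stated rules."""
    for m in mges:
        if m.contig != arg.contig:
            continue
        if m.kind in ("plasmid", "prophage"):
            # The ARG must lie entirely within the element.
            if m.start <= arg.start and arg.end <= m.end:
                return True
        elif m.kind in ("transposon", "integron"):
            # The ARG must overlap the element extended by `window` on each side.
            if arg.start <= m.end + window and arg.end >= m.start - window:
                return True
    return False


arg = Feature("contig1", 10_000, 11_000, "ARG")
print(arg_on_mge(arg, [Feature("contig1", 2_000, 5_500, "transposon")]))
```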

      In the revised manuscript, we first categorized a total of 621 ARGs carried by 436 bvSP isolates collected in China according to the aforementioned criteria and found that 415 ARGs were located on MGEs. After excluding the ARGs not associated with MGEs, we recalculated the overall HGT frequency of 10 types of ARGs in China, the horizontal ARGs transfer frequency in three key regions, and the horizontal ARGs transfer frequency within a single region (New Supplementary Table 7). Based on the results, we updated relevant sections of the manuscript and remade Figure 6. The updated manuscript describes the results of this section as follows:

      “Horizontal transfer of resistome occurs widely in localized bvSP

      Horizontal transfer of the resistome facilitates the acquisition of AMR among bacteria, and distinct acquisition events may be recorded in the bacterial genome. To compare these events geographically, we further investigated the HGT frequency of each ARG carried by bvSP isolated from China and explored the HGT frequency of the resistome among three defined regions. Potentially horizontally transferred ARGs were defined as those with perfect identity (100% identity and 100% coverage) that were located on MGEs across different strains (Fig. 6a). We first categorized a total of 621 ARGs carried by 436 bvSP isolates collected in China and found that 415 ARGs were located on MGEs. After excluding the ARGs not associated with MGEs, our findings reveal that horizontal gene transfer of ARGs is widespread among Chinese bvSP isolates, with an overall transfer rate of 92%. Specifically, 50% of the ARGs exhibited an HGT frequency of 100%, indicating that these ARGs might have undergone frequent horizontal transfer events (Fig. 6b). It is noteworthy that certain resistance genes, such as tet(A), aph(3'')-Ib, and aph(6)-Id, appear to be less susceptible to horizontal transfer.

      However, different regions generally exhibited considerable differences in resistome HGT frequency. Overall, bvSP from the southern areas of China showed the highest HGT frequency (95%). The HGT frequencies for bvSP within the eastern and northern regions of China were lower, at 92% and 91%, respectively (Fig. 6c). For specific ARG types, we found that tet(A) is more prone to horizontal transfer in the southern region, whereas this proportion was considerably lower in the eastern region. Interestingly, certain ARGs, such as aph(6)-Id, underwent horizontal transfer only within the eastern and northern regions of China (Fig. 6d). Notably, although bvSP is a locally transmitted pathogen, the resistome it carries showed dynamic transfer potential both between regions and within local populations, especially from the northern to the southern region (HGT frequency=93%) (Fig. 6e, Supplementary Table 7).”

      We also modified the current version of the pipeline used to calculate the HGT frequency of resistance genes. In the revised pipeline, users must provide a file specifying the genomic locations of mobilome elements before calculating the HGT frequency of the target ARGs. The code and data used in the calculation have been uploaded to https://github.com/tjiaa/Cal_HGT_Frequency.

      However, we also acknowledge that the current in silico method has some limitations. This approach relies heavily on prior information in existing resistome/mobilome databases. Additionally, the short reads produced by second-generation sequencing make it challenging to locate gene positions precisely; using complete genome assemblies might be a crucial way to address this issue. In the revised manuscript, we have also provided a more detailed explanation of the implications of the current pipeline.

      Regarding your second concern, "some of it (e.g., Figure 6c) is statistically very poor," the horizontal ARG transfer frequency calculation for each region was based on the proportion of horizontal transfer events of ARGs in that region to the total possible transfer events. As a result, we are unable to calculate the statistical significance between the two regions. Our aim with this approach is to provide a rough estimate of the extent of horizontal ARG transfer within the S. Gallinarum population in each region. In future studies, we will refine our conclusions by developing a broader range of evaluation methods to ensure more comprehensive assessment and validation.
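      One possible reading of the frequency described above, written as a hedged sketch: for a single ARG type, take the allele sequences of the copies located on MGEs in each strain, form all strain pairs (within one region, or between two regions), and report the share of pairs whose alleles are identical. Exact string equality stands in for the 100% identity and 100% coverage criterion. The function name and inputs are hypothetical, and this is not necessarily how the Cal_HGT_Frequency repository computes the value.

```python
from itertools import combinations, product


def hgt_frequency(group_a, group_b=None):
    """Share of strain pairs carrying identical MGE-associated alleles of one ARG.

    group_a, group_b: lists of allele sequences, one per strain.
    If group_b is None, pairs are formed within group_a (a single region);
    otherwise pairs are formed between the two groups (two regions).
    Returns None when no pairs can be formed.
    """
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = list(product(group_a, group_b))
    if not pairs:
        return None
    identical = sum(1 for x, y in pairs if x == y)
    return identical / len(pairs)


# Hypothetical alleles of one ARG type in two regions.
north = ["AAA", "AAA", "AAT"]
south = ["AAA", "AAT"]
print(hgt_frequency(north), hgt_frequency(north, south))
```

      Because each value is a proportion of pairs, the number of pairs behind it is worth reporting alongside the percentage.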

      (5) Associations between lineages, resistome, mobilome, etc do not control for the effect of genetic background/phylogeny. So e.g. the claim 'the resistome also demonstrated a lineage-preferential distribution' is not well-supported. 

      Thank you for your comments. We acknowledge that the associations between lineages and the mobilome/resistome may be influenced by the genetic background or phylogeny of the strains. For instance, our conclusion regarding the lineage-preferential distribution of the resistome was primarily based on New Figure 4a, where L3 is clearly shown to carry the most ARGs. Furthermore, we observed that L3b tends to harbor blaTEM-1B, sul2, and tet(A) more frequently than other lineages. However, we recognize that this evidence is insufficient to support a definitive conclusion of “demonstrated a lineage-preferential distribution”. Therefore, we have re-examined the current manuscript and described these findings as a potential association between the mobilome/resistome and lineages.

      (6) The invasiveness index is not well described, and the difference in means is not biologically convincing as although it appears significant, it is very small. 

      Thank you for pointing this out. For the invasiveness index mentioned in the manuscript, we used the method described in previous studies (PLoS Genet. 2018 May 8;14(5); Nat Microbiol. 2021 Mar;6(3):327-338). Specifically, Salmonella’s ability to cause intestinal or extraintestinal infections in hosts is related to the degree of genome degradation. We evaluated the potential for extraintestinal infection by 45 newly isolated S. Gallinarum strains (L2b and L3b) using a model that quantitatively assesses genome degradation. We analyzed samples using the 196 top predictor genes, employing a machine-learning approach that combines a random forest classifier with delta-bitscore functional variant calling. This method evaluated the invasiveness of S. Gallinarum towards the host, and the distributions of invasiveness index values for each region were compared using an unpaired t-test. The code used for calculating the invasiveness index is available at https://github.com/Gardner-BinfLab/invasive_salmonella. In the revised manuscript, we added a more detailed description of the invasiveness index calculation in the Methods section as follows:

      Lines 592-603: “Specifically, Salmonella’s ability to cause intestinal or extraintestinal infections in hosts is related to the degree of genome degradation. We evaluated the potential for extraintestinal infection by 45 newly isolated S. Gallinarum strains (L2b and L3b) using a model that quantitatively assesses genome degradation. We analyzed each sample using the 196 top predictor genes for measuring the invasiveness of S. Gallinarum, employing a machine-learning approach that utilizes a random forest classifier and delta-bitscore functional variant-calling. This method evaluated the invasiveness of S. Gallinarum towards the host, and the distribution of invasiveness index values for each region was statistically tested using an unpaired t-test. The code used for calculating the invasiveness index is available at: https://github.com/Gardner-BinfLab/invasive_salmonella.”
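      To make the final statistical step concrete, the comparison between two groups of index values can be sketched as below. This is a minimal illustration and not the published pipeline (which is in the linked repository): it assumes the per-isolate invasiveness indices have already been produced by the random forest model, uses hypothetical placeholder values rather than data from this study, and assumes the pooled-variance (Student's) form of the unpaired t-test, since the text does not say whether a Welch correction was applied. It also reports Cohen's d, which is not part of the described method but is one common way to express how large a difference in means is relative to its spread, the point raised by the reviewer.

```python
import math
import statistics


def student_t_and_cohens_d(group_a, group_b):
    """Pooled-variance two-sample (unpaired) t statistic and Cohen's d.

    group_a, group_b: per-isolate invasiveness index values (hypothetical
    inputs here; in practice they come from the random forest model).
    Returns (t, d, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Sample variances with an (n - 1) denominator
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance assumes both groups share a common variance
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    pooled_sd = math.sqrt(pooled_var)
    # t statistic with n_a + n_b - 2 degrees of freedom
    t = (mean_a - mean_b) / (pooled_sd * math.sqrt(1 / n_a + 1 / n_b))
    # Cohen's d: mean difference expressed in pooled standard deviations
    d = (mean_a - mean_b) / pooled_sd
    return t, d, n_a + n_b - 2


# Hypothetical placeholder values, NOT results from this study
region_1 = [0.42, 0.45, 0.44, 0.47, 0.43]
region_2 = [0.40, 0.41, 0.43, 0.39, 0.42]
t, d, df = student_t_and_cohens_d(region_1, region_2)
print(f"t = {t:.2f} (df = {df}), Cohen's d = {d:.2f}")
```

      A p-value would then be read from the t distribution with df degrees of freedom (for example with a statistics package); reporting d next to it separates "statistically detectable" from "large".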

      Regarding the second question, 'the difference in means is not biologically convincing as although it appears significant, it is very small,' we believe that this difference is biologically meaningful. In our previous work, we infected chicken embryos with different lineages of S. Gallinarum (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). The virulence of thirteen strains of Salmonella Gallinarum, comprising five from lineage L2b and eight from lineage L3b, was evaluated in 16-day-old SPF chicken embryos through inoculation into the allantoic cavity. Control embryos were inoculated with phosphate-buffered saline (PBS). The embryos were incubated in a thermostatic incubator maintained at 37.5°C with a relative humidity of 50% to 60%. Prior to inoculation, the viability of the embryos was assessed by examining the integrity of their venous system and their movements; any dead embryos were excluded from the study. Overnight cultures resuspended in PBS at a concentration of 1000 CFU per 100 μL were administered to the embryos. Mortality was recorded daily for five days, concluding upon the hatching of the chicks.

      It is generally accepted that strains with higher invasive capabilities are more likely to cause chicken embryo mortality. Our experimental results showed that L2b, which exhibits higher invasiveness, had a slightly higher tendency to cause chicken embryo death (Author response image 5).

      Author response image 5.

      Survival curves of chicken embryos infected with bvSP isolates from S. Gallinarum L2b and S. Gallinarum L3b. Embryos inoculated with phosphate-buffered saline (PBS) served as controls.

      (7) 'In more detail, both the resistome and mobilome exhibited a steady decline until the 1980s, followed by a consistent increase from the 1980s to the 2010s. However, after the 2010s, a subsequent decrease was identified.' 

      Where is the data/plot to support this? Is it a significant change? Is this due to sampling or phylogenetics? 

      Thank you for highlighting these critical points. The description in this statement is based on New Supplementary Figure 11. On the right side of New Supplementary Figure 11, we presented the average number of Antimicrobial Resistance Genes (ARGs) and Mobile Genetic Elements (MGEs) carried by S. Gallinarum isolates from different years, and we described the overall trend across these years. However, we realized that this statement might overinterpret the data. Given that this sentence does not impact our emphasis on the overall increasing trends observed in the resistome and mobilome, as well as their potential association, we decided to remove it in the revised manuscript.

      The revised paragraph would read as follows:

      Lines 261-268: “Variations in regional antimicrobial use may result in uneven pressure for selecting AMR. The mobilome is considered the primary reservoir for spreading resistome, and a consistent trend between the resistome and the mobilome has been observed across different lineages, from L1-L3c. We observed an overall gradual rise in the resistome quantity carried by bvSP across various lineages, correlating with the total mobilome content (S11 Fig). Furthermore, we investigated the interplay between particular mobile elements and resistome types in bvSP.”

      (8) It is not clear what the burden of disease this pathogen causes in the population, or how significant it is to agricultural policy. The article claims to 'provide valuable insights for targeted policy interventions.', but no such interventions are described. 

      Thank you for your constructive suggestions. Salmonella Gallinarum is an avian-specific pathogen that induces fowl typhoid, a severe systemic disease characterized by high mortality rates in chickens, thereby posing a significant threat to the poultry industry, particularly in developing countries (Rev Sci Tech. 2000 Aug;19(2):405-24). In our previous research, we conducted a comprehensive meta-analysis of 201 publications encompassing over 900 million samples to investigate the global impact of S. Gallinarum (Sci Data. 2022 Aug 13;9(1):495). Our findings estimated that the global prevalence of S. Gallinarum is 8.54% (with a 95% confidence interval of 8.43% to 8.65%), with notable regional variations in incidence rates.

      Our previous analysis focused on the prevalence of S. Gallinarum (including biovars SP and SG) across six continents. The results revealed that all continents except Oceania exhibited a positive prevalence of S. Gallinarum. Asia had the highest prevalence at 17.31%, closely followed by Europe at 16.03%. In Asia, the prevalence of biovar SP was higher than that of biovar SG, whereas in Europe, biovar SG was approximately two hundred times more prevalent than biovar SP. In South America, the prevalences of S. Gallinarum and biovar SP were 10.06% and 13.20%, respectively. Prevalence was lower in North America (4.45%) and Africa (1.10%) (Author response image 6).

      Given the significant economic losses caused by S. Gallinarum to the poultry industry and the potential risk of escalating antimicrobial resistance, more targeted policy interventions are urgently needed. Further elaboration on this implication is provided in the revised “Discussion” section as follows:

      Lines 401-416: “In summary, the findings of this study highlight that S. Gallinarum remains a significant concern in developing countries, particularly in China. Compared to other regions, S. Gallinarum in China poses a notably higher risk of AMR, necessitating the development of additional therapies, e.g., vaccines, probiotics, and bacteriophage therapy, in response to the government's policy aimed at reducing antimicrobial use (J Infect Dev Ctries. 2014 Feb 13;8(2):129-36). Furthermore, given the dynamic nature of S. Gallinarum risks across different regions, it is crucial to prioritize continuous monitoring in key areas, particularly in China's southern regions, where poultry farming is extensive. Lastly, from a One-Health perspective, controlling AMR in S. Gallinarum should not focus solely on local farming environments and on improving overall poultry welfare and farming practices. The top of the breeding pyramid of industrialized poultry production should also be targeted, with enhanced and accurate detection techniques (mSphere. 2024 Jul 30;9(7):e0036224). More importantly, comprehensive efforts should be made to reduce antimicrobial usage overall and mitigate potential AMR transmission from environmental sources or other hosts (Vaccines (Basel). 2024 Sep 18;12(9):1067; Vaccines (Basel). 2023 Apr 18;11(4):865; Front Immunol. 2022 Aug 11:13:973224).”

      Author response image 6.

      A comparison of the global prevalence of S. Gallinarum across continents.

      (9) The abstract mentions stepwise evolution as a main aim, but no results refer to this. 

      Thank you for raising this issue. In the revised manuscript, we have changed “stepwise evolution” to simply “evolution” to ensure a more accurate and precise description.

      (10) The authors attribute changes in population dynamics to normalisation in China-EU relations and hen fever. However, even if the date is correct, this is not a strongly supported causal claim, as many other reasons are also possible (for example other industrial processes which may have changed during this period). 

      Thank you for raising this critical issue. In the revised manuscript, we conducted a more stringent BEAST analysis for each lineage, as described earlier. This led to some changes in the inferred evolutionary timelines. Consequently, we have removed the corresponding statement from the “Results” section. Instead, we now only provide a discussion of historical events, supported by literature, that could have facilitated the intercontinental spread of L2b and L3b in the “Discussion” section. We believe these revisions have made the manuscript more rigorous and precise.

      Lines 332-342: “The biovar types of S. Gallinarum have been well-defined as bvSP, bvSG, and bvSD historically (J Vet Med B Infect Dis Vet Public Health. 2005 Jun;52(5):214-8). Among these, bvSP can be further subdivided into five lineages (L1, L2a, L2b, L3b, and L3c) using hierarchical Bayesian analysis. Different sublineages exhibited preferential geographic distribution, with L2b and L3b of bvSP being predominant global lineage types with a high risk of AMR. The historical geographical transmission was verified using a spatiotemporal Bayesian framework. The result shows that L3b was initially spread from China to Europe in the 18<sup>th</sup>-19<sup>th</sup> century, which may be associated with the European hen fever event in the mid-19<sup>th</sup> century (Burnham GP. 1855. The history of the hen fever: a humorous record). L2b, on the other hand, appears to have spread to Europe via South America, potentially contributing to the prevalence of bvSP in the United States.”

      (11) No acknowledgment of potential undersampling outside of China is made, for example, 'Notably, all bvSP isolates from Asia were exclusively found in China, which can be manually divided into three distinct regions (southern, eastern, and northern).'.

      Perhaps we just haven't looked in other places?

      We appreciate the reviewer's observation regarding the sampling distribution of isolates in this study. We acknowledge that while the isolates were collected from 15 different countries, a significant proportion originated from China (Author response image 1). This focus is due to several reasons:

      (1) As a once globally prevalent pathogen in the 20th century, S. Gallinarum was listed by the World Organization for Animal Health (WOAH) due to its economic importance. After 30 years of implementation of the National Poultry Improvement Plan in the US, it was almost eradicated in high-income countries; interestingly, it became an endemic pathogen with sporadic outbreaks in most low- or middle-income countries, such as China and Brazil. Given the vast expanse of China's land area and the country's economic factors, implementing the same measures remains a challenging endeavour.

      (2) S. Gallinarum is an avian-specific pathogen, particularly affecting chickens, and its distribution is closely linked to chicken meat production in different countries. In some high chicken-producing developing countries, such as China and Brazil, there are more frequent reports of fowl typhoid. Data from the United States Department of Agriculture (USDA) on annual chicken meat production for 2023/2024 show that the global distribution of S. Gallinarum aligns closely with the overall chicken meat production of these countries (https://fas.usda.gov/data/production/commodity/0115000).  

      (3) Our primary objective was to investigate the localized resistome adaptation of S. Gallinarum in specific regions. As a region with a significant disease burden, China has reported numerous outbreaks (Sci Data. 2022 Aug 13;9(1):495; Sci Data. 2024 Feb 27;11(1):244) and a high AMR prevalence of this serovar (Natl Sci Rev. 2023 Sep 2;10(10):nwad228; mSystems. 2023 Dec 21;8(6):e0088323), making it an excellent example for understanding localized resistance mechanisms.

      Nevertheless, a search of nearly a decade of literature on PubMed and a survey of the S. Gallinarum genomes available in public databases indicate that the dataset used is the most complete. Furthermore, focusing on a specific region within China allowed us to conduct a detailed and thorough analysis. However, we fully agree that expanding the study to include more isolates from other countries would enhance the generalizability of our findings, and we are actively collecting additional S. Gallinarum genome data. In the revised manuscript, we modified this sentence to indicate that this phenomenon is only observed in the current dataset, thereby avoiding an overly absolute statement:

      Lines 131-135: “For the bvSP strains from Asia included in our dataset, we found that all originated from China. To further investigate the distribution of bvSP across different regions in China, we categorized them into three distinct regions: southern, eastern, and northern (Supplementary Table 3)”.

      (12) Many of the conclusions are highly speculative and not supported by the data. 

      Thank you for your comment. We have carefully revised the manuscript to address your concerns. We hope that the changes made in the revised version meet your expectations and provide a clearer and more accurate interpretation of our findings.

      (13) The figures are not always the best presentation of the data: 

      a. Stacked bar plots in Figure 1 are hard to interpret, the total numbers need to be shown.

      Panel C conveys little information. 

      b. Figure 4B: stacked bars are hard to read and do not show totals. 

      c. Figure 5 has no obvious interpretation or significance. 

      Thank you for your comments. We have revised the figures to improve the clarity and presentation of the data.

      In summary, the quality of analysis is poor and likely flawed (although there is not always enough information on methods present to confidently assess this or provide recommendations for how it might be improved). So, the stated conclusions are not supported. 

      Thank you for your valuable feedback. We have carefully revised the manuscript to address your concerns. We hope that the updated figures and tables, and new data in the revised version meet your expectations and provide more appropriate interpretation of our findings.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      This reviewer enjoyed reading this well-written manuscript. The authors are encouraged to address the following comments and revise the manuscript accordingly. 

      (1) Title: The authors use avian-restrict Salmonella to refer to Salmonella Gallinarum. Please consider using Salmonella Gallinarum in the title. Also, your analysis relates to resistome and mobilome. Would it make sense to add mobilome in the manuscript? 

      Thank you for your guidance. In the revised manuscript, we have changed the title to “Avian-specific Salmonella enterica Serovar Gallinarum transition to endemicity is accompanied by localized resistome and mobilome interaction”. We believe that this revised title more accurately reflects the content of our study.

      (2) Abstract: This study uses 45 isolates from your labs. However, you failed to include these 45 isolates in the Abstract. Also, please clarify the sources of these isolates (from dead chickens, or dead chicken embryos? You wrote in two different ways in this manuscript). Also, I am not entirely convinced how the results from these 45 isolates will support the overall conclusion of this work. 

      Thank you for your thorough review and constructive comments on the manuscript. In the revised version, we have added a description of 45 newly isolated S. Gallinarum strains in the Abstract to provide readers with a clearer understanding of the dataset used in this study.

      Lines 36-41: “Using the most comprehensive whole-genome sequencing dataset of Salmonella enterica serovar Gallinarum (S. Gallinarum) collected from 16 countries, including 45 newly recovered samples from two related local regions, we established the relationship among avian-specific pathogen genetic profiles and localization patterns.”

      Furthermore, the newly isolated S. Gallinarum strains were obtained from dead chicken embryos. We think your second concern may arise from the following description in the manuscript: “All 734 samples of dead chicken embryos were collected from Taishun and Yueqing in Zhejiang Province, China. After the thorough autopsy, the liver, intestines, and spleen were extracted and added separately into 2 mL centrifuge tubes containing 1 mL PBS. The organs were then homogenized by grinding.” In fact, all the collected dead chicken embryos were aged 19 to 20 days. At this developmental stage, collecting the liver, intestines, and spleen for isolation and cultivation of S. Gallinarum is possible. To avoid any confusion, we have included a more detailed description of the dead chicken embryos in the revised manuscript as follows:

      Lines 447-451: “All 734 samples of dead chicken embryos aged 19 to 20 days were collected from Taishun and Yueqing in Zhejiang Province, China. After a thorough autopsy, the liver, intestines, and spleen were extracted and added separately into 2 mL centrifuge tubes containing 1 mL PBS. The organs were then homogenized by grinding.”

      Regarding your concern about the statement, “I am not entirely convinced how the results from these 45 isolates will support the overall conclusion of this work,” we would like to clarify the significance of these new isolates. Our research first identified distinct characteristics in the 45 newly isolated S. Gallinarum strains from Taishun and Yueqing, Zhejiang Province. Specifically, we found that most of the strains from Yueqing belonged to sequence type ST92, whereas the majority from Taishun were ST3717. Additionally, there were significant differences between these geographically close strains in terms of SNP distance and predicted invasion capabilities. These findings suggest that S. Gallinarum may exhibit localized transmission patterns, which forms the basis of the scientific question and hypothesis we originally aimed to address. Furthermore, in our previous work, we collected 325 S. Gallinarum strains. By incorporating the newly isolated 45 strains, we aim to provide a more comprehensive view of the population diversity, transmission pattern and potential risk of S. Gallinarum. We will continue to endeavour to understand the global genomic and population diversity in this field.

      Finally, we revised the sentences that could potentially raise concerns for readers: 

      Lines 175-177: “To investigate the dissemination pattern of bvSP in China, we obtained forty-five newly isolated bvSP from 734 samples (6.1% overall isolation rate) collected from diseased chickens at two farms in Yueqing and Taishun, Zhejiang Province.”  >  “To investigate the dissemination pattern of bvSP, we obtained forty-five newly isolated bvSP from 734 samples (6.1% overall isolation rate) collected from diseased chickens at two farms in Yueqing and Taishun, Zhejiang Province.”
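      For reference, the overall isolation rate quoted in this sentence follows directly from the two counts it reports:

$$
\frac{45}{734} \approx 0.0613 \approx 6.1\%
$$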

      (3) The manuscript uses nomenclature and classification into different sublineages. Did the authors establish the approaches for defining these sublineages in this group or did you follow the accepted standards? 

      Thank you very much for raising this important issue. The biovar types of Salmonella Gallinarum have historically been well-defined as S. Gallinarum biovar Pullorum (bvSP), S. Gallinarum biovar Gallinarum (bvSG), and S. Gallinarum biovar Duisburg (bvSD) (J Vet Med B Infect Dis Vet Public Health. 2005 Jun;52(5):214-8). However, there seems to be no widespread consensus on the population nomenclature for the key biovar bvSP. In a previous study, Zhou et al. classified bvSP into six lineages: L1, L2a, L2b, L3a, L3b, and L3c (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). However, our more comprehensive analysis of S. Gallinarum using a larger dataset and hierarchical Bayesian clustering revealed that L3a, previously considered a distinct lineage, is actually a sublineage of L3c. Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. To avoid confusion in the typing system, we have adjusted the lineage nomenclature in the revised manuscript to reflect the corrected order (see Author response table 1).

      (4) This reviewer is convinced with the analysis approaches and conclusion of this work.

      In the meantime, the authors are encouraged to discuss the application of the conclusion of this study: a) can the data be somehow used in the prediction model? b) would the conclusion from S. Gallinarum have generalized application values for other pathogens. 

      Thank you for your constructive comments on the manuscript. 

      a) can the data be somehow used in the prediction model?

      We believe that genomic data can be effectively used for constructing prediction models; however, the success of such models largely depends on the specific traits being predicted. In this study, we utilized a random forest prediction model based on 196 top genes (PLoS Genet. 2018 May 8;14(5)) to predict the invasiveness of 45 newly isolated strains. In relation to the antimicrobial resistance (AMR) issue discussed in this paper, we also conducted relevant analyses. For instance, we explored the use of image-based models to predict whether a genome is resistant to specific antibiotics (Comput Struct Biotechnol J. 2023 Dec 29:23:559-565). We are confident that the incorporation of newly generated data will facilitate the development of future predictive models, and we plan to pursue further research in this area.

      b) would the conclusion from S. Gallinarum have generalized application values for other pathogens.

      This might be explained from two perspectives. First, the key role of the mobilome in facilitating the spread of the resistome, as emphasized in this study, has also been confirmed in research on other pathogens (mBio. 2024 Oct 16;15(10):e0242824). Thus, we believe that the pipeline we developed to assess the horizontal transfer frequency of different resistance genes across regions could be applied to various pathogens. Second, due to distinct evolutionary histories, different pathogens exhibit varying levels of adaptation to their environments. In this study, we found that S. Gallinarum tends to spread in a highly localized manner; however, this conclusion may not necessarily hold for other pathogens.

      Reviewer #2 (Recommendations for the authors): 

      The authors would need to: 

      (1) Address my concerns about genomic analyses listed in the public review. 

      Thank you for your valuable feedback. We have carefully reviewed your concerns and made the necessary revisions to address the points raised about genomic analyses in the public review. We sincerely hope that these modifications meet your expectations and provide more robust analysis. We appreciate your thoughtful input and remain open to further suggestions to improve the manuscript.

      (2) Add more detail on the genomic methods and their outputs, as suggested above. 

      We have added further details to clarify the methodologies and outputs as mentioned above. Specifically, we expanded the description of the data processing and the bioinformatic tools used for analysis. To ensure clarity, we also included an expanded discussion of the key outputs, highlighting their implications. We hope these revisions meet your expectations.

      (3) Critically rewrite their introduction to make it clear what problem they are trying to address. 

      Thank you for your guidance. In the revised manuscript, we have made the necessary modifications to the Introduction section to more clearly articulate the problem we aim to address.

      (4) Critically rewrite their conclusions so they are supported by the data they present, and make it clear when claims are more speculative. 

      Thank you for your guidance. In the revised manuscript, we have made the recommended modifications to the relevant sections of the conclusion as outlined above.

      More minor issues I identified: 

      (1) Typo in the title 'avian-restrict'. 

      Done.

      Line 1: “Avian-specific Salmonella enterica Serovar Gallinarum transition to endemicity is accompanied by localized resistome and mobilome interaction.”

      (2) 'By utilizing the pipeline we developed' -- a pipeline has not been introduced at this point. 

      In the revised manuscript, we have removed this section from the 'Abstract'.

      Lines 46-48: “Notably, the mobilome-resistome combination among distinct lineages exhibits a geographical-specific manner, further supporting a localized endemic mobilome-driven process.”

      (3) 'has more than 90% serovars' -- doesn't make sense. 

      Revised.

      Lines 82-83: “Salmonella, a pathogen with distinct geographical characteristics, has more than 90% of its serovars frequently categorized as geo-serotypes.”

      (4) 'horrific mortality rates that remain a disproportionate burden'. 

      Revised.

      Lines 83-87: “Among the thousands of geo-serotypes, Salmonella enterica Serovar Gallinarum (S. Gallinarum) is an avian-specific pathogen that causes severe mortality, with particularly detrimental effects on the poultry industry in low- and middle-income countries.”

      (5) What is the rate, what is a comparison, how is it disproportionate? 

      Thank you for your valuable feedback. It is challenging to accurately estimate the specific prevalence of S. Gallinarum, particularly due to the lack of comprehensive data in many countries. Numerous cases likely go unreported. However, S. Gallinarum is more commonly detected in low- and middle-income countries. Here, we provide three lines of evidence supporting this observation. First, in our previous research, we conducted a comprehensive meta-analysis of 201 studies, involving over 900 million samples, to evaluate the global impact of S. Gallinarum (Sci Data. 2022 Aug 13;9(1):495). The estimated prevalence in 17 countries showed that Bangladesh had the highest rate (25.75%) of S. Gallinarum infections. However, for biovar Pullorum (bvSP), Argentina (20.69%) and China (18.18%) reported the highest prevalence rates. Second, previous studies have also reported that S. Gallinarum predominantly occurs in low- and middle-income countries (Vet Microbiol. 2019 Jan:228:165-172; BMC Microbiol. 2024 Oct 18;24(1):414). Finally, S. Gallinarum was once a globally prevalent pathogen in the 20th century. Following the implementation of eradication programs in most high-income countries, it was listed by the World Organization for Animal Health and subsequently became an endemic pathogen with sporadic outbreaks. However, similar eradication efforts are challenging to implement in low- and middle-income countries, leading to a disproportionately higher incidence of S. Gallinarum in these regions.

      In the revised manuscript, we have rephrased this sentence to enhance its accuracy:

      Lines 83-87: “Among the thousands of geo-serotypes, Salmonella enterica serovar Gallinarum (S. Gallinarum) is an avian-specific pathogen that causes severe mortality, with particularly detrimental effects on the poultry industry in low- and middle-income countries.”

      (6) 'we collected the most comprehensive set of 580 S. Gallinarum isolates', -> 'we collected the most comprehensive set S. Gallinarum isolates, consisting of 580 genomes'. 

      Revised.

      Lines 97-100: “To fill the gaps in understanding the evolution of S. Gallinarum under regional-associated AMR pressures and its adaptation to endemicity, we collected the most comprehensive set of S. Gallinarum isolates, consisting of 580 genomes, spanning the period from 1920 to 2023.”

      (7) Sequence reads are not available, and use a non-standard database. The eLife policy states: 'Sequence reads and assembly must be included for reference genomes, while novel short sequences, including epitopes, functional domains, genetic markers and haplotypes should be deposited, together with surrounding sequences, into Genbank, DNA Data Bank of Japan (DDBJ), or EMBL Nucleotide Sequence Database (ENA). DNA and RNA sequencing data should be deposited in NCBI Trace Archive or NCBI Sequence Read Archive (SRA).' So the sequences assemblies and reads should ideally be mirrored appropriately. 

      Thank you for your valuable suggestion regarding submitting the genome data for the newly isolated 45 S. Gallinarum strains. The genome data have been deposited in the NCBI Sequence Read Archive (SRA) under two BioProjects. The “SRA Accession number” for each strain has been added to New Supplementary Table 1. We believe this will make the data more readily accessible to a broader audience of researchers for download and analysis. We have revised the corresponding paragraph in the manuscript as follows:

      Lines 606-608: “For the newly isolated 45 strains of Salmonella Gallinarum, genome data have been deposited in the NCBI Sequence Read Archive (SRA) database. The SRA accession numbers for each strain are listed in Supplementary Table 1.”

      (8) You should state at the start of the results which data is public, and how much is newly sequenced. 

      Revised.

      Lines 109-112: “To understand the global geographic distribution and genetic relationships of S. Gallinarum, we assembled the most comprehensive S. Gallinarum WGS dataset (n=580), comprising 535 publicly available genomes and 45 newly sequenced genomes.”

    1. eLife Assessment

      This valuable study tackles the well-established overflow metabolism issue by applying a coarse-grained metabolic flux model to predict how individual cells execute various energy strategies, such as respiration versus fermentation. The model's population average is convincing enough to align with experimental observations on overflow metabolism. However, the theoretical framework's reliance on single-cell growth rate variability must be questioned because of insufficient correlation with fluxes and the absence of regulatory mechanisms, highlighting the need for single-cell experimental validation to substantiate the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      Cell metabolism exhibits a well-known behavior in fast-growing cells, which employ seemingly wasteful fermentation to generate energy even in the presence of sufficient environmental oxygen. This phenomenon is known as Overflow Metabolism or the Warburg effect in cancer. It is present in a wide range of organisms, from bacteria and fungi to mammalian cells.

      In this work, starting with a metabolic network for Escherichia coli based on sets of carbon sources, and using a corresponding coarse-grained model, the author applies well-founded approximations from the literature and algebraic manipulations. These are used to successfully explain the origins of Overflow Metabolism, both qualitatively and quantitatively, by comparing the results with E. coli experimental data.

      By modeling the proteome energy efficiencies for respiration and fermentation, the study shows that these parameters are dependent on the carbon source quality constants K_i (p.115 and 116). It is demonstrated that as the environment becomes richer, the optimal solution for proteome energy efficiency shifts from respiration to fermentation. This shift occurs at a critical parameter value K_A(C). This counterintuitive result qualitatively explains Overflow Metabolism.

      Quantitative agreement is achieved through the analysis of the heterogeneity of the metabolic status within a cell population. By introducing heterogeneity, the critical growth rate is assumed to follow a Gaussian distribution over the cell population, resulting in accordance with experimental data for E. coli. Overflow metabolism is explained by considering optimal protein allocation and cell heterogeneity.
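
A minimal sketch of how heterogeneity can turn an all-or-none single-cell switch into a graded population response. It assumes, for illustration only, that a cell ferments when the growth rate lambda exceeds its own critical growth rate lambda_C, that lambda_C is Gaussian across cells (as in the summary above), and that a fermenting cell's fermentation flux is proportional to lambda; the constants MU_C and R_F and the helper names are hypothetical, not taken from the manuscript.

```python
from statistics import NormalDist

# HYPOTHETICAL parameters for illustration only (growth rates in 1/h).
MU_C = 0.8  # mean critical growth rate across cells
R_F = 1.0   # fermentation flux per unit growth rate in a fermenting cell


def fraction_fermenting(growth_rate: float, sigma: float, mu: float = MU_C) -> float:
    """Share of cells whose critical growth rate lies below growth_rate."""
    if sigma == 0.0:
        # Homogeneous population: a sharp, all-or-none switch.
        return 1.0 if growth_rate > mu else 0.0
    return NormalDist(mu, sigma).cdf(growth_rate)


def mean_fermentation_flux(growth_rate: float, sigma: float) -> float:
    """Population average of a binary single-cell fermentation response."""
    return R_F * growth_rate * fraction_fermenting(growth_rate, sigma)


if __name__ == "__main__":
    for lam in (0.5, 0.7, 0.8, 0.9, 1.1):
        step = mean_fermentation_flux(lam, sigma=0.0)
        smooth = mean_fermentation_flux(lam, sigma=0.1)
        print(f"lambda={lam:.1f}  homogeneous={step:.3f}  heterogeneous={smooth:.3f}")
```

Setting sigma=0.0 recovers a step at lambda = MU_C, whereas any sigma > 0 gives a smooth onset of population-level fermentation, which is the qualitative role heterogeneity plays in the argument.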

      The obtained model is extensively tested through perturbations: 1) Introduction of overexpression of useless proteins; 2) Studying energy dissipation; 3) Analysis of the impact of translation inhibition with different sub-lethal doses of chloramphenicol on Escherichia coli; 4) Alteration of nutrient categories of carbon sources using pyruvate. All model perturbation results are corroborated by E. coli experimental results.

      Strengths:

      In this work, the author effectively uses modeling techniques typical of Physics to address complex problems in Biology, demonstrating the potential of interdisciplinary approaches to yield novel insights. The use of Escherichia coli as a model organism ensures that the assumptions and approximations are well-supported in existing literature. The model is convincingly constructed and aligns well with experimental data, lending credibility to the findings. In this version, the extension of results from bacteria to yeast and cancer is substantiated by a literature base, suggesting that these findings may have broad implications for understanding diverse biological systems.

      Weaknesses:

      The author explores the generalization of their results from bacteria to cancer cells and yeast, adapting the metabolic network and coarse-grained model accordingly. In the previous version, this generalization was not completely supported by references and data from the literature. This drawback, however, has been addressed in the current version, where the author discusses the generalization in much more detail and gives references supporting it.

    3. Reviewer #2 (Public review):

      In this version of the manuscript, the author clarified many details and rewrote some sections. This substantially improved the readability of the paper. I also recognize that the author has put substantial effort into the Appendix to address potential questions.

      Unfortunately, I am not currently convinced by the theory proposed in this paper. In the next section, I will first recap the logic of the author and explain why I am not convinced. Although the theory fits many experimental results, other theories of overflow metabolism are also supported by experiments. Hence, I do not think the experimental data alone can rule in or rule out the different theories.

      Recap: To explain the origin of overflow metabolism, the author uses the following logic:

      (1) There is substantial variability in single-cell growth rate.
      (2) The fluxes (J_r^E) and (J_f^E) are coupled with growth rate by Eq. 3.
      (3) Since growth rate varies from cell to cell, the fluxes (J_r^E) and (J_f^E) also vary.
      (4) The variability of these fluxes creates a threshold-analog relation, and hence overflow metabolism.

      My opinion:

      Logic steps (2) and (3) have caveats. The variability of growth rate has large components of cellular noise and external noise. Therefore, the variability of growth rate is far from 100% correlated with the variability of the fluxes (J_r^E) and (J_f^E) at the single-cell level. Single-cell growth rate is a complex, multivariate function that depends on (J_r^E) and (J_f^E) but also on many other variables. My feeling is that the correlation could be too low to support the logic here.

      One example: ribosomal concentration is known to be an important determinant of growth rate in bulk culture. However, the "growth law" from bulk culture cannot be directly translated into a growth law at the single-cell level [Ref1,2]. This is likely because other factors (such as cell aging or multistability of cellular states) are involved.

      Therefore, I think using Eq. 3 to invert the distribution of growth rate into the distribution of (J_r^E) and (J_f^E) is inapplicable, due to the potentially low correlation at the single-cell level. The quantities may be partially correlated, but the correlation may not be strong enough to support the claim and create fermentation at the macroscopic scale.

      Overall, if we track the logic flow, this theory implies that overflow metabolism originates from cell-to-cell variability in the k_cat of catalytic enzymes. That is, the author proposes that overflow metabolism happens macroscopically as if it were some "aberrant activation of the fermentation pathway" at the single-cell level, due to some unknown partial correlation with growth-rate variability.

      Compared with other theories, this theory does not involve any regulatory mechanism and can be regarded as a "neutral theory". I am looking forward to seeing single-cell experiments in the future that provide evidence for this theory.

      [Ref1] https://www.biorxiv.org/content/10.1101/2024.04.19.590370v2
      [Ref2] https://www.biorxiv.org/content/10.1101/2024.10.08.617237v2

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Cell metabolism exhibits a well-known behavior in fast-growing cells, which employ seemingly wasteful fermentation to generate energy even in the presence of sufficient environmental oxygen. This phenomenon is known as Overflow Metabolism or the Warburg effect in cancer. It is present in a wide range of organisms, from bacteria and fungi to mammalian cells.

      In this work, starting with a metabolic network for Escherichia coli based on sets of carbon sources, and using a corresponding coarse-grained model, the author applies some well-founded approximations from the literature and algebraic manipulations. These are used to successfully explain the origins of Overflow Metabolism, both qualitatively and quantitatively, by comparing the results with E. coli experimental data.

      By modeling the proteome energy efficiencies for respiration and fermentation, the study shows that these parameters are dependent on the carbon source quality constants K_i (p.115 and 116). It is demonstrated that as the environment becomes richer, the optimal solution for proteome energy efficiency shifts from respiration to fermentation. This shift occurs at a critical parameter value K_A(C).

      This counterintuitive result qualitatively explains Overflow Metabolism.

      Quantitative agreement is achieved through the analysis of the heterogeneity of the metabolic status within a cell population. By introducing heterogeneity, the critical growth rate is assumed to follow a Gaussian distribution over the cell population, resulting in accordance with experimental data for E. coli. Overflow metabolism is explained by considering optimal protein allocation and cell heterogeneity.

      The obtained model is extensively tested through perturbations: 1) Introduction of overexpression of useless proteins; 2) Studying energy dissipation; 3) Analysis of the impact of translation inhibition with different sub-lethal doses of chloramphenicol on Escherichia coli; 4) Alteration of nutrient categories of carbon sources using pyruvate. All model perturbation results are corroborated by E. coli experimental results.

      We appreciate the reviewer's highly positive comments and the accurate summary of our manuscript.

      Strengths:

      In this work, the author employs modeling methods typical of Physics to address a problem in Biology, standing at the interface between these two scientific fields. This interdisciplinary approach proves to be highly fruitful and should be further explored in the literature. The use of Escherichia coli as an example ensures that all hypotheses and approximations in this study are well-founded in the literature. Examples include the approximation for the Michaelis-Menten equation (line 82), Eq. S1, proteome partition in Appendix 1.1 (lines 68-69), and a stable nutrient environment in Appendix 1.1 (lines 83-84). The section "Testing the model through perturbation" heavily relies on bacterial data. The construction of the model and its agreement with experimental data are convincingly presented.

      We appreciate the reviewer's highly positive comments. We have incorporated many of the reviewer's insightful suggestions and added citations in the appropriate contexts, which have significantly improved our manuscript.

      Weaknesses:

      In Section Appendix 6.4, the author explores the generalization of results from bacteria to cancer cells, adapting the metabolic network and coarse-grained model accordingly. It is argued that as a consequence, all subsequent steps become immediately valid. However, I remain unconvinced, considering the numerous approximations used to derive the equations, which the literature demonstrates to be valid primarily for bacteria. A more detailed discussion about this generalization is recommended. Additionally, it is crucial to note that the experimental validation of model perturbations heavily relies on E. coli data.

      We appreciate the reviewer's insightful suggestions. We apologize for not clearly illustrating the generalization of results from bacteria to cancer cells in the previous version of our manuscript. Indeed, in our earlier version, there was no experimental validation of model results related to cancer cells.

      Following the reviewer’s suggestions, we have now added Fig. 5 and Appendix-fig. 5, fully expanded the previous Appendix 6.4 into Appendix 9 in our current version, and added a new section entitled “Explanation of the Crabtree effect in yeast and the Warburg effect in cancer cells” in our main text to provide a detailed discussion of the generalization from bacteria to yeast and cancer cells. Through the derivations shown in Appendix 9 (Eqs. S180-S189), we arrived at Eq. 6 (or Eq. S190 in Appendix 9) to facilitate the comparison of our model results with experimental data in yeast and cancer cells. This comparison is presented in Fig. 5, where we demonstrate that our model can quantitatively explain the data for the Crabtree effect in yeast and the Warburg effect in cancer cells (related experimental data references: Shen et al., Nature Chemical Biology 20, 1123–1132 (2024); Bartman et al., Nature 614, 349-357 (2023)). These additions have significantly strengthened our manuscript.

      Reviewer #2 (Public Review):

      Summary

      This paper has three parts. The first part applied a coarse-grained model with proteome partition to calculate cell growth under respiration and fermentation modes. The second part considered single-cell variability and performed population average to acquire an ensemble metabolic profile for acetate fermentation. The third part used model and simulation to compare experimental data in literature and obtained substantial consistency.

      We thank the reviewer for the accurate summary and positive comments on our manuscript.

      Strengths and major contributions

      (i) The coarse-grained model considered specific metabolite groups and their interrelations and acquired an analytical solution for this scenario. The "resolution" of this model is in between Flux Balance Analysis/whole-cell simulation and proteome partition analysis.

      (ii) The author considered single-cell level metabolic heterogeneity and calculated the ensemble average with explicit calculation. The results are consistent with known fermentation and growth phenomena qualitatively and can be quantitatively compared to experimental results.

      We appreciate the reviewer’s highly positive comments.

      Weaknesses

      (i) If I am reading this paper correctly, the author's model predicts binary (or "digital") outcomes of single-cell metabolism, that is, after growth rate optimization, each cell will adopt either "respiration mode" or "fermentation mode" (as illustrated in Figure Appendix - Figure 1 C, D). Due to variability in enzyme activity k_i^{cat} and in the critical growth rate λ_C, each cell under the same nutrient condition could use either respiration or fermentation, but the choice is binary.

      The binary choice at the single-cell level is inconsistent with our current understanding of metabolism. If a cell only uses fermentation mode (as shown in Appendix - Figure 1C), it could generate enough energy but not be able to have enough metabolic fluxes to feed into the TCA cycle. That is, under pure fermentation mode, the cell cannot expand the pool of TCA cycle metabolites and hence cannot grow.

      This caveat also appears in the model in Appendix (S25) that assumes J_E = r_E*J_{BM} where r_E is a constant. From my understanding, r_E can be different between respiration and fermentation modes (at least for real cells) and hence it is inappropriate to conclude that cells using fermentation, which generates enough energy, can also generate a balanced biomass.

      We thank the reviewer for raising this question. Indeed, regarding energy biogenesis between respiration and fermentation, our model predicts binary outcomes at the single-cell level. However, this outcome does not hinder cell growth, as there are three independent possible fates for the carbon source (e.g., glucose) in metabolism: fermentation, respiration for energy biogenesis, and biomass generation. Each fate is associated with a distinct fraction of the proteome, with no overlap between them (see Appendix-figs. 1 and 5). Consequently, in a purely fermentative mode, a cell can still use the proteome dedicated to the biomass generation pathway to produce biomass precursors via the TCA cycle.

      The classification of the carbon source’s fates into three independent pathways was initially introduced by Chen and Nielsen (Chen and Nielsen, PNAS 116, 17592-17597 (2019)). We apologize for the oversight in not citing their paper in this context in the previous version of our manuscript (although it was cited elsewhere). We have now included the citation in all appropriate places.

      To illustrate this issue more clearly, we explicitly present the proteome allocation results for optimal growth in a fermentation mode below, where the proteome efficiency (i.e., the proteome energy efficiency in our previous version) in fermentation is higher than in respiration (i.e., ε<sub>f</sub> > ε<sub>r</sub>). We use the model shown in Fig. 1B as an example, with the relevant equations being Eqs. S26 and S28 in Appendix 2.1. By substituting Eq. S28 into Eq. S26, we arrive at Eq. 3 (or Eq. S29 in Appendix 2.1), which we restate here as Eq. R1:

      For a given nutrient condition, i.e., for a specific value of κ<sub>A</sub> at the single-cell level, the values of ε<sub>r</sub>, ε<sub>f</sub> and Ψ are determined (see Eqs. S20, S27, S31 and S32), while ϕ and φ<sub>max</sub> are constants (see Eq. S33 and Appendix 1.1). Therefore, if ε<sub>f</sub> > ε<sub>r</sub>, then , since all coefficients are positive (i.e., ) and takes non-negative values. Hence, the solution for optimal growth is (see Eqs. S35-S36 in Appendix 2.2):

      Here, the result signifies a pure fermentation mode with no respiration flux for energy biogenesis. Then, by combining Eq. R2 with Eqs. S28 and S30 from Appendix 2.1, we obtain the optimal proteome allocation results for this case:

      where , while κ<sub>A</sub> and take given values (see Eqs. S20 and S27). In Eq. R3, φ<sub>3</sub> corresponds to the fraction of the proteome devoted to carrying the carbon flux from Acetyl-CoA (the entry point of Pool b, see Fig. 1B and Appendix 1.2) to α-Ketoglutarate (the entry point of Pool c), with all of these being enzymes within the TCA cycle. The optimal growth solution is , which demonstrates that in a pure fermentation mode, the optimal growth condition includes the presence of enzymes within the TCA cycle capable of carrying the flux required for biomass generation.

      Regarding Eq. S25, J<sub>E</sub> represents the energy demand for cell proliferation, expressed as the stoichiometric energy flux in ATP. Although the influx of carbon sources (e.g., glucose) varies significantly between fermentation and respiration modes, J<sub>BM</sub> and J<sub>E</sub> are the biomass and energy fluxes used to build cells, respectively. In bacteria, whether in fermentation or respiration mode, the proportion of maintenance energy used for protein degradation is roughly negligible (see Locasale and Cantley, BMC Biol 8, 88 (2010)). Consequently, the energy demand represented by J<sub>E</sub> scales approximately linearly with the biomass production rate J<sub>BM</sub> (related experimental data reference: Ebenhöh et al., Life 14, 247 (2024)), regardless of the energy biogenesis mode. Therefore, r<sub>E</sub> can be regarded as roughly constant for bacteria. However, in eukaryotic cells such as yeast and mammalian cells, the proportion of maintenance energy is much more significant (see Locasale and Cantley, BMC Biol 8, 88 (2010)). Therefore, we have explicitly considered the contribution of maintenance energy in these cases and have extended the previous Appendix 6.4 into Appendix 9 in the current version.
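
One way to see why r_E stops being constant once maintenance matters is a Pirt-style linear maintenance term. The symbol J_m (a growth-independent maintenance energy flux) is an assumption introduced here for illustration; it is not necessarily the form used in Appendix 9:

```latex
J_E = r_E \, J_{BM} + J_m
\quad\Longrightarrow\quad
\frac{J_E}{J_{BM}} = r_E + \frac{J_m}{J_{BM}}
```

With J_m ≈ 0, as argued above for bacteria, the ratio reduces to the constant r_E; with J_m > 0 the ratio rises as J_BM falls, i.e. at slow growth.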

      (ii) The minor weakness of this model is that it assumes a priori that each cell chooses its metabolic strategy based on energy efficiency. This is an interesting assumption but there is no known biochemical pathway that directly executes this mechanism. In evolution, growth rate is more frequently considered for metabolic optimization. In Flux Balance Analysis, one could have multiple objective functions including biomass synthesis, energy generation, entropy production, etc. Therefore, the author would need to justify this assumption and propose a reasonable biochemical mechanism for cells to sense and regulate their energy efficiency.

      We thank the reviewer for raising this question and apologize for not explaining this point clearly enough in the previous version of our manuscript. Just as the reviewer mentioned, growth rate should be considered for metabolic optimization under the selection pressure of the evolutionary process. In fact, in our model, the sole optimization objective is exactly the cell growth rate. The determination of whether to use fermentation or respiration based on proteome efficiency (i.e., the proteome energy efficiency in our previous version) is not an a priori assumption in our model; rather, it is a natural consequence of growth rate optimization, as we detail below. 

      For a given nutrient condition with a determined value of κ<sub>A</sub>, as we have explained in the aforementioned responses, the constraint on the fluxes is summarized in Eq. 3 and is restated as Eq. R1. Mathematically, we can obtain the solution for the optimal growth strategy by combining Eq. R1 (i.e., Eq. 3) with the optimization of cell growth rate λ, as follows: If the proteome efficiency in fermentation is larger than that in respiration, i.e., ε<sub>f</sub> > ε<sub>r</sub>, then from Eq. R1, we obtain , since the values of ε<sub>r</sub>, ε<sub>f</sub>, Ψ, ϕ and φ<sub>max</sub> are all fixed for a given κ<sub>A</sub>, with ε<sub>r</sub>, ε<sub>f</sub>, Ψ, ϕ, φ<sub>max</sub> > 0. Hence, (since ), and note that . Therefore is the solution for optimal growth, where the growth rate can take the maximum value of . Similarly, for the case where the proteome efficiency in respiration is larger than that in fermentation (i.e., ε<sub>r</sub> > ε<sub>f</sub>), is the solution for optimal growth. With this analysis, we have demonstrated that the choice between fermentation and respiration based on proteome efficiency is a natural consequence of growth rate optimization.
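
The corner-solution argument can be checked numerically in a toy setting. The two constraints below (an energy demand J_r + J_f = r_E·λ and a linear proteome budget with coefficients 1/eps_r, 1/eps_f and psi) are hypothetical simplifications chosen for illustration; they are not a reproduction of Eq. 3/Eq. R1, and all parameter values are made up. Because the proteome cost is linear in the fermentation share f, λ(f) is monotonic, so the optimum sits at f = 0 or f = 1.

```python
# HYPOTHETICAL linear toy model (not the manuscript's Eq. 3):
#   energy demand:   J_r + J_f = r_E * lam
#   proteome budget: J_r / eps_r + J_f / eps_f + psi * lam = phi_max
# With f = J_f / (J_r + J_f), solving for lam gives
#   lam(f) = phi_max / (r_E * ((1 - f) / eps_r + f / eps_f) + psi)


def growth_rate(f: float, eps_r: float, eps_f: float,
                r_e: float = 1.0, psi: float = 1.0, phi_max: float = 0.5) -> float:
    """Growth rate when a fraction f of the energy flux comes from fermentation."""
    cost_per_unit_growth = r_e * ((1.0 - f) / eps_r + f / eps_f) + psi
    return phi_max / cost_per_unit_growth


def optimal_fraction(eps_r: float, eps_f: float, steps: int = 100) -> float:
    """Grid search over f in [0, 1] for the growth-maximizing fermentation share."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda f: growth_rate(f, eps_r, eps_f))


if __name__ == "__main__":
    print(optimal_fraction(eps_r=1.0, eps_f=2.0))  # fermentation more efficient
    print(optimal_fraction(eps_r=2.0, eps_f=1.0))  # respiration more efficient
```

A grid search is used only to make the monotonicity visible; in this toy model the same conclusion follows directly from the cost being linear in f.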

      We have now revised the related content in our manuscript to clarify this point.

      My feeling is that the mathematical structure of this model could be correct, but the single-cell interpretation for the ensemble averaging has issues. Each cell could potentially adopt partial respiration and partial fermentation at the same time and have temporal variability in its metabolic mode as well. With the modification of the optimization scheme, the author could have a revised model that avoids the caveat mentioned above.

      We thank the reviewer for raising this question. In fact, in the above two responses, we have addressed the issues raised here, clarifying that the binary mode between respiration and fermentation does not hinder cell growth and that the sole optimization objective is the cell growth rate, as the reviewer suggested. Regarding temporal variability, due to factors such as cell cycle stages and the intrinsic noise arising from stochastic processes, temporal variability in the fermentation or respiration mode is indeed likely. However, at any given moment at the single-cell level, a binary choice between fermentation and respiration is what our model predicts for the optimal growth strategy. 

      Discussion and impact for the field

      Proteome partition models and Flux Balance Analysis are both commonly used mathematical models that emphasize different parts of cellular physiology. This paper has ingredients for both, and I expect after revision it will bridge our understanding of the whole cell.

      We appreciate the reviewer’s very positive comments. We have followed many of the good suggestions raised by the reviewer, and our revised manuscript is much improved as a result.

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript "Overflow metabolism originates from growth optimization and cell heterogeneity" the author Xin Wang investigates the hypothesis that the transition into overflow metabolism at large growth rates actually results from an inhomogeneous cell population, in which every individual cell either performs respiration or fermentation.

      We thank the reviewer for carefully reading our manuscript and the accurate summary.

      Weaknesses:

      The paper has several major flaws. First, and most importantly, it repeatedly and wrongly claims that the origins of overflow metabolism are not known. The paper is written as if it is the first to study overflow metabolism and provide a sound explanation for the experimental observations. This is obviously not true and the author actually cites many papers in which explanations of overflow metabolism are suggested (see e.g. Basan et al. 2015, which even has the title "Overflow metabolism in E. coli results from efficient proteome allocation"). The paper should be rewritten in a more modest and scientific style, not attempting to make claims of novelty that are not supported. In fact, all hypotheses in this paper are old. Also the possibility that cell heterogeneity explains the observed 'smooth' transition into overflow metabolism has been extensively investigated previously (see de Groot et al. 2023, PNAS, "Effective bet-hedging through growth rate dependent stability") and the random drawing of kcat-values is an established technique (Beg et al., 2007, PNAS, "Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity"). Thus, in terms of novelty, this paper is very limited. It reinvents the wheel and it is written as if decades of literature debating overflow metabolism did not exist.

      We thank the reviewer for both the critical and constructive comments. Following the reviewer’s suggestion, we have revised our manuscript to adopt a more modest style. However, we respectfully disagree with the criticism regarding the novelty of our study, as detailed below.

      First, while many explanations for overflow metabolism have been proposed, we have cited these in both the previous and current versions of our manuscript. We apologize for not emphasizing the distinctions between these previous explanations and our study in the main text of our earlier version, though we did provide details in Appendix 6.3. In fact, most of these explanations (e.g., Basan et al., Nature 528, 99-104 (2015); Chen and Nielsen, PNAS 116, 17592-17597 (2019); Majewski and Domach, Biotechnol. Bioeng. 35, 732-738 (1990); Niebel et al., Nat. Metab. 1, 125-132 (2019); Shlomi et al., PLoS Comput. Biol. 7, e1002018 (2011); Varma and Palsson, Appl. Environ. Microbiol. 60, 3724-3731 (1994); Vazquez et al., BMC Syst. Biol. 4, 58 (2010); Vazquez and Oltvai, Sci. Rep. 6, 31007 (2016); Zhuang et al., Mol. Syst. Biol. 7, 500 (2011)) heavily rely on the assumption that cells optimize their growth rate for a given rate of carbon influx under each nutrient condition (or certain equivalents) to explain the growth rate dependence of fermentation flux. However, this assumption—that cell growth rate is optimized for a given rate of carbon influx—is questionable, as the given factors in a nutrient condition are the identity and concentration of the carbon source, rather than the carbon influx itself.

      Consequently, in our model, we purely optimize cell growth rate without imposing a special constraint on carbon influx. Our assumption that the given factors in a nutrient condition are the identity and concentration of the carbon source aligns with the studies by Molenaar et al. (Molenaar et al., Mol. Syst. Biol. 5, 323 (2009)), where they specified an identical assumption on page 5 of their Supplementary Information (SI); Scott et al. (Scott et al., Science 330, 1099-1102 (2010)), where the growth rate formula was derived for a culturing condition with a given nutrient quality; and Wang et al. (Wang et al., Nat. Comm. 10, 1279 (2019)), our previous study on microbial growth. Among these three studies, only Molenaar et al. addresses overflow metabolism. However, Molenaar et al. did not consider cell heterogeneity, resulting in their model predictions on the growth rate dependence of fermentation flux being a digital response, which is inconsistent with experimental data.

      Furthermore, prevalent explanations such as those by Basan et al. (Basan et al., Nature 528, 99-104 (2015)) and Chen and Nielsen (Chen and Nielsen, PNAS 116, 17592-17597 (2019)) suggest that overflow metabolism originates from the proteome efficiency in fermentation always being higher than in respiration. However, Shen et al. (Shen et al., Nature Chemical Biology 20, 1123–1132 (2024)) recently discovered that the proteome efficiency measured at the cell population level in respiration is higher than in fermentation for many yeast and cancer cells, despite the presence of fermentation fluxes through aerobic glycolysis. This finding clearly contradicts the studies by Basan et al. (2015) and Chen and Nielsen (2019). 

      Nevertheless, our model may resolve this puzzle by incorporating two important features. First, in our model, the proteome efficiency (i.e., the proteome energy efficiency in our previous version) in respiration is larger than that in fermentation when nutrient quality is low (Eqs. S174-S175 in Appendix 9). Second, and crucially, due to the incorporation of cell heterogeneity in our model, there could be a proportion of cells with higher proteome efficiency in fermentation than in respiration, even when the overall proteome efficiency at the cell population level is higher in respiration than in fermentation. As shown in the newly added Fig. 5A-B, our model results can quantitatively illustrate the experimental data from Shen et al., Nature Chemical Biology 20, 1123–1132 (2024).
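
A toy sketch of the second point: even when the population-mean efficiency favors respiration, per-cell noise can leave a sizeable minority of cells for which fermentation is more efficient. It assumes, for illustration only, independent multiplicative log-normal noise applied directly to each cell's eps_r and eps_f (the manuscript instead places the noise on k_cat values), a plain arithmetic mean as the "population-level" efficiency, and made-up parameters; `sample_efficiencies` is a hypothetical helper.

```python
import random
import statistics


def sample_efficiencies(n: int, mean_eps_r: float, mean_eps_f: float,
                        sigma: float, seed: int = 0) -> list[tuple[float, float]]:
    """Draw per-cell (eps_r, eps_f) with independent log-normal noise factors.

    Using mu = -sigma**2 / 2 gives each noise factor a mean of 1, so the
    population means stay close to mean_eps_r and mean_eps_f.
    """
    rng = random.Random(seed)
    mu = -sigma**2 / 2
    return [
        (mean_eps_r * rng.lognormvariate(mu, sigma),
         mean_eps_f * rng.lognormvariate(mu, sigma))
        for _ in range(n)
    ]


cells = sample_efficiencies(20_000, mean_eps_r=1.0, mean_eps_f=0.9, sigma=0.2)
pop_eps_r = statistics.fmean(e_r for e_r, _ in cells)
pop_eps_f = statistics.fmean(e_f for _, e_f in cells)
share_fermenting = sum(e_f > e_r for e_r, e_f in cells) / len(cells)

print(f"population mean eps_r={pop_eps_r:.3f}, eps_f={pop_eps_f:.3f}")
print(f"share of cells with eps_f > eps_r: {share_fermenting:.2f}")
```

With a 10% mean advantage for respiration and 20% per-cell noise, roughly a third of the simulated cells favor fermentation in this toy setting.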

      Finally, regarding the criticism of the novelty of our hypothesis: As specified in our main text, cell heterogeneity has been widely reported experimentally in both microbes (e.g., Ackermann, Nat. Rev. Microbiol. 13, 497-508 (2015); Bagamery et al., Curr. Biol. 30, 4563-4578 (2020); Balaban et al., Science 305, 1622-1625 (2004); Nikolic et al., BMC Microbiol. 13, 1-13 (2013); Solopova et al., PNAS 111, 7427-7432 (2014); Wallden et al., Cell 166, 729-739 (2016)) and tumor cells (e.g., Duraj et al., Cells 10, 202 (2021); Hanahan and Weinberg, Cell 144, 646-674 (2011); Hensley et al., Cell 164, 681-694 (2016)). However, to the best of our knowledge, cell heterogeneity has not yet been incorporated into theoretical models for explaining overflow metabolism or the Warburg effect. The reviewer mentioned the study by de Groot et al. (de Groot et al., PNAS 120, e2211091120 (2023)) as studying overflow metabolism similarly to our work. We have carefully read this paper, including the main text and SI, and found that it is not directly relevant to either overflow metabolism or the Warburg effect. Instead, their model extends the work of Kussell and Leibler (Kussell and Leibler, Science 309, 2075-2078 (2005)), focusing on bet-hedging strategies of microbes in changing environments.

      Regarding the criticism that random drawing of kcat-values is an established technique (Beg et al., PNAS 104, 12663-12668 (2007)), we need to stress that the distribution noise on kcat-values considered in our model is fundamentally different from that in Beg et al. In Beg et al., their model involved 876 reactions (see Dataset 1 in Beg et al.), of which only 109 had associated biochemical experimental data. Thus, their distribution of kcat-values pertains to different enzymes within the same cell. In contrast, we have the mean of the kcat-values from experimental data for each relevant enzyme, with the distribution of kcat-values representing the same enzyme in different cells.

      Moreover, the manuscript is not clearly written and is hard to understand. Variables are not properly introduced (the M-pools need to be discussed, and fluxes (J_E), "energy coefficients" (eta_E), etc. need to be explained more explicitly). What is "flux balance at each intermediate node"? How is the "proteome efficiency" of a pathway defined? The paper continues to speak of energy production. This should be avoided. Energy is conserved (1st law of thermodynamics) and can never be produced. A scientific paper should strive for scientific correctness, including precise choice of words.

      We thank the reviewer for the constructive comments. Following these, we have provided more explicit information and revised our manuscript to enhance readability. In our initially submitted version, the phrase "energy production" was borrowed from Nelson et al. (Nelson et al., Lehninger principles of biochemistry, 2008) and Basan et al. (Basan et al., Nature 528, 99-104 (2015)), and we chose to follow this terminology. We appreciate the reviewer’s suggestion and have now revised the wording to use more appropriate expressions.

      The statement that the "energy production rate ... is proportional to the growth rate" is, apart from being incorrect (it should be 'ATP consumption rate' or similar; see above), a non-trivial claim. Why should this be the case? Such statements must be supported by references. The observation that the catabolic power indeed appears to increase linearly with growth rate was made, based on chemostat data for E. coli and yeast, in a recent preprint (Ebenhöh et al, 2023, bioRxiv, "Microbial pathway thermodynamics: structural models unveil anabolic and catabolic processes").

      We thank the reviewer for the insightful suggestions. Following these, we have revised our manuscript and cited the suggested reference (i.e., Ebenhöh et al., Life 14, 247 (2024)).

      All this criticism does not preclude the possibility that cell heterogeneity plays a role in overflow metabolism. However, according to Occam's razor, the simpler explanations should first be explored and refuted before coming up with a more complex solution. Here, it means that the authors should first argue why simpler explanations (e.g. the 'Membrane Real Estate Hypothesis', Szenk et al., 2017, Cell Systems; maximal Gibbs free energy dissipation, Niebel et al., 2019, Nature Metabolism; Saadat et al., 2020, Entropy) are not considered, or in what way they disagree with observations, and then provide some evidence of the proposed cell heterogeneity (are there single-cell transcriptomic data supporting the claim?).

      We thank the reviewer for raising these questions and providing valuable insights. Regarding the shortcomings of simpler explanations, as explained above, most proposed explanations (including the references mentioned by the reviewer: Szenk et al., Cell Syst. 5, 95-104 (2017); Niebel et al., Nat. Metab. 1, 125-132 (2019); Saadat et al., Entropy 22, 277 (2020)) rely heavily on the assumption that cells optimize their growth rate for a given rate of carbon influx under each nutrient condition (or an equivalent assumption). However, this assumption is questionable, as the given factors in a nutrient condition are the identities and concentrations of the carbon sources, rather than the carbon influx itself.

      Specifically, Szenk et al. is a perspective paper, and the original “membrane real estate hypothesis” was proposed by Zhuang et al. (Zhuang et al., Mol. Syst. Biol. 7, 500 (2011)). Zhuang et al. specified in Section 7 of their SI that their model’s explanation of the experimental results shown in Fig. 2C of their manuscript relies on the assumption of restrictions on carbon influx. In Niebel et al. (Niebel et al., Nat. Metab. 1, 125-132 (2019)), the Methods section specifies that the glucose uptake rate was considered a given factor for a growth condition. In Saadat et al. (Saadat et al., Entropy 22, 277 (2020)), Appendix A notes that their model results depend on minimizing carbon influx for a given growth rate, which is equivalent to the assumption mentioned above (see Appendix 6.3 in our manuscript for details). 

      Regarding the experimental evidence for our proposed cell heterogeneity, Bagamery et al. (Bagamery et al., Curr. Biol. 30, 4563-4578 (2020)) reported non-genetic heterogeneity, with two subpopulations, among Saccharomyces cerevisiae cells upon the withdrawal of glucose from exponentially growing cells. This strongly indicates the coexistence of fermentative and respiratory cells in S. cerevisiae cultured in a glucose medium (refer to Fig. 1E in Bagamery et al.). Nikolic et al. (Nikolic et al., BMC Microbiol. 13, 1-13 (2013)) reported a bimodal distribution in the expression of the acs gene (encoding acetyl-CoA synthetase, which is involved in acetate assimilation) in an E. coli cell population growing on glucose as the sole carbon source within the region of overflow metabolism (see Fig. 5 in Nikolic et al.), consistent with the cell heterogeneity we propose. For cancer cells, Duraj et al. (Duraj et al., Cells 10, 202 (2021)) reported a high level of intra-tumor heterogeneity in glioblastoma using optical microscopy images, in which 48% to 75% of the cells use fermentation and the remainder use respiration (see Fig. 1C in Duraj et al.); this aligns with the picture of cell heterogeneity in aerobic glycolysis predicted by our model.

      We have now added related content to the discussion section to strengthen our manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Some minor corrections:

      (1) Adjust the reference: (García-Contreras et al., 2012)

      (2) Line 255: remove the duplicate "the genes"

      We thank the reviewer for the suggestions and have revised our manuscript accordingly, with one exception: the reference in the form of García-Contreras et al., 2012, although somewhat unusual, is actually correct, so we have kept it unchanged.

      General comment to the author:

      Considering that this work exists at the interface between Physics and Biology, where a significant portion of the audience may not be familiar with the mathematical manipulations performed, it would enhance the paper's readability to provide more explicit indications in the text. For example, in line 91, explicitly define phi_A as phi_R; or in line 115, explain the K_i parameter in the text for better readability.

      We thank the reviewer for the suggestion. Following this, we have now provided more explicit information for the definition of mathematical symbols to enhance readability.

      Reviewer #2 (Recommendations For The Authors):

      The current form of this manuscript is difficult to read for general readers. In addition, the model description in the Appendix can be improved for biophysics readers to keep track of the variables. Here are my suggestions:

      a) In the main text, the authors should give the definition of "proteome energy efficiency" explicitly, both in words and as a mathematical formula, since this is the central concept of the paper. The biological interpretation of formula (4) should also be stated.

      We thank the reviewer for the suggestion. Following this, we have now added definitions and biological interpretations to fix these issues.

      b) I feel the basic model of the reaction network in the Appendix could be stated in a more concise way, by emphasizing whether a variable is extensive (exponentially growing) or intensive (scale-invariant under exponential growth).

      From my understanding, this work assumes balanced exponential growth, and hence there is a balanced biomass vector Y* (a constant vector whose components sum to 1) for each cell. The steady-state fluxes {J} are extensive and all grow at rate λ. The proteome partition and relative metabolite fractions are ratios of different components of Y* and hence are intensive.

      The normalized fluxes {J^(n)} (with respect to biomass) are functions of Y* and keep constant ratios to one another. They are also intensive.

      The biomass and energy production are linear combinations of {J} and hence are extensive and follow exponential growth. The biomass and energy efficiency are ratios between flux and proteome biomass, and hence are intensive.
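      As an illustration of this distinction (a sketch, not code from the manuscript), the Python snippet below assumes hypothetical values for the growth rate λ, a three-component composition vector Y*, and a single normalized flux J^(n). It shows that absolute fluxes and component amounts grow as e^(λt), while ratios such as the proteome partition and flux per unit proteome stay constant.

```python
import math

# All values below are hypothetical, chosen only to illustrate the argument.
GROWTH_RATE = 0.5   # lambda, per hour (assumed)
M0 = 1.0            # initial total biomass, arbitrary units (assumed)
# Balanced composition vector Y*: fixed fractions that sum to 1 (assumed split).
Y_STAR = {"proteome_R": 0.3, "proteome_E": 0.5, "metabolites": 0.2}
J_NORMALIZED = 2.0  # flux per unit biomass, J^(n): intensive (assumed)


def biomass(t):
    """Extensive: total biomass grows as M0 * exp(lambda * t)."""
    return M0 * math.exp(GROWTH_RATE * t)


def components(t):
    """Extensive: each component amount is M(t) times its fixed fraction in Y*."""
    m = biomass(t)
    return {name: m * frac for name, frac in Y_STAR.items()}


def flux(t):
    """Extensive: absolute flux J(t) = J^(n) * M(t)."""
    return J_NORMALIZED * biomass(t)


def intensive_snapshot(t):
    """Ratios of extensive quantities: the exp(lambda * t) factor cancels."""
    c = components(t)
    proteome = c["proteome_R"] + c["proteome_E"]
    return {
        "proteome_partition": c["proteome_R"] / proteome,
        "normalized_flux": flux(t) / biomass(t),
        "flux_per_proteome": flux(t) / proteome,  # efficiency-like ratio
    }


for t in (0.0, 1.0, 4.0):
    print(t, round(flux(t), 3), intensive_snapshot(t))
```

      Because every component is proportional to M(t), any ratio of two extensive quantities cancels the common exponential factor, which is why such ratios are scale-invariant under balanced growth.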

      We thank the reviewer for the insightful suggestion. Following this, we have now added the intensive and extensive information for all relevant variables in the newly added Appendix-table 3.

      c) In the Appendix, the authors should provide a table or list of important variables, with their definitions, units, and physiological values under respiration and fermentation.

      We thank the reviewer for the very useful suggestion. Following this, we have now added Appendix-table 3 (pages 54-57 in the appendices) to list the symbols used throughout our manuscript, together with the model variables and parameter settings.

      d) Regarding single-cell variability, the authors ignored recent experimental measurements of single-cell metabolism, including the variability of ATP and NAD(P)H levels in E. coli, which would be useful background for readers; see below.

      https://pubmed.ncbi.nlm.nih.gov/25283467/

      https://pubmed.ncbi.nlm.nih.gov/29391569/

      We thank the reviewer for the very useful suggestion. We have now cited these relevant studies in our manuscript.  

      e) The choice between 100% respiration and 100% fermentation is based on the optimization of proteome energy efficiency, while the intermediate strategies are not favored in this model. This is similar to a concept in control theory called the bang-bang principle. This can be added to the Discussion.

      We thank the reviewer for this suggestion. We have reviewed the concept of the bang-bang principle and the articles on it. While the bang-bang principle is indeed relevant to binary choices, it is somewhat removed from the topic of metabolic strategies for optimal growth. The elementary flux mode (see Müller et al., J. Theor. Biol. 347, 182-190 (2014); Wortel et al., FEBS J. 281, 1547-1555 (2014)) is more pertinent to this topic, as it may lead to diauxic microbial growth (another binary metabolic strategy) in microbes grown on a mixture of two carbon sources from Group A (see Wang et al., Nat. Comm. 10, 1279 (2019)). Therefore, we have cited and discussed only the elementary flux mode (Müller et al., J. Theor. Biol. 347, 182-190 (2014); Wortel et al., FEBS J. 281, 1547-1555 (2014)) in the introduction and discussion sections of our manuscript.

    1. eLife Assessment

      This important study explores the association between mother-child interactions and the development of children's social brain networks, specifically the theory of mind and social pain networks. The findings provide solid evidence for enhanced stimulus-evoked neural synchronization between child-caregiver dyads, while the evidence for the other variables is incomplete and could be strengthened with further analyses. The study effectively bridges brain development with children's behavior and parenting practices and would be of interest to broad research communities in social neuroscience and developmental psychology.

    2. Reviewer #1 (Public review):

      The authors sought to examine the associations between child age, reports of parent-child relationship quality, and neural activity patterns while children (and also their parents) watched a movie clip. Major methodological strengths include the sample of 3- to 8-year-old children in China (rare in fMRI research for both age range and non-Western samples), use of a movie clip previously demonstrated to capture theory of mind constructs at the neural level, measurement of caregiver-child neural synchrony, and assessment of neural maturity. Results provide important new information about parent-child neural synchronization during this movie and associations with reports of parent-child relationship quality. The work is a notable advance in understanding the link between the caregiving context and the neural construction of theory of mind networks in the developing brain.

      There are several theoretical and methodological limitations of the manuscript in its current form:

      (1) We appreciate that the authors wanted to show support for a mediational mechanism. However, we suggest that the authors drop the structural equation modeling, because the data are cross-sectional and mediation analysis is therefore not appropriate. Another issue is the weak justification for including parent-child neural synchronization as part of parenting; it could just as easily be a mechanism of change, or be driven by the child, rather than a component of parenting behavior. The paper would be strengthened by examining associations between the selected variables of interest that are MOST relevant to the imaging task in a regression-type model. Furthermore, the authors need to be more explicit about corrections for multiple comparisons throughout the manuscript; some of the associations are fairly weak, so claims may need to be tempered if they do not survive correction.

      (2) Reverse correlation analysis is sensible given what prior developmental fMRI studies have done. But reverse correlation analysis may be more prone to overfitting and noise, and lacks sensitivity to multivariate patterns. Might inter-subject correlation be useful for *within* the child group? This would minimize noise and allow for non-linear patterns to emerge.

      (3) No learning effects or temporal lagged effects are tested in the current study, so the results do not support the authors' conclusions that the data speak to Bandura's social learning theory. The authors do mention theories of biobehavioral synchrony in the introduction but do not discuss this framework in the discussion (which is most directly relevant to the data). The data can also speak to other neurodevelopmental theories of development (e.g., neuroconstructivist approaches), but the authors do not discuss them. The manuscript would benefit from significantly revising the framework to focus more on biobehavioral synchrony data and other neurodevelopmental approaches given the prior work done in this area rather than a social psychology framework that is not directly evaluated.

      (4) The significance and impact of the findings would be clearer if the authors more clearly situated the findings in the context of (a) other movie and theory of mind fMRI task data during development; and (b) existing data on parent-child neural synchrony (often uses fNIRS or EEG). What principles of brain and social cognition development do these data speak to? What is new?

      (5) There is little discussion of the study limitations, the generalizability of the findings, and important next steps and future directions. What can the data tell us, and what can they NOT tell us?

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of mother-child neural synchronization and the quality of parent-child relationships on the development of Theory of Mind (ToM) and social cognition. Utilizing a naturalistic fMRI movie-viewing paradigm, the authors analyzed inter-subject neural synchronization in mother-child dyads and explored the connections between neural maturity, parental caregiving, and social cognitive outcomes. The findings indicate age-related maturation in ToM and social pain networks, emphasizing the importance of dyadic interactions in shaping ToM performance and social skills, thereby enhancing our understanding of the environmental and intrinsic influences on social cognition.

      Strengths:

      This research addresses a significant question in developmental neuroscience, by linking social brain development with children's behaviors and parenting. It also uses a robust methodology by incorporating neural synchrony measures, naturalistic stimuli, and a substantial sample of mother-child dyads to enhance its ecological validity. Furthermore, the SEM approach provides a nuanced understanding of the developmental pathways associated with Theory of Mind (ToM).

      Weaknesses:

      (1) Upon reviewing the introduction, I feel that the first goal - developmental changes of the social brain and its relation to age - seems somewhat distinct from the other two goals and the main research question of the manuscript. The authors might consider revising this section to enhance the overall coherence of the manuscript. Additionally, the introduction lacks a clear background and rationale for the importance of examining age-related changes in the social brain.

      (2) The manuscript uses both "mother-child" and "parent-child" terminology. Does this imply that only mothers participated in the fMRI scans while fathers completed the questionnaires? If so, have the authors considered the potential impact of parental roles (father vs. mother)?

      (3) There is inconsistent usage of the terms ISC and ISS in the text and figures, both of which appear to refer to synchronization derived from correlation analysis. It would be beneficial to maintain consistency throughout the manuscript.

      (4) Of the 50 dyads, 16 were excluded due to data quality issues, which constitutes a significant proportion. It would be helpful to know whether these excluded dyads exhibited any distinctive characteristics. Providing information on demographic or behavioral differences, such as Theory of Mind (ToM) performance and age range, between the excluded and included dyads would help in assessing the generalizability of the findings.

      (5) The article does not adhere to the standard practice of using resting-state synchronization as a baseline to be subtracted from task synchronization. Is there a rationale for this approach? Not controlling for a baseline may lead to issues, for instance if resting-state synchronization already differs between subjects with varying characteristics.

      (6) The title of the manuscript suggests a direct influence of mother-child interactions on children's social brain and theory of mind. However, the use of structural equation modeling (SEM) may not fully establish causal relationships. It is possible that the development of children's social brain and ToM also enhances mother-child neural synchronization. The authors should address this alternative hypothesis of the potential bidirectional relationship in the discussion and exercise caution regarding terms that imply causality in the title and throughout the manuscript.

      (7) I would appreciate more details about the 14 Theory of Mind (ToM) tasks, which could be included in supplemental materials. The authors score them on a scale from 0 to 14 (each task 1 point); however, the tasks likely vary in difficulty and should carry different weights in the total score (for example, the test and the control questions should have different weights). Many studies have utilized the seven tasks according to Wellman and Liu (2004), categorizing them into "basic ToM" and "advanced ToM." Different components of ToM could influence the findings of the current study, which should be further examined by a more in-depth analysis.

    4. Reviewer #3 (Public review):

      Summary:

      The article explores the role of mother-child interactions in the development of children's social cognition, focusing on Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Using a naturalistic fMRI paradigm involving movie viewing, the study examines relationships among children's neural development, mother-child neural synchronization, and interaction quality. The authors identified a developmental pattern in these networks, showing that they become more functionally distinct with age. Additionally, they found stronger neural synchronization between child-mother pairs compared to child-stranger pairs, with this synchronization and neural maturation of the networks associated with the mother-child relationship and parenting quality.

      Strengths:

      This is a well-written paper, and using dyadic fMRI and naturalistic stimuli enhances its ecological validity, providing valuable insights into the dynamic interplay between brain development and social interactions. However, I have some concerns regarding the analysis and interpretation of the findings. I have outlined these concerns below in the order they appear in the manuscript, which I hope will be helpful for the revision.

      Weaknesses:

      (1) Given the importance of social cognition in this study, please cite a foundational empirical or review paper on social cognition to support its definition. The current first citation is primarily related to ASD research, which may not fully capture the broader context of social cognition development.

      (2) It is standard practice to report the final sample size in the Abstract and Introduction, rather than the initial recruited sample, as high attrition rates are common in pediatric studies. For example, this study recruited 50 mother-child dyads, and only 34 remained after quality control. This information is crucial for interpreting the results and conclusions. I recommend reporting the final sample size in the abstract and introduction but specifying in the Methods that an additional 16 mother-child dyads were initially recruited or that 50 dyads were originally collected.

      (3) In the "Neural maturity reflects the development of the social brain" section, the authors report the across-network correlation for adults, finding a negative correlation between ToM and SPM. However, the cross-network correlations for the three child groups are not reported. The statement that "the two networks were already functionally distinct in the youngest group of children we tested" is based solely on within-network positive correlations, which does not fully demonstrate functional distinctness. Including cross-network correlations for the child groups would strengthen this conclusion.

      (4) The ROIs for the ToM and SPM networks are defined based on previous literature, applying the same ROIs across all age groups. While I understand this is a common approach, it's important to note that this assumption may not fully hold, as network architecture can evolve with age. The functional ROIs or components of a network might shift, with regions potentially joining or exiting a network or changing in size as children develop. For instance, Mark H. Johnson's interactive specialization theory suggests that network composition may adapt over developmental stages. Although the authors follow the approach of Richardson et al. (2018), it would be beneficial to discuss this limitation in the Discussion. An alternative approach would be to apply data-driven analysis to justify the selection of the ROIs for the two networks.

      (5) The current sample size (N = 34 dyads) is a limitation, particularly given the use of SEM, which generally requires larger samples for stable results. Although the model fit appears adequate, this does not guarantee reliability with the current sample size. I suggest discussing this limitation in more detail in the Discussion.

      (6) Based on the above comment, I believe that conclusions regarding the relationship between social network development, parenting, and support for Bandura's theory should be tempered. The current conclusions may be too strong given the study's limitations.

      (7) The SPM (pain) network is associated with empathic abilities, also an important aspect of social skills. It would be relevant to explore whether (or explain why) SPM development and child-mother synchronization are (or are not) related to parenting and the parent-child relationship.

    1. eLife Assessment

      NeuroSCAN is an accessible and interactive tool for streamlined observation of neuronal morphology, membrane contact, and synaptic connectivity across developmental stages in the nematode C. elegans. This important tool relies on solid electron microscopy datasets. This resource will be of high interest to C. elegans researchers interested in nervous system wiring and circuit function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present NeuroSCAN, an accessible and interactive tool for visualizing and summarizing data from multiple previously annotated C. elegans connectomes. NeuroSCAN provides a useful entry point for streamlined observation of neuronal morphology, and the membrane contacts and synaptic connectivity between neurons across developmental stages and individual connectomes readily extracted from existing data.

      Strengths:

      Koonce et al. have generated a web-based visualization tool for exploring C. elegans neuronal morphology, contact area between neurons, and synaptic connectivity data. Here, the authors integrate volumetric segmentation of neurons and visualization of contact area patterns of individual neurons generated from Diffusion Condensation and C-PHATE embedding based on previous work from adult volumetric electron microscopy (vEM) data, extended to available vEM data for earlier developmental stages, which effectively summarizes modularity within the collated C. elegans contactomes to date. Overall, NeuroSCAN's relative ease of use for generating visualizations, its ability to quickly toggle between developmental stages, and its integration of a concise visualization of individual neurons' contact patterns strengthen its utility.

      Weaknesses:

      NeuroSCAN provides an accessible and convenient platform. However, many of NeuroSCAN's features overlap with those of an existing tool for visualizing connectomics data, Neuroglancer, which is a widely used, shared platform that hosts data from other organisms. The authors do not make clear their motivation for generating this new tool rather than building on a system that has already collated previous connectomics data. Although the field will benefit from any tool that collates connectomics data and makes it more accessible and user-friendly, such a tool is only useful if it is kept up to date, and if the data format for submitting electron microscopy data to be added to the tool is made clear. It is unclear from this manuscript whether NeuroSCAN will be updated with recently published and future C. elegans connectomes, or how additional datasets can be submitted to be added in the future.

      The interface for visualizing contacts and synapses would be improved with better user access to the quantitative underlying data. When contact areas or synapses are added to the viewer, adding statistics on the magnitude of the contact area, the number of synapses, and the rank of these values among the neuron's top connections, would make the viewer more useful for hypothesis generation. Furthermore, synapses are currently listed individually, with names that are not very legible to the web user. Grouping them by pre- and postsynaptic neurons and linking these groups across developmental stages would also be an improvement.

      While the DC/C-PHATE visualizations are a useful tool for the user, it is difficult to understand when grouping or splitting of cell contact patterns is biologically significant. DC is a deterministic algorithm applied to a contactome from a single organism, and the authors do not provide quantitative metrics of distances between individual neurons or the number of DC iterations on the C-PHATE plot, nor do they describe how the threshold for DC was selected. In the application of DC/C-PHATE to larval stage nerve ring strata organization shown by the authors, qualitative observations of C-PHATE plots colored based on adult data seem to be the only evidence shown for persistent strata during development (Figure 3) or changing architectural motifs across stages (Figure 4). Quantitation of differences in neuron position within the DC hierarchy, or differences in modularity across stages, is needed to support these conclusions. Furthermore, illustrating the quantitative differences in C-PHATE plots used to make these conclusions will provide a more instructive guide for users of NeuroSCAN in generating future hypotheses.

      While the case studies presented by the authors help to highlight the utility of the different visualizations offered by the NeuroSCAN platform, the authors need to be more careful with the claims they make from these correlative observations. For example, in Figure 4, the authors use C-PHATE clustering patterns to make conclusions about changes in clustering patterns of individual neurons across development based on single animal datasets. In this and many other cases presented in this study with the limited existing datasets, it is difficult to differentiate between developmental changes and individual variability between the neurite positions, contacts, and synapse differences within these data. This caveat needs to be clearly addressed.

    3. Reviewer #2 (Public review):

      Summary:

      The past five years have seen the publication of both new (Witvliet et al., 2021) and newly analyzed (Cook et al., 2019; Moyle et al., 2021; Brittin et al., 2021) data for the C. elegans connectome. The increase in data availability for a single species allows researchers to examine variability due to both stochastic events and changes over development. The quantity of these data is huge. To help the community make these data more accessible, the authors present a new online tool that allows the examination of 3D models for C. elegans neurons in the central neuropil across development. In addition to visualizing the overall structure of the neuronal processes and locations of synapses, the NeuroSCAN tool also allows users to probe into the C-PHATE visualization results, which this group previously pioneered to describe similarities in neuron adjacency (Moyle et al., 2021).

      Strengths:

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented its circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings remain incredibly useful, they were also necessary simplifications for a 2D publication, and they lack details of the complex architecture seen within each EM image. Koonce et al. take advantage of segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization allows users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source.

      Weaknesses:

      While it's impossible to create one tool that will satisfy all potential users, I found myself wanting to have numbers associated with the data. For example, knowing the number of connections or the total surface area of contacts between individual neurons wasn't possible through the viewer, which limits the utility of taking deep analytical dives. While connectivity data are readily accessible through other interfaces such as Nemanode and WormWiring, a more thorough integration may be helpful to some users.

      There were several issues with the user interface that made it a bit clunky to use. For example, as I added additional neurons to the filter search box, the loading time got longer and longer. I ran an experiment uploading all of the amphid neurons, one pair at a time. Each additional neuron pair added an additional 5-10 seconds to the loading. By the time I got to the last pair, it took over a minute to load. Issues like these, some of which may be unavoidable given the size of the data, could be conveyed through better documentation. I did not find the tutorial very helpful and the supplementary movies lacked any voiceover, so it wasn't always clear what they were trying to show.

    4. Reviewer #3 (Public review):

      Summary:

      This work provides graphical tools for reconstructing the detailed anatomy of a nervous system from a series of sections imaged by electron microscopy. Contact between neuronal processes can direct outgrowth and is necessary for connectivity and, thus, function. A bioinformatic approach is used to group neurons according to shared features (e.g., contact, synapses) in a hierarchy of "relatedness" that can be interrogated at each step. In this work, Koonce et al. analyze vEM data sets for the C. elegans nerve ring (NR), a dense fascicle of processes from 181 neurons. In a bioinformatic approach, the clustering algorithm Diffusion Condensation (DC) groups neurons according to similar cell biological features in iterations that remove chunks of differences in feature data with each step, ultimately merging all NR neurons into one cluster. DC results are displayed with C-PHATE, a 3D visualization tool, to produce a trajectory that can be interrogated for cell identities and other features at each iterative step. In previous work by these authors, this approach was utilized to identify subgroups of neuronal processes or "strata" in the NR that can be grouped by physical contact and connectivity. Here they expand their analysis to include a series of available vEM data sets across C. elegans larval development. This approach suggests that strata initially established during embryonic development are largely preserved in the adult. Importantly, exceptions involving stage-specific reorganization of neuronal placement in specific strata were also detected. A case study featured in the paper demonstrates the utility of this approach for visualizing the integration of newly generated neurons into the existing NR anatomy. Visualization tools used in this work are publicly available at NeuroSCAN.

      Strengths:

      A web-based app, NeuroSCAN, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development

      Weaknesses:

      In the opinion of this reviewer, only minor revisions are required.

    1. eLife Assessment

      This important study is of significant relevance to the fields of predictive processing, perception, and learning. The well-designed paradigm allows the authors to avoid several common confounds in investigating predictions, such as stimulus familiarity and adaptation. Using a state-of-the-art multivariate EEG approach, the authors test the opposing process theory and find evidence in support of it, but some elements - especially the interactions across block - have only incomplete support at present. This could be strengthened via further analyses and justification.

    2. Reviewer #1 (Public review):

      Summary:

      In this lovely paper, McDermott and colleagues tackle an enduring puzzle in the cognitive neuroscience of perceptual prediction. Though many scientists agree that top-down predictions shape perception, previous studies have yielded incompatible results - with studies showing 'sharpened' representations of expected signals, and others showing a 'dampening' of predictable signals to relatively enhance surprising prediction errors. To deepen the paradox further, it seems like there are good reasons that we would want to see both influences on perception in different contexts.

      Here, the authors aim to test one possible resolution to this 'paradox' - the opposing process theory (OPT). This theory makes distinct predictions about how the time course of 'sharpening' and 'dampening' effects should unfold. The researchers present a clever twist on a leading-trailing perceptual prediction paradigm, using AI to generate a large dataset of test and training stimuli so that it is possible to form expectations about certain categories without repeating any particular stimuli. This provides a powerful way of distinguishing expectation effects from repetition effects - a perennial problem in this line of work.

      Using EEG decoding, the researchers find evidence to support the OPT. Namely, they find that neural encoding of expected events is superior in earlier time ranges (sharpening-like) followed by a relative advantage for unexpected events in later time ranges (dampening-like). On top of this, the authors also show that these two separate influences may emerge differently in different phases of learning - with superior decoding of surprising prediction errors being found more in early phases of the task, and enhanced decoding of predicted events being found in the later phases of the experiment.

      Strengths:

      As noted above, a major strength of this work lies in important experimental design choices. Alongside removing any possible influence of repetition suppression mechanisms in this task, the experiment also allows us to see how effects emerge in 'real-time' as agents learn to make predictions. This contrasts with many other studies in this area - where researchers 'over-train' expectations into observers to create the strongest possible effects or rely on prior knowledge that was likely to be crystallised outside the lab.

      Weaknesses:

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to the perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

      The behavioural data that are displayed (from a post-recording behavioural session) show that these predictions do influence perceptual choice, leading to faster reaction times when expectations are valid. In broad strokes, such a result seems consistent with a 'sharpening' view of perceptual prediction, as does the finding that sharpening effects are larger at the end of the task than at the beginning. But it strikes me that the strongest test of the relevance of these (very interesting) EEG findings would be some evidence that the neural effects relate to behavioural influences (e.g., are participants actually more behaviourally sensitive to invalid signals in earlier phases of the experiment, given that this is where the neural effects show the most 'dampening', a.k.a. prediction error advantage?)

    3. Reviewer #2 (Public review):

      Summary:

      There are two accounts in the literature that propose that expectations suppress the activity of neurons that are (a) not tuned to the expected stimulus, to increase the signal-to-noise ratio for expected stimuli (sharpening model), or (b) tuned to the expected stimulus, to highlight novel information (dampening model). One recent account, the opposing process theory, brings the two models together and suggests that both processes occur, but at different time points: initial sharpening is followed by later dampening of the neural activity of the expected stimulus. In this study, the authors aim to test the opposing process theory in a statistical learning task by applying multivariate EEG analyses, and they find evidence for the opposing process theory based on the within-trial dynamics.

      Strengths:

      This study addresses a very timely research question about the underlying mechanisms of expectation suppression. The applied EEG decoding approach offers an elegant way to investigate the temporal characteristics of expectation effects. A major strength of the study lies in the experimental design that aims to control for repetition effects, one of the common confounds in prediction suppression studies. The reported results are novel in the field and have the potential to substantially improve our understanding of expectation suppression in visual perception.

      Weaknesses:

      The strength in controlling for repetition effects by introducing a neutral (50% expectation) condition also adds a weakness to the current version of the manuscript, as this neutral condition is not integrated into the behavioral (reaction times) and EEG (ERP and decoding) analyses. This procedure remained unclear to me. The reported results would be strengthened by showing differences between the neutral and expected (valid) conditions on the behavioral and neural levels. This would also provide a more rigorous check that participants had implicitly learned the associations between the picture category pairings.

      It is not entirely clear to me what is actually decoded in the prediction condition, or why the authors did not perform the prediction decoding over trial bins, as potential differences across time could be hidden by averaging the data. The manuscript would generally benefit from a more detailed description of the analysis rationale and methods.

      Finally, the scope of this study should be limited to expectation suppression in visual perception, as the generalization of these results to other sensory modalities or to the action domain remains open for future research.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, McDermott et al. investigate the neurocomputational mechanism underlying sensory prediction errors. They contrast two accounts: representational sharpening and dampening. Representational sharpening suggests that predictions increase the fidelity of the neural representations of expected inputs, while representational dampening suggests the opposite (decreased fidelity for expected stimuli). The authors performed decoding analyses on EEG data, showing that first expected stimuli could be better decoded (sharpening), followed by a reversal during later response windows where unexpected inputs could be better decoded (dampening). These results are interpreted in the context of opposing process theory (OPT), which suggests that such a reversal would support perception to be both veridical (i.e., initial sharpening to increase the accuracy of perception) and informative (i.e., later dampening to highlight surprising, but informative inputs).

      Strengths:

      The topic of the present study is of significant relevance to the field of predictive processing. The experimental paradigm used by McDermott et al. is well designed, allowing the authors to avoid several common confounds in investigating predictions, such as stimulus familiarity and adaptation. The introduction of the manuscript provides a well-written summary of the main arguments for the two accounts of interest (sharpening and dampening), as well as OPT. Overall, the manuscript serves as a good overview of the current state of the field.

      Weaknesses:

      In my opinion, several details of the methods, results, and manuscript raise doubts about the quality and reliability of the reported findings. Key concerns are:

      (1) The results in Figure 2C seem to show that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, Figure 2E suggests the prediction (surprisingly, valid or invalid) during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Unless I am misinterpreting the analyses, it seems implausible to me that a prediction, but not actually shown image, can be better decoded using EEG than an image that is presented on-screen.

      (2) The "prediction decoding" analysis is described by the authors as "decoding the predictable trailing images based on the leading images". How this was done is however unclear to me. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there were only 2 possible trailing image categories: 1 valid, 1 invalid). How is it then possible that the analysis is performed separately for valid and invalid trials? If the authors simply decode which leading image category was shown, but combine L1+L2 and L4+L5 into one class respectively, the resulting decoder would in my opinion not decode prediction, but instead dissociate the representation of L1+L2 from L4+L5, which may also explain why the time-course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding predictions (e.g. Kok et al. 2017). Instead for the prediction analysis to be informative about the prediction, the decoder ought to decode the representation of the trailing image during the leading image and inter-stimulus interval. Therefore I am at present not convinced that the utilized analysis approach is informative about predictions.

      (3) I may be misunderstanding the reported statistics or analyses, but it seems unlikely that >10 of the reported contrasts have the exact same statistic of Tmax= 2.76. Similarly, it seems implausible, based on visual inspection of Figure 2, that the Tmax for the invalid condition decoding (reported as Tmax = 14.903) is substantially larger than for the valid condition decoding (reported as Tmax = 2.76), even though the valid condition appears to have superior peak decoding performance. Combined these details may raise concerns about the reliability of the reported statistics.

      (4) The reported analyses and results do not seem to support the conclusion of early learning resulting in dampening and later stages in sharpening. Specifically, the authors appear to base this conclusion on the absence of a decoding effect in some time-bins, while in my opinion a contrast between time-bins, showing a difference in decoding accuracy, is required. Or better yet, a non-zero slope of decoding accuracy over time should be shown (not contingent on post-hoc and seemingly arbitrary binning).

      (5) The present results both within and across trials are difficult to reconcile with previous studies using MEG (Kok et al., 2017; Han et al., 2019), single-unit and multi-unit recordings (Kumar et al., 2017; Meyer & Olson 2011), as well as fMRI (Richter et al., 2018), which investigated similar questions but yielded different results, i.e., no reversal within or across trials, as well as dampening effects after more training. The authors do not provide a convincing explanation as to why their results should differ from previous studies, arguably further compounding doubts about the present results raised by the methods and results concerns noted above.

      Impact:

      At present, I find the potential impact of the study by McDermott et al. difficult to assess, given the concerns mentioned above. Should the authors convincingly answer these concerns, the study could provide meaningful insights into the mechanisms underlying perceptual prediction. However, at present, I am not entirely convinced by the quality and reliability of the results and manuscript. Moreover, the difficulty in reconciling some of the present results with previous studies highlights the need for more convincing explanations of these discrepancies and a stronger discussion of the present results in the context of the literature.

    1. eLife Assessment

      The findings are considered valuable and have theoretical implications for the interdisciplinary field of value-based social decision-making. Support for the main claims is incomplete and these should be supported by further analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Using an effort-exertion task and an effort-based decision-making task while recording brain dynamics with EEG, the authors test the hypothesis that the brain processes reward outcomes for effort differently when the rewards are earned for oneself versus for others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      Inferences rely heavily on the results of mixed effects models, which may or may not be properly specified and are not supported by complementary analyses. Also, not all results hang together in a sensible way. For example, participants report feeling less subjective effort, but also more disliking of tasks, when they were earning rewards for others versus self. Given that participants took longer to complete tasks when earning rewards for others, it is conceivable that participants might have been working less hard for others than for themselves, and this may complicate the interpretation of results.

    3. Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences the processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that the amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      Weaknesses:

      Although the obtained results are highly plausible, I am concerned whether the reward positivity (RewP) and P3 were adequately measured. The RewP and P3 were defined as the average voltage values in the time intervals 300-400 ms and 300-440 ms after feedback onset, respectively. So they largely overlapped in time. Although the RewP measure was based on frontocentral electrodes (FC3, FCz, and FC4) and the P3 on posterior electrodes (P3, Pz, and P4), the scalp topographies in Figure 3 show that the RewP effects were larger at the posterior electrodes used for the P3 than at frontocentral electrodes. So there is a concern that the RewP and P3 were not independently measured. This type of problem can often be resolved using a spatiotemporal principal component analysis. My faith in the conclusions drawn would be further strengthened if the researchers extracted separate principal components for the RewP and P3 and performed their statistical analyses on the corresponding factor scores.

    4. Reviewer #3 (Public review):

      This study investigates how effort influences reward evaluation during prosocial behaviour using EEG and experimental tasks manipulating effort and rewards for self and others. Results reveal a dissociable effect: for self-benefitting effort, rewards are evaluated more positively as effort increases, while for other-benefitting effort, rewards are evaluated less positively with higher effort. This dissociation, driven by reward system activation and independent of performance, provides new insights into the neural mechanisms of effort and reward in prosocial contexts.

      This work makes a valuable contribution to the prosocial behaviour literature by addressing areas that previous research has largely overlooked. It highlights the paradoxical effect of effort on reward evaluation and opens new avenues for investigating the mechanisms underlying this phenomenon. The study employs well-established tasks with robust replication in the literature and innovatively incorporates ERPs to examine effort-based prosocial decision-making - an area insufficiently explored in prior work. Moreover, the analyses are rigorous and grounded in established methodologies, further enhancing the study's credibility. These elements collectively underscore the study's significance in advancing our understanding of effort-based decision-making.

      Despite these contributions, there are several gaps in the analysis that leave the conclusions incomplete and warrant further investigation. These issues can be summarized as follows:

      (1) Incomplete EEG Reporting: The methods indicate that EEG activity was recorded for both tasks; however, the manuscript reports EEG results only for the first task, omitting the decision-making task. If the authors claim a paradoxical effect of effort on self versus other rewards, as revealed by the RewP component, this should also be confirmed with results from the decision-making task. Omitting these findings weakens the overall argument.

      (2) Neural and Behavioural Integration: The neural results should be contrasted with behavioural data both within and between tasks. Specifically, the manuscript could examine whether neural responses predict performance within each task and whether neural and behavioural signals correlate across tasks. This integration would provide a more comprehensive understanding of the mechanisms at play.

      (3) Success Rate and Model Structure: The manuscript does not clearly report the success rate in the prosocial effort task. If success rates are low, risk aversion could confound the results. Additionally, it is unclear whether the models accounted for successful versus unsuccessful trials or whether success was included as a covariate. If this information is present, it needs to be explicitly clarified. The exclusion criteria for unsuccessful trials in both tasks should also be detailed. Moreover, the decision to exclude electrodes as independent variables in the models warrants an explanation.

      (4) Prosocial Decision Computational Modelling: The prosocial decision task largely replicates prior behavioural findings but misses the opportunity to directly test the hypotheses derived from neural data in the prosocial effort task. If the authors propose a paradoxical effect of effort on self-rewards and an inverse effect for prosocial effort, this could be formalised in a computational model. A model comparison could evaluate the proposed mechanism against alternative theories, incorporating the complex interplay of effort and reward for self and others. Furthermore, these parameters should be correlated with neural signals, adding a critical layer of evidence to the claims. As it is, the inclusion of the prosocial decision task seems irrelevant.

      (5) Contradiction Between Effort Perception and Neural Results: Participants rated the task as less effortful in the prosocial condition than in the self condition, which seems contradictory to the neural findings and the authors' interpretation. If effort has a discounting effect on rewards for others, one might expect it to feel more effortful. How do the authors reconcile these results? Additionally, the relationship between behavioural data and neural responses should be examined to clarify these inconsistencies.

      Necessary Revisions to Manuscript: If the authors address the issues above, corresponding updates to the introduction and discussion sections could strengthen the narrative and align the manuscript with the additional analyses.

    1. Author response:

      (1) General Statements

      We thank all three reviewers for their constructive comments and suggestions. We also thank reviewers #2 and #3 for considering our work to be timely and of interest to the field, not only for basic researchers, but also for translational scientists and industry. We are now providing additional results to further support our hypothesis and hope that all reviewers will find that our manuscript is now ready for publication. 

      (2) Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): 

      The manuscript by Coquel et al. investigates the effects of BKC and IBC, two compounds found in Psoralea corylifolia, on DNA replication and the response to DNA damage, and explores their potential use in cancer treatment. These compounds have been previously shown to affect different cellular pathways, and the authors use transformed cancer cells of different origins and a non-transformed cell line to ask whether their combination is toxic in cancer versus non-cancer cells. They propose that BKC inhibits DNA polymerases while IBC targets CHK2. Their results show that both compounds do affect DNA replication, inducing replication stress and affecting double strand break repair. They also show that their combined use increases their toxicity in a synergistic manner. 

      However, there are some major conclusions that are still not very well supported by the data: first, the differential effect on cancer and non-transformed cells; second, the direct link of BKC to the inhibition of DNA polymerases; and third, it is unclear if CHK2 is the relevant target for IBC in this context. 

      Regarding these points the authors should address the following issues: 

      (1) Most of the experiments use BJ fibroblasts as a control cell line. In order to evaluate if these compounds are preferentially toxic for cancer cells, the use of more than one non-transformed cell line is necessary. In addition, BJ cells are fibroblasts while most of the cancer cell lines employed are of epithelial origin. The authors could use MCF10 and RPE cells (both of epithelial origin) as control cell lines to complement the results and better support this claim. 

      We have now monitored the effect of IBC and BKC on the proliferation of MCF-7, MCF-10A and RPE-1 cells using the WST-1 assay and obtained similar results as for BJ and MCF-7 cells. These results are now included in the revised manuscript as Fig. S1A and S1B.

      (2) In order to explore what are the targets of BKC and IBC Cellular Thermal Shift Assays (CETSA) could be used. Either by doing an unbiased mass spectrometry analysis of proteins stabilized by these compounds or by a direct analysis of candidate proteins by western blot (a similar approach has been used for IBC to show that it inhibits SIRT2 in Ren et al., 2024 Phytotherapy Res).

      We thank this Reviewer for suggesting the use of the CETSA assay. We have now performed CETSA on MCF-7 cells and found that IBC stabilizes CHK2 but not CHK1, to the same extent as the commercial CHK2 inhibitor BML-277 used here as a positive control. These results are now shown in new Fig. 4G and 4H.

      (3) For BKC in vitro polymerase assays could be carried out to show the direct inhibition of the DNA polymerase delta, for instance. 

      We have used high-speed Xenopus egg extracts to replicate ssDNA in vitro (Fig. S2C). This assay differs from the in vitro replication assay using low-speed Xenopus egg extracts (Fig. 2H) in that it only monitors elongation by replicative DNA polymerases (Pol δ and ε) and not earlier steps such as origin licensing and activation. The combined use of both low-speed and high-speed extracts strongly supports the view that BKC inhibits replicative DNA polymerases. 

      To confirm this result, we have also used CETSA to monitor BKC binding to different subunits of DNA Polδ and Polε in MCF-7 cells and in Xenopus egg extracts (Fig. 3C-D and Fig. S3). We found that BKC binds POLD1 and POLE, the catalytic subunits of Pol δ and ε respectively, but neither the accessory subunit POLD3 nor PCNA. Together with our docking results and DNA fiber experiments, these data strongly support the view that BKC is a potent inhibitor of DNA Pol δ and Pol ε. 

      (4) In addition, the authors could analyze the integrity of replication forks by PCNA immunofluorescence analysis. The colocalization of PCNA and POLD or POLE subunits could also support the role of DNA polymerases as targets of BKC. 

      Our molecular docking results also show that BKC occupies the catalytic sites of DNA Pol δ and ε, which may not affect their subcellular localization and/or PCNA binding. Since our DNA replication assays, CETSA and DNA fiber analyses strongly support the view that BKC inhibits replicative DNA polymerases, we have not performed this additional experiment.

      (5) In the case of IBC and the inhibition of CHK2, the authors should check the effect of IBC on the phosphorylation of BRCA1 on S988. The changes in CHK2 phosphorylation in Figure 3B are not convincing. The experiment should be repeated and the average of at least three experiments needs to be quantified. 

      We now provide evidence that IBC inhibits BRCA1 phosphorylation on S988. Western blots and quantification for three biological replicates are shown in Fig. 4C and Fig. S4H. Densitometric quantification of CHK2 phosphorylation on S516 from 3 biological replicates, along with statistical analysis, is now shown in Fig. S4G.

      (6) To prove that CHK2 is the relevant target for IBC the authors could test if ATM and CHK2 knockout cells are more resistant to this compound, since it would prevent the phosphorylation of CHK2. 

      We have performed siRNA transfection targeting CHK2. The transfected cells died after 72 hours in culture, so we have been unable to determine whether CHK2-KD cells have increased resistance to IBC.  

      In addition to these experiments, I would suggest some other major improvements in the manuscript: 

      (1) The concentration of both compounds should be provided in molar units throughout the paper.

      Thanks for pointing this out, we now use molar units throughout the paper.

      (2) The authors do not clearly indicate the concentration that is employed in the different experiments, making it difficult to assess the results. For instance, Figure 2 does not include the concentration in the legend or in the text. Time and concentration need to be clearly shown for each experiment. 

      The experimental conditions and inhibitor concentrations are now clearly indicated for each experiment.

      (3) Some experiments are only repeated once (fiber assays) or twice (cell cycle analysis by flow cytometry). These experiments need to be repeated 3 times and the proper statistical analysis performed (comparison of the medians). 

      Superplots with biological replicates for all DNA fiber assays are now displayed. The number of biological replicates is now indicated in the legends and appropriate statistical analyses are used.

      Other minor points or suggestions: 

      (1) Analyzing fork asymmetry would further support the direct effect of BKC on DNA polymerases. 

      The effect of BKC on fork asymmetry is now shown in Fig. 2F. 

      (2) A dose dependent analysis of BKC on the speed of DNA replication would also support this point. 

      Superplots of DNA fiber assays showing the effect of different concentrations of BKC on fork speed from three biological replicates are now included in Fig. 2E.

      (3) Page 7: BKC reduces fork speed ...two-fold. This sentence is not very clear, it would be better to say that speed is half of the control. 

      This sentence was changed to “BKC reduced fork speed by a factor of two relative to untreated cells”.

      (4) Figure 4G and S4D show contradictory results regarding the induction of Rad51 foci by IBC treatment. This needs to be clarified. 

      Figure 4G and S4D (now Fig. 5G and S5D) do not show contradictory results. In both cases, IBC treatment impaired the induction of RAD51 foci by IR or bleomycin.  

      (5) Page 12, Figure S5C is called for but it does not exist (probably meaning Figure S5B). 

      We apologize for this error, which has now been corrected.  

      Reviewer #1 (Significance): 

      The work by Coquel et al. aims at elucidating the use of BKC and IBC as a combined therapy to induce cell death in cancer cells by targeting DNA replication and CHK2. Both BKC and IBC have been previously shown to affect the proliferation of cancer cells. BKC has been shown to induce S phase arrest in an ATR dependent manner in MCF7 cells (Li et al., 2016 Front Pharm), while IBC induces cell death in MDA-MB-231 cells (Wu et al., 2022 Molecules). In this regard, the more interesting contribution of the manuscript is the potential identification of the targets of these compounds in cancer cells. The inhibition of CHK2 by IBC is quite compelling although it needs to be further proven. In contrast, the hypothesis that BKC inhibits DNA polymerases remains highly speculative. The results offer a limited advance in the knowledge of the mechanism of action of these two compounds. Focusing on the action of IBC on CHK2 would increase the impact of the results. In this sense a very recent report has been published showing that IBC inhibits SIRT2 (Ren et al., 2024 Phyto Res), showing that IBC can affect multiple enzymes and processes. This should be taken into account for a further analysis of its mechanism of action. 

      In addition to the identification of the targets of BKC and IBC, the authors also focus on their combination for cancer treatment. This is based on the idea that blocking DSB repair and inducing replication stress at the same time is an efficient approach to induce cancer cell death. This is not a new concept, since the loss of ATM sensitizes cancer cells to the inhibition of the replication stress response, and several combination therapies have been put forward with the idea of generating replication stress and preventing the subsequent repair of the double strand breaks induced in these cells. Thus, the novelty here is limited, especially considering that the effect of BKC on DNA replication has already been described. Further, since its mechanism of action is unclear, it is difficult to ascribe the observed synergy to the speculated hypothesis. A deeper analysis of IBC as a CHK2 inhibitor would be more interesting, as would its potential combination with other chemotherapy agents such as replication stress inhibitors, HU or DNA damaging agents. Also, the lack of adequate non-transformed control cells reduces the relevance of the work. 

      In its current state, the interest of the manuscript is limited. The mechanistical advance is not strong enough and is not completely supported by the data, and the use of these compounds as a combination therapy does not provide new insights in cancer treatment. In my opinion, focusing on the inhibition of CHK2 by IBC and its potential use would broaden the impact of the results beyond the mere analysis of the action of these compounds. 

      We thank this reviewer for his/her constructive and insightful comments. We have followed his/her advice and focused our analysis on the action of IBC on CHK2. Using CETSA, we confirmed that IBC binds CHK2 to the same extent as the BML-277 inhibitor, but does not bind CHK1. We also show that IBC inhibits BRCA1 phosphorylation on S988 and CHK2 phosphorylation on S516. Together with the results presented in the initial version of the manuscript, these data support the view that CHK2 is a key IBC target. We have also applied CETSA to DNA polymerases and confirmed that BKC directly targets DNA Polδ and ε. Although it is unlikely that IBC and BKC will ever be used together in combination therapies, the synergistic effect that we measured on cancer cells in vivo and in vitro indicates that IBC sensitizes cancer cells to endogenous replication stress and to exogenous sources of DNA damage; such DNA-damaging agents could replace BKC in combination therapies. For instance, our data indicate that IBC can be combined with drugs such as etoposide, doxorubicin or cyclophosphamide to potentiate their effect on drug-resistant diffuse large B-cell lymphoma (DLBCL) cell lines. As requested by this Reviewer, we have modified the discussion section to put more emphasis on IBC and CHK2 inhibitors, and we hope that he/she will now find this revised version suitable for publication.

      Reviewer #2 (Evidence, reproducibility and clarity): 

      In the manuscript by Coquel et al., the authors report their findings on the effect of two natural compounds from Psoralea corylifolia plant extracts on cancer cells. They show that these compounds, bakuchiol (BKC) and isobavachalcone (IBC), inhibit proliferation of cancer cells and tumor development in xenografted mice, particularly when used in combination. They further show that BKC inhibits DNA polymerases and induces replication stress, and provide evidence that IBC inhibits Chk2 kinase activity and downstream double-strand break repair. Based on their findings, the authors conclude that combining Chk2 inhibition with DNA replication inhibition represents a potential synergistic strategy to selectively target cancer cells. 

      Major: 

      (1) The data showing that IBC is a Chk2 inhibitor are weak, and more rigorous investigation is needed to establish this compound as a Chk2 inhibitor. 

      As indicated in our response to Reviewer #1, we have now analyzed the binding of IBC to CHK2 using the Cellular Thermal Shift Assay (CETSA) in MCF-7 cells. Our data clearly show that IBC binds to CHK2 but not to CHK1. These results are now shown in Fig. 4G and 4H.

      For one, the authors mention they screened 43 cell cycle-related kinases in vitro, but only show data for 8 kinases in their kinase activity screens. Of these 8 kinases, Chk2 is the most strongly inhibited, but there are no data shown for the other 35 kinases. 

      Data for all the protein kinases tested in the in vitro assay are now presented in Fig. S4D and S4E.  

      Additionally, the purpose of the CHK2 mutants should be discussed in the text. 

      The CHK2(I157T) mutation is linked to an increased risk of breast and colorectal cancers, and CHK2(R145W) is associated with Li-Fraumeni syndrome. Neither mutation affects the basal kinase activity of CHK2. This information is now indicated in the legend of Fig. S4D. 

      Secondly, the western blot in Fig 3B appears to show a very modest effect of IBC on Chk2 autophosphorylation, not very different from the effect of IBC on Akt phosphorylation in Fig S3A. Yet the authors claim that IBC inhibits Chk2 but not Akt. To strengthen these blots, a known Chk2 inhibitor, such as the one shown in Fig 4 (BML-277), should be included as a positive control for pChk2, similar to what was shown for Akt with MK-2206. 

      We have now replaced the western blot in Fig. 3B (now Fig. 4B) with another biological replicate. Quantifications and statistical analyses of biological replicates are shown in Fig. S4G. Overall, we observed a 50% reduction of CHK2 auto-phosphorylation in MCF7 cells treated with IBC, and a 20% reduction in AKT phosphorylation (Fig. S4A). There was no additional reduction in AKT phosphorylation when cells were treated with IBC in combination with MK-2206, compared to cells treated with MK-2206 alone. We now include the CHK2 inhibitor BML-277 as a positive control alongside IBC to monitor CHK2 and CHK1 auto-phosphorylation (Fig. 4B and S4G, and Fig. 4D and S4I, respectively).

      Western blots showing a loss of phosphorylation of additional Chk2 targets are also needed. The manuscript mentions Brca1 S988 as a Chk2 substrate important for DSB repair. Showing the effect of IBC on this phosphorylation site would strengthen the conclusions. 

      We now provide evidence that IBC inhibits BRCA1 phosphorylation at S988. Western blots and quantification for three biological replicates are shown in Fig. 4C and S4H. 

      (2) The authors claim that the combination of IBC and BKC inhibits cell growth in a synergistic manner and that the "effect is more pronounce on cancer cells than on non-cancer cells." However, only one non-malignant cell line was used, and it was a fibroblast line. To make this claim, the authors need to show the effect in additional non-malignant cells, preferably epithelial cell types. 

      We have now monitored cell proliferation using the WST-1 assay in two additional non-malignant cell lines, namely MCF-10A and RPE-1 cells. Cells were treated with IBC/BKC and their growth was compared to that of MCF-7 cells. These experiments yielded similar results to those obtained with BJ fibroblasts. These new data are now included in the revised version as Fig. S1A and S1B. 

      Minor: 

      (1) Densitometry data for all western blots should be shown with mean+/- stdev of independent western blots. 

      Densitometry data for all western blots with biological replicates are now shown in supplementary figures.

      (2) In Figure 1B the statistical test used to analyze cell number was not stated. 

      The statistical test is now indicated in Fig. 1B.

      (3) In Figure 2A, the DAPI image for BKC is the merged image and should be replaced with just DAPI. 

      This error has now been corrected.

      (4) In Figure 2B, the y-axis label reads "γH2AX foci (MFI)". MFI and foci are not the same thing, and for γH2AX the signal is often not focal. MFI of γH2AX is an appropriate measure of replication stress; it is just not appropriate to equate MFI with foci. 

      We apologize for this labeling error, which has now been corrected.

      (5) For the 53BP1 and Rad51 MFI shown in Fig 4 and Fig S4, it is more appropriate to show the number of foci per cell, as these are better indicators of breaks and repair sites. MFI is influenced by protein expression levels and not necessarily by breaks or repair. 

      The numbers of 53BP1 and RAD51 foci are now shown.

      (6) The data in Figures 5B and 5C are very difficult to read. Perhaps color-code the lines/symbols. 

      We have now colored the graph to increase its readability. 

      Reviewer #2 (Significance): 

      The findings reported in this manuscript are timely, of interest to the field, and mostly well supported by the experimental data. However, there are a few concerns that need to be addressed. 

      We are grateful to Reviewer #2 for his/her positive assessment of our manuscript. We hope that we have adequately addressed all of his/her specific concerns and that he/she will agree with the need to put more emphasis on IBC and CHK2 inhibition, as requested by Reviewer #1.

      Reviewer #3 (Evidence, reproducibility and clarity): 

      The manuscript "Synergistic effect of inhibiting CHK2 and DNA replication on cancer cell growth" successfully demonstrates that the compounds BKC and IBC found in Psoralea corylifolia act synergistically to inhibit cancer cell proliferation, using a wide range of well-chosen methodologies. Moreover, the authors characterized the mechanisms of action of both drugs, which result in inhibition of cell proliferation. The use of multiple cell lines and mouse models makes the study robust and complete. The manuscript presents a well-written study that offers new insights and contributions to the field. 

      A few suggestions to improve the study: 

      (1) Given that both compounds BKC and IBC have already been previously described in the literature, it would be helpful for the reader to have them described better at the beginning of the study. 

      Thanks for pointing this out. We have now better described BKC and IBC at the beginning of the results section, as well as in the discussion. We agree that this could be helpful to readers.

      (2) Adding western blot quantifications across experimental repeats is important, specifically for Fig. 2C and Fig. 3C, where a partial effect of treatment on signal level is reported. 

      Densitometry analysis of the data shown in Fig. 2C, including biological replicates, is now shown in Fig. S2B. Quantification for Fig. 3C (now Fig. 4D) is shown in Fig. S4I.

      (3) The quantification of mean intensity for 53BP1 and RAD51 foci should be replaced by quantification of the number of foci per cell. While quantification of γH2AX signal intensity correctly represents the induction of this signal upon damage, foci formed by protein recruitment to DNA damage sites should be quantified by counting the number of foci, rather than by measuring the signal in the whole cell/nucleus. These proteins are present before damage and relocate in response to it. 

      Quantification of 53BP1 and RAD51 foci is now expressed as the number of foci per cell. 

      (4) The Materials & Methods section is missing the methods for the experiment described in Fig. 1B. In summary, after our few concerns are addressed, we believe the manuscript should be accepted for publication. 

      The WST-1 assay used for cell number quantification is described under "Reagents" in the Materials & Methods section.

      Reviewer #3 (Significance):

      The manuscript presents a well-written study that offers new insights and contributions to the field. Although the inhibitors described were already known, the authors convincingly present their mode of action, which is either better characterized (for BKC) or involves inhibition of a different enzyme than previously suggested (for IBC). The authors also nicely pinpoint and explain the narrow window of concentrations in which these two compounds act synergistically rather than additively. The analyses in multiple cell lines, mouse models and in combination with other cancer treatments make this study of interest not only for fundamental researchers but also for translational scientists and industry.

      My field of expertise: DNA replication and replication stress across model systems. 

      We are grateful to Reviewer #3 for his/her very positive assessment of our work and we hope that he/she will find this revised version suitable for publication.

    2. eLife Assessment

      This study presents important findings on the activity of two compounds, BKC and IBC, isolated from Psoralea corylifolia, which act synergistically to inhibit cancer cell proliferation. Using a spectrum of methods, the authors characterized the mechanisms of action of both drugs, providing convincing evidence that BKC targets DNA polymerases and IBC selectively inhibits CHK2. The study opens the possibility of improving the effectiveness of BKC and other DNA-damaging agents in cancer treatment by combining them with IBC.

      [Editors' note: this paper was reviewed by Review Commons.]

    3. Reviewer #1 (Public review):

      The manuscript by Coquel et al. investigates the effects of BKC and IBC, two compounds found in Psoralea corylifolia, on DNA replication and the response to DNA damage, and explores their potential use in cancer treatment. These compounds have previously been shown to affect different cellular pathways, and the authors use transformed cancer cells of different origins and a non-transformed cell line to ask whether their combination is more toxic to cancer cells than to non-cancer cells. They propose that BKC inhibits DNA polymerases while IBC targets CHK2. Their results show that both compounds affect DNA replication, inducing replication stress and affecting double-strand break repair. They also show that their combined use increases their toxicity in a synergistic manner.

      Comments on current version:

      The authors have addressed the main questions raised in the original manuscript. The new data provide stronger evidence supporting the inhibition of DNA polymerases by BKC and the effect of IBC on CHK2. In addition, the new data provide information about the potential mechanism of action of IBC in cells and xenograft models. Together, the revisions notably increase the relevance and impact of the results, with stronger conclusions and better-controlled experiments.

    4. Reviewer #2 (Public review):

      Summary:

      The manuscript "Synergistic effect of inhibiting CHK2 and DNA replication on cancer cell growth" successfully demonstrates that the compounds BKC and IBC found in Psoralea corylifolia act synergistically to inhibit cancer cell proliferation, using a wide range of well-chosen methodologies. Moreover, the authors characterized the mechanisms of action of both drugs, which result in inhibition of cell proliferation. The use of multiple cell lines and mouse models makes the study robust and complete.

      Significance:

      The manuscript presents a well-written study that offers new insights and contributions to the field. Although the inhibitors described were already known, the authors convincingly present their mode of action, which is either better characterized (for BKC) or involves inhibition of a different enzyme than previously suggested (for IBC). The authors also nicely pinpoint and explain the narrow window of concentrations in which these two compounds act synergistically rather than additively. The analyses in multiple cell lines, mouse models and in combination with other cancer treatments make this study of interest not only for fundamental researchers but also for translational scientists and industry.

    1. eLife Assessment

      This valuable study uses AlphaFold2 to guide the structural modelling of different states of the human voltage-gated potassium channel Kv11.1, a key pharmacological drug target. Follow-up molecular dynamics and drug-docking simulations, combined with experimental comparisons, offer convincing evidence supporting the models, showing that drugs bind more effectively to the inactivated state. The work shows potential for improving drug potency predictions in ion channel pharmacology, though its applicability to other systems remains uncertain.

    2. Reviewer #1 (Public review):

      Summary:

      Ngo et al. use several computational methods to determine and characterize structures defining the three major states sampled by the human voltage-gated potassium channel hERG: the open, closed, and inactivated states. Specifically, they use AlphaFold and Rosetta to generate conformations that likely represent key features of the open, closed, and inactivated states of this channel. Molecular dynamics simulations confirm ion conduction through the open-state model but not through the inactivated-state model. Moreover, in silico drug docking experiments show differential binding of drugs to the conformations of the three states, with the inactivated state being preferentially bound by many of them. Docking results are then combined with a Markov model to obtain state-weighted binding free energies, which are compared with experimentally measured ones.

      Strengths:

      The study uses state-of-the-art modeling methods to provide detailed insights into the structure-function relationship of an important human potassium channel. AlphaFold modeling, MD simulations, and Markov modeling are nicely combined to investigate the impact of structural changes in the hERG channel on potassium conduction and drug binding.

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the "most likely" inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the "Streetlight effect". It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      (3) Figures where multiple data points are compared across states generally lack assessment of the statistical significance of observed trends (e.g., Figure 3d).

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

    3. Reviewer #2 (Public review):

      Summary:

      Ngo et al. use AlphaFold2 and Rosetta to model closed, open, and inactive states of the human ion channel hERG. Subsequent MD simulations and comparisons with experiments support the plausibility of their models.

      Strengths:

      This is thorough work studied from many different angles. It provides a self-consistent picture of how conformational changes in hERG may affect its function and binding to different targets.

      Weaknesses:

      Though this work claims the methodologies can be generalized to other systems, it is not obvious how. Many modeling choices seem arbitrary and also seem to have required extensive expert knowledge of the system. This limits the applicability of the modeling strategy.

    4. Reviewer #3 (Public review):

      Summary:

      The authors use AlphaFold2, Rosetta, and molecular dynamics to model structures of the hERG K channel in open, inactive, and closed states. Experimental cryo-EM data for open hERG (Wang and MacKinnon, 2017) and closed EAG (Mandala and MacKinnon, 2022) were used as the main templates for the channel models presented here. Given the importance of hERG as a safety pharmacology target, the identification of a robust simulation method to assess drug block is an important addition to the field.

      Strengths

      The key findings here are new inactivated and closed hERG channel conformations and hERG channel conformations with drugs docked in the inner vestibule below the selectivity filter. Amino acid pathways and interaction networks for different states are also presented.

      The inactive state and drug block models are carefully correlated with experimental data for the inactivated state of hERG (Lau et al, 2024) and with experimental free energy data for drug binding and have overall good agreement.

      It is remarkable that using cytoplasmic domain structures of hERG as a starting point revealed inactivated-state structures of the hERG selectivity filter in Figures 2 and 3.

      Weaknesses

      In Figure 6, if each data point represents a different drug, then perhaps each point should be identified.

      The PAS domain was not included in the models as stated in Methods page 14 but the PAS does appear in some of the templates used as starting points for models in Figure 1 a,b,c. Perhaps mentioning that the PAS was not included in some (all?) of the final models should be moved into the main text and discussed.

      The drug block of 1b channels (which do not contain PAS) has been reported to be slightly different from that of 1a channels (which contain PAS) and of 1a/1b channels (see London et al., 1997; https://doi.org/10.1161/01.RES.81.5.870 and Abi-Gerges et al., 2011; DOI: 10.1111/j.1476-5381.2011.01378.x), and this should be discussed, since the models presented here appear to have been built without the PAS domain.

      It also appears that the N-linker region (between PAS and the S1) and distal C region of hERG (post CNBHD-COOH) are not included in models, please state this if correct, and discuss.

    5. Author response:

      Reviewer #1: 

      Summary:

      Ngo et al. use several computational methods to determine and characterize structures defining the three major states sampled by the human voltage-gated potassium channel hERG: the open, closed, and inactivated states. Specifically, they use AlphaFold and Rosetta to generate conformations that likely represent key features of the open, closed, and inactivated states of this channel. Molecular dynamics simulations confirm ion conduction through the open-state model but not through the inactivated-state model. Moreover, in silico drug docking experiments show differential binding of drugs to the conformations of the three states, with the inactivated state being preferentially bound by many of them. Docking results are then combined with a Markov model to obtain state-weighted binding free energies, which are compared with experimentally measured ones.

      Strengths:

      The study uses state-of-the-art modeling methods to provide detailed insights into the structure-function relationship of an important human potassium channel. AlphaFold modeling, MD simulations, and Markov modeling are nicely combined to investigate the impact of structural changes in the hERG channel on potassium conduction and drug binding.

      We appreciate the reviewer’s recognition of our integration of state-of-the-art computational methods, including AlphaFold2, Rosetta, MD simulations, and Markov modeling. We are pleased that the reviewer found our approach to investigating the structure-function relationship of the hERG channel insightful.

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the "most likely" inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the "Streetlight effect". It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We acknowledge the concern regarding the selection criteria for the inactivated state models. In the revised manuscript, we plan to broaden our selection approach and explicitly include conformations from clusters beyond those highlighted in the initial submission (e.g., from cluster 3). We will also incorporate structural metrics that neither depend solely on the known hallmarks of channel inactivation nor rely on pLDDT scores, to further justify our choice of representative inactivated state models.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We agree that a more rigorous control for our drug-binding predictions is desirable. To address this, we will include molecular docking simulations and associated drug binding affinity estimations for more hERG channel models, including alternate conformations from the initial clustering that were not chosen as the final models. This will allow us to test whether our inactivated state structure from cluster 2 indeed outperforms or differs significantly from other possible inactivated hERG channel conformations in reproducing experimental drug potencies.

      (3) Figures where multiple data points are compared across states generally lack assessment of the statistical significance of observed trends (e.g., Figure 3d).

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

      We will incorporate statistical analyses and measures of uncertainty for key comparisons. In Figures 3 and S1-S4 the consensus structural hERG channel models for open, inactivated and closed states are being compared, i.e. one representative model for each state. We believe this is a valid comparison, and the statistical analysis of the observed trends based on those models (e.g., in the bar plot of Figure 3d) alone might not be possible. However, we agree with the reviewer that instead of relying solely on those initial static models, we will also draw on the ensemble of states sampled during the MD simulations to quantify structural differences between different putative hERG channel states. Specifically, we will present ensemble-averaged measurements and highlight how these distributions differ significantly between states.

      Reviewer #2:

      Summary:

      Ngo et al. use AlphaFold2 and Rosetta to model closed, open, and inactive states of the human ion channel hERG. Subsequent MD simulations and comparisons with experiments support the plausibility of their models.

      Strengths:

      This is thorough work studied from many different angles. It provides a self-consistent picture of how conformational changes in hERG may affect its function and binding to different targets.

      We are grateful for the reviewer’s recognition of the thoroughness and multi-faceted nature of our study.

      Weaknesses:

      Though this work claims the methodologies can be generalized to other systems, it is not obvious how. Many modeling choices seem arbitrary and also seem to have required extensive expert knowledge of the system. This limits the applicability of the modeling strategy.

      We appreciate the reviewer’s comment on the generalizability of our approach. In the revision, we will more explicitly discuss the rationale behind the modeling choices and the extent to which they reflect system-specific knowledge. We will clarify how the strategies we developed (e.g., iterative refinement with AlphaFold2 and Rosetta, followed by MD simulation validation) can be adapted to other ion channels or related proteins. We will also outline a more generalizable workflow, specifying which steps require system-specific information and which steps are broadly applicable.

      Reviewer #3:

      Summary:

      The authors use AlphaFold2, Rosetta, and molecular dynamics to model structures of the hERG K channel in open, inactive, and closed states. Experimental cryo-EM data for open hERG (Wang and MacKinnon, 2017) and closed EAG (Mandala and MacKinnon, 2022) were used as the main templates for the channel models presented here. Given the importance of hERG as a safety pharmacology target, the identification of a robust simulation method to assess drug block is an important addition to the field.

      Strengths

      The key findings here are new inactivated and closed hERG channel conformations and hERG channel conformations with drugs docked in the inner vestibule below the selectivity filter. Amino acid pathways and interaction networks for different states are also presented.

      The inactive state and drug block models are carefully correlated with experimental data for the inactivated state of hERG (Lau et al, 2024) and with experimental free energy data for drug binding and have overall good agreement.

      It is remarkable that using cytoplasmic domain structures of hERG as a starting point revealed inactivated-state structures of the hERG selectivity filter in Figures 2 and 3.

      We thank the reviewer for highlighting the novelty and importance of our work, particularly regarding the identification of new inactivated and closed hERG channel conformations and the modeling of drug block. We are also pleased that the reviewer found the correlation with experimental data to be strong and the structural insights to be valuable.

      Weaknesses

      In Figure 6, if each data point represents a different drug, then perhaps each point should be identified.

      Thank you so much for this suggestion. Please note that Table 3 contains drug-specific data plotted in Figure 6 including drug names. We will provide a reference to Table 3 in the revised Figure 6 caption. We will also revise Figure 6 (and any similar figures) to clearly identify each data point with the corresponding drug and/or include a corresponding key in the Figure legend. This will make it easier to correlate each data point’s binding prediction with the experimental datasets.

      The PAS domain was not included in the models as stated in Methods page 14 but the PAS does appear in some of the templates used as starting points for models in Figure 1 a,b,c. Perhaps mentioning that the PAS was not included in some (all?) of the final models should be moved into the main text and discussed.

      The drug block of 1b channels (which do not contain PAS) has been reported to be slightly different from that of 1a channels (which contain PAS) and of 1a/1b channels (see London et al., 1997; https://doi.org/10.1161/01.RES.81.5.870 and Abi-Gerges et al., 2011; DOI: 10.1111/j.1476-5381.2011.01378.x), and this should be discussed, since the models presented here appear to have been built without the PAS domain.

      It also appears that the N-linker region (between PAS and the S1) and distal C region of hERG (post CNBHD-COOH) are not included in models, please state this if correct, and discuss.

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on hERG channel drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. Similarly, the N-linker and the distal C-region were also omitted from the final models. These omissions were primarily due to hardware constraints on AlphaFold structural modeling: including these additional protein regions would exceed the memory capacity of the graphics processing unit (GPU) cards on our available intramural, external and cloud high-performance computing resources, leading to failures during the protein structure prediction step.

      The PAS domain of hERG 1a isoform, even if not serving as a direct drug-binding site, can influence the gating kinetics of hERG channels as the reviewer pointed out. By altering the probability and duration with which those ion channels occupy specific conformational states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts channel gating so that more channels enter (and remain in) the inactivated state, drugs with a higher affinity for that state would appear to bind more potently, as observed in electrophysiological experiments. It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the ion channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG 1a channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We will incorporate a discussion of these points into the main text, acknowledging the limitations of our current models, citing the references provided by the reviewer, and highlighting the need for future studies to explore these protein regions in greater detail.

    1. eLife Assessment

      This is an important technical method paper that details the development and quality assessment of a 3D MERFISH method to enable spatial transcriptomics of thick tissues, representing a major step forward in the technical capacity of MERFISH. The evidence presented is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Fang et al. reports a 3D MERFISH method that enables spatial transcriptomics for tissues up to 200 µm in thickness. MERFISH, as well as other spatial transcriptomics technologies, has mainly been used for thin (e.g., 10 µm) tissue slices, which limits the dimensionality of spatial transcriptomics. Therefore, expanding the capacity of MERFISH to thick tissues represents a major technical advance toward 3D spatial transcriptomics. Here the authors provide detailed technical descriptions of the new method, troubleshooting, optimization, and application examples to demonstrate its technical capacity, accuracy, sensitivity, and utility. The method will likely have a major impact on future spatial transcriptomics studies, benefiting diverse biomedical fields.

      Strengths:

      The study was well-designed, executed, and presented. Extensive protocol optimization and quality assessments were carried out and conclusions are well supported by the data. The methods were sufficiently detailed and the results are solid and compelling.

      Weaknesses:

      Thorough performance comparison with other existing technologies can be done in the future.

      Comments on revisions:

      The authors have sufficiently addressed the previous comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The study by Fang et al. reports a 3D MERFISH method that enables spatial transcriptomics for tissues up to 200 µm in thickness. MERFISH, as well as other spatial transcriptomics technologies, has mainly been used for thin (e.g., 10 µm) tissue slices, which limits the dimensionality of spatial transcriptomics. Therefore, expanding the capacity of MERFISH to thick tissues represents a major technical advance toward 3D spatial transcriptomics. Here the authors provide detailed technical descriptions of the new method, troubleshooting, optimization, and application examples to demonstrate its technical capacity, accuracy, sensitivity, and utility. The method will likely have a major impact on future spatial transcriptomics studies, benefiting diverse biomedical fields.

      Strengths: 

      The study was well-designed, executed, and presented. Extensive protocol optimization and quality assessments were carried out and conclusions are well supported by the data. The methods were sufficiently detailed, and the results are solid and compelling. 

      Response: We thank the reviewer for the positive comments on our manuscript.  

      Weaknesses: 

      The biological application examples were limited to cell type/subtype classification in two brain regions. Additional examples of how the data could be used to address important biological questions will enhance the impact of the study. 

      We appreciate the reviewer's suggestion that demonstrating the broader applications of our thick-tissue 3D MERFISH method to address important biological questions would enhance the impact of our study. In line with the reviewer's feedback, we have included discussions on how this method could be applied to address various biological questions in the summary (last) paragraph of our manuscript. These discussions highlight the versatility and utility of our approach in studying diverse biological processes beyond cell type classification. 

      However, the goal of this work is to develop a method and establish its validity. While we are interested in applying it to address important biological questions in the future, we consider these applications beyond the scope of this work.

      Reviewer #2 (Public Review): 

      Summary: 

      In their preprint, Fang et al. present data on extending a spatial transcriptomics method, MERFISH, to 3D using a spinning-disc confocal microscope. MERFISH is a well-established method, first published by Zhuang's lab in 2015, with multiple follow-up papers. In the last few years, MERFISH has been used by multiple groups working on spatial transcriptomics, including for cell maps of approximately 12 million cells measured in the mouse brain atlas project. Variants of MERFISH were used to map epigenetic information complementary to gene expression and RNA abundance. However, to date, MERFISH has been limited to thin (~10 µm) sections.

      The key contribution of this work by Fang et al. was to perform the optimization required to get MERFISH working in thick (100–200 µm) tissue sections.

      Major strengths and weaknesses: 

      Overall, the paper presents a technical milestone: the ability to perform highly multiplexed RNA measurements in 3D using the MERFISH protocol. This is not the first spatial transcriptomics study done in thick sections. Wang et al. 2018 (STARmap) used thick sections (150 µm), and more recently, Wang et al. 2021 (EASI-FISH, not cited) performed serial HCR FISH on 300 µm sections. Data so far suggest that MERFISH has better sensitivity than in situ sequencing approaches (STARmap) and has built-in multiplexing that EASI-FISH lacks. Therefore, while there is innovation in the current work, i.e., it is a technically challenging task, the novelty and overall contribution are modest compared to recently published work.

      The authors could improve the writing and the manuscript text to place their work in the right context of other spatial transcriptomics work. Out of the 25 citations, 12 are for previous MERFISH work by Zhuang's lab, and only one cites a spatial transcriptomics approach that is not MERFISH. Furthermore, even this paper (Wang et al., 2018) is only discussed in the context of neuroanatomy findings. The fact that Wang et al. were the first to measure thick sections is not mentioned in the manuscript. The work by Wang et al. 2021 (EASI-FISH) is not cited at all, nor are the many other highly relevant multiplexed FISH papers published in recent years. For example, a key difference between seqFISH+ and MERFISH was that only seqFISH+ used a confocal microscope, whereas MERFISH has always relied on epi-fluorescence imaging. As this is the first MERFISH publication to use confocal microscopy, I expect citations to previous seqFISH work and a better discussion of the differences.

      We thank the reviewer for recognizing our work as a technical milestone. Since the aim of this work is to build upon the strengths of MERFISH and address some of its limitations, we primarily cited previous MERFISH papers to clarify the specific improvements made in this work. Given the rapid growth of the spatial omics field, it has become impractical to comprehensively cite all method development papers. Instead, we cited a 2021 review article in the first sentence of the originally submitted manuscript and limited all subsequent discussion to MERFISH. In light of this reviewer’s suggestion to cite spatial transcriptomics work more broadly, we added two additional review articles on spatial omics. Spatial omics methods primarily fall into two categories: 1) imaging-based methods and 2) next-generation-sequencing-based methods. The 2021 review article [Zhuang, Nat Methods 18, 18–22 (2021)] included in the originally submitted manuscript is focused on imaging-based methods. The additional 2021 review article [Larsson et al., Nat Methods 18, 15–18 (2021)] that we now include in the revised manuscript is focused on next-generation-sequencing-based methods. We also added a more recent review article published in 2023 [Bressan et al., Science 381:eabq4964 (2023)], which covers both categories of methods and includes more recent technology developments. All three review articles are now cited in parallel in the first introductory paragraph of the manuscript.

      Although we presented our work as an advance in MERFISH specifically, we do consider reasonable the reviewer’s suggestion to cite the 2018 STARmap paper [Wang et al., Science 361, eaat5961 (2018)] in the introduction of our manuscript. This STARmap paper was already cited in the Results section of our originally submitted manuscript, and we now describe this work in the introduction of our revised manuscript (third paragraph), as this paper was the first to demonstrate 3D in situ sequencing in thick tissues. In addition, we thank the reviewer for bringing to our attention the EASI-FISH paper [Wang et al., Cell 184, 6361–6377 (2021)], which reported a method for thick-tissue FISH imaging and demonstrated imaging of 24 genes using multiple rounds of multi-color FISH imaging. We also recently became aware of a paper reporting 3D imaging of thick samples using PHYTOMap [Nobori et al., Nature Plants 9, 1026–1033 (2023)]. This paper, published a few days after we submitted our manuscript to eLife, demonstrated imaging of 28 genes in thick plant samples using multiple rounds of multicolor FISH and the probe targeting and amplification methods previously developed for in situ sequencing. We also included these two papers in the introduction section of our revised manuscript (third paragraph). In addition, we expanded the discussion paragraph (last paragraph) of the manuscript to discuss these thick-tissue imaging methods in more detail, and in the same paragraph, we also included discussions of two recent bioRxiv preprints on thick-tissue transcriptomic imaging [Gandin et al., bioRxiv, doi:10.1101/2024.05.17.594641 (2024); Sui et al., bioRxiv, doi:10.1101/2024.08.05.606553 (2024)].

      However, we do not consider our use of confocal imaging in this work an advance in MERFISH because confocal microscopy, like epi-fluorescence imaging, is a commonly used approach that could be applied directly to MERFISH of thin tissues without any alteration of the protocol. Confocal imaging was broadly used for both DNA and RNA FISH before any genome-scale imaging was reported. Confocal and epi-imaging geometries have their distinct advantages, and which of these imaging geometries to use is the researcher’s choice, depending on instrument availability and experimental needs. Thus, we do not find it necessary to cite specific papers just for using confocal imaging in spatial transcriptomic profiling. Our real advance related to confocal imaging is the use of machine learning to increase the imaging speed. Without this improvement, 3D imaging of thick tissue using confocal microscopy would take a long time and would likely degrade image quality due to photobleaching of out-of-focus fluorophores before they are imaged. We thus cited several papers that used deep learning to improve imaging quality and/or speed [Laine et al., International Journal of Biochemistry & Cell Biology 140:106077 (2021); Ouyang et al., Nat Biotechnol 36:460–468 (2018); Weigert et al., Nat Methods 15:1090–1097 (2018)] in our original submission. Our unique contribution is the combination of machine learning with confocal imaging for 3D multiplexed FISH imaging of thick tissue samples, which had not been demonstrated previously.

      To get MERFISH working in 3D, the authors solved a few technical problems. To address the reduced signal-to-noise ratio in thick samples, Fang et al. used non-linear filtering (i.e., deep learning) to enhance the spots before detection. To improve registration, the authors identified an issue specific to their Z piezo stage and replaced it with a better model. Finally, the authors used water-immersion objectives to mitigate optical aberrations. All these optimization steps are reasonable and make sense. In some cases, I can see the general appeal (e.g., another demonstration of deep learning to reduce exposure time). Still, in other cases, the issue is not general enough (e.g., a different model of piezo Z stage) to be of interest to a broad readership. There were a few additional optimization steps, e.g., testing four concentrations of readout and encoder probes. So while the preprint describes a technical milestone, this milestone was achieved with overall modest innovation.

      We appreciate the reviewer's recognition of the technical challenges we have overcome in developing this 3D thick-tissue MERFISH method. To achieve high-quality thick-tissue MERFISH imaging, we had to overcome multiple challenges. We agree with the reviewer that the solutions to some of the above challenges are intellectually more impressive than the remaining ones, which required relatively more mundane efforts. However, all of these are needed to achieve the overall goal, a goal that the reviewer considers a milestone. We believe that the impact of a method should be evaluated based on its capabilities, potential applications, and adaptability for broader adoption. In this regard, we anticipate that our reported method will be a valuable and impactful contribution to the field of spatial biology.

      Data and code sharing: the only link in the preprint related to data sharing sends readers to a deleted Dropbox folder. Similarly, the GitHub link returns a 404 error. Both are unacceptable. The authors should do a better job of sharing their raw and processed data. Furthermore, the software shared should not be just the MERlin package used for the analysis but the specific code used with that package.

      We shared the data through Dropbox as a temporary data-sharing approach for the review process because of the potential need to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).

      The GitHub link that we provided for the MERlin package was valid and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to the MERlin v2.2.7 package itself, we have also shared the specific code to utilize this package for analyzing the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922). 

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) It would be good to expand the application section to demonstrate the utility of 3D MERFISH for addressing diverse types of biological questions in the two brain regions examined. At present, the manuscript only examines the localization of various cell clusters in the tissues. Can it be used to examine both short- and long-range interactions, for example?

      We appreciate the reviewer's feedback and agree that demonstrating the broader applications of our 3D thick-tissue MERFISH imaging method in addressing diverse biological questions would enhance the impact of our study.  

      In line with the reviewer’s comments, one of the analyses we performed in the manuscript was examining short-range interactions based on soma contact between adjacent neurons in the two brain regions studied (see third-to-last and second-to-last paragraphs of the Main text). This analysis provided insights into the spatial organization of inhibitory neurons and potential interactions between the same type of interneurons in these brain regions. 

      Although long-range interactions, for example synaptic interactions between neurons, would be of great interest, our current 3D MERFISH measurements do not allow such interactions to be determined. Future research enabling measurements of synaptic interactions between molecularly defined neuronal subtypes would be interesting, but we consider this to be out of the scope of the current study.

      (2) For the nearest neighbor distance analysis in Figure 3, the method seems to be missing. Please add details about this analysis to allow better understanding. It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes. Please explain. 

      We apologize for omitting the description of the nearest-neighbor distance analysis from the Materials and Methods section. We have added a detailed description of this analysis to the Materials and Methods section of the revised manuscript (last subsection of Materials and Methods).

      Regarding the comment “It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes”, this result is not necessarily counterintuitive given how we defined nearest-neighbor distances between neurons of the same subtype and between neurons of different subtypes. Here is how we performed this analysis for interneurons. First, we determined the nearest-neighbor neuron for each interneuron and classified the interneuron as having either another interneuron of the same type as its nearest neighbor, or a different type of interneuron or an excitatory neuron as its nearest neighbor. We then determined the distance distributions for these two types of nearest-neighbor pairs and compared them. When a neuronal subtype forms a tight spatial cluster, such as the type-A cluster shown in the schematic below, the distances between nearest-neighbor A-A pairs are indeed small. However, the distance between a type-A neuron and a neuron of a different type (for example, type B) is not necessarily larger than those between two type-A neurons if the nearest neighbor of this type-A neuron is a type-B neuron. These nearest-neighbor A-B pairs are likely formed between type-A neurons at the edge of the cluster and type-B neurons near that edge. If the distance of an A-B pair were not comparable to those of nearest-neighbor A-A pairs, the pair would be unlikely to be a nearest-neighbor pair by our definition as described above.

      Author response image 1.
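      The nearest-neighbor classification described above can be illustrated with a minimal sketch. It assumes that each cell is represented by a 3D centroid and a subtype label; the function name `nearest_neighbor_distances` and the toy coordinates are hypothetical and are not taken from the authors' analysis code.

```python
import numpy as np


def nearest_neighbor_distances(coords, labels, query_types):
    """Hypothetical sketch: for every cell whose label is in query_types, find
    its nearest neighbor among all other cells and sort the distance into
    'same' (neighbor has the same label) or 'different' (any other label)."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Brute-force pairwise Euclidean distances between all centroids (N x N).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell cannot be its own neighbor
    same, different = [], []
    for i in np.flatnonzero(np.isin(labels, query_types)):
        j = int(np.argmin(dist[i]))  # index of the nearest other cell
        if labels[j] == labels[i]:
            same.append(dist[i, j])
        else:
            different.append(dist[i, j])
    return np.array(same), np.array(different)
```

      For example, with three type-A cells at x = 0, 1, 2 and type-B cells at x = 2.8 and x = 10 (all other coordinates 0), the two interior A cells give same-type distances of 1.0, while the A cell at the cluster edge has a B cell as its nearest neighbor at a distance of about 0.8. The A-B nearest-neighbor distance is therefore not larger than the A-A distances even though type A is tightly clustered. The brute-force distance matrix keeps the sketch dependency-free; for tens of thousands of cells, a KD-tree such as `scipy.spatial.cKDTree` would be a more practical choice.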

      Reviewer #2 (Recommendations For The Authors): 

      (1) The scholarship in this work is lacking. All of the non-MERFISH parts of the field of spatial transcriptomics are ignored. The work needs to be discussed in the context of the literature. 

      We thank the reviewer for this suggestion and have included discussions of other spatial omics work and other thick-tissue multiplexed imaging work in the Introduction and Discussion sections of the manuscript. Please see details in our response to the Public Review portion of this reviewer’s comments.

      (2) The data/code sharing links are broken and need to be fixed. 

      Response: We shared the data through Dropbox as a temporary data-sharing approach for the review process because of the potential need to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).

      The GitHub link that we provided for the MERlin package was valid, and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to the MERlin package itself (the MERFISH decoding package), we have also shared the specific code used to apply this package to the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922) to ensure that readers can fully reproduce the results presented in our manuscript.

    1. eLife Assessment

      This paper advances an important new concept in psoriasis pathogenesis and implicates Sema4a as a homeostatic regulator that is highly epithelial-specific. The findings are convincing and lend support for the biology described here as a mechanism with therapeutic implications.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Kume et al. examined the role of the protein Semaphorin 4a in steady-state skin homeostasis and how this relates to skin changes seen in human psoriasis and imiquimod-induced psoriasis-like disease in mice. The authors found that human psoriatic skin has reduced expression of Sema4a in the epidermis. While Sema4a has been shown to drive inflammatory activation in different immune populations, this finding suggested Sema4a might be important for negatively regulating Th17 inflammation in the skin. The authors go on to show that Sema4a knockout mice have skin changes in key keratinocyte genes, increased gdT cells, and increased IL-17 similar to differences seen in non-lesional psoriatic skin, and that bone marrow chimera mice with WT immune cells and Sema4a KO stromal cells develop worse IMQ-induced psoriasis-like disease, further linking expression of Sema4a in the skin to maintaining skin homeostasis. The authors next studied downstream pathways that might mediate the homeostatic effects of Sema4a, focusing on mTOR given its known role in keratinocyte function. As with the immune phenotypes, Sema4a KO mice had increased mTOR activation in the epidermis in a similar pattern to mTOR activation noted in non-lesional psoriatic skin. The authors next targeted the mTOR pathway and showed rapamycin could reverse some of the psoriasis-like skin changes in Sema4a KO mice, confirming the role of increased mTOR in contributing to the observed skin phenotype.

      In the revised manuscript, the authors expand on the potential relevance to psoriasis by demonstrating similar findings in an IL-23-driven model of skin inflammation, a model of psoriasis orthogonal to their original IMQ model. They also show that, in addition to reversing steady-state differences in skin thickness between Sema4a KO and WT mice, rapamycin improves disease metrics in the IMQ model of psoriasis. These additional studies further bolster their conclusion that Sema4a may play a protective role by preventing over-activation of mTOR in the skin in psoriasis.

      Strengths:

      The most interesting finding is the tissue-specific role of Sema4a: whereas it has previously been considered to play a mostly pro-inflammatory role in immune cells, this study shows that, when expressed by keratinocytes, Sema4a plays a homeostatic role whose loss leads to the development of psoriasis-like skin changes. This has important implications for targeting Sema4a pharmacologically. It may also yield a novel mouse model to study mechanisms of psoriasis development separate from the commonly used IMQ model. The included experiments are well-controlled and rigorously executed.

      The new experiments provide additional data supporting the conclusions, through an orthogonal model of psoriasis and by demonstrating rapamycin-induced reversal of changes in the IMQ disease model.

      Weaknesses:

      While the main weakness of these studies, lack of tissue-specific Sema4a knockout mice (e.g. in keratinocytes only), remains, generating these mice and performing the necessary experiments is beyond the scope of completing these particular studies. Similarly, it is understandable that additional bone marrow chimeras would be costly and labor intensive without adding much more in the absence of tissue-specific knockouts.

    3. Reviewer #2 (Public review):

      Summary:

      Kume et al. found for the first time that Semaphorin 4A (Sema4A) was downregulated at both the mRNA and protein levels in lesional (L) and non-lesional (NL) keratinocytes of psoriasis patients compared with control keratinocytes. They also found that Sema4A is not only expressed in keratinocytes but is also upregulated in hematopoietic cells, such as lymphocytes and monocytes, in the peripheral blood of psoriasis patients. They investigated how the downregulation of Sema4A expression in psoriatic epidermal cells affects the immunological inflammation of psoriasis by using a psoriasis mouse model in which Sema4A KO mice were treated with IMQ. Kume et al. hypothesized that downregulation of Sema4A expression in keratinocytes might be responsible for the augmentation of psoriasis inflammation. Using bone marrow chimeric mice, Kume et al. showed that KO of Sema4A in non-hematopoietic cells was responsible for the enhanced inflammation in psoriasis. The expression of CCL20, TNF, IL-17, and mTOR was upregulated in the Sema4A KO epidermis compared to the WT epidermis, and the infiltration of IL-17-producing T cells was also enhanced.

      Strengths:

      Decreased Sema4A expression may be involved in psoriasis exacerbation through epidermal proliferation and enhanced infiltration of Th17 cells, which helps understand psoriasis immunopathogenesis.

      Weaknesses:

      The mechanism of decreased Sema4A expression in psoriasis is not clear, although this does not affect the strength of this research.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kume et al examined the role of the protein Semaphorin 4a in steady-state skin homeostasis and how this relates to skin changes seen in human psoriasis and imiquimod-induced psoriasis-like disease in mice. The authors found that human psoriatic skin has reduced expression of Sema4a in the epidermis. While Sema4a has been shown to drive inflammatory activation in different immune populations, this finding suggested Sema4a might be important for negatively regulating Th17 inflammation in the skin. The authors go on to show that Sema4a knockout mice have skin changes in key keratinocyte genes, increased gdT cells, and increased IL-17 similar to differences seen in non-lesional psoriatic skin, and that bone marrow chimera mice with WT immune cells and Sema4a KO stromal cells develop worse IMQ-induced psoriasis-like disease, further linking expression of Sema4a in the skin to maintaining skin homeostasis. The authors next studied downstream pathways that might mediate the homeostatic effects of Sema4a, focusing on mTOR given its known role in keratinocyte function. As with the immune phenotypes, Sema4a KO mice had increased mTOR activation in the epidermis in a similar pattern to mTOR activation noted in non-lesional psoriatic skin. The authors next targeted the mTOR pathway and showed rapamycin could reverse some of the psoriasis-like skin changes in Sema4a KO mice, confirming the role of increased mTOR in contributing to the observed skin phenotype.

      Strengths:

      The most interesting finding is the tissue-specific role of Sema4a: whereas it has previously been considered to play a mostly pro-inflammatory role in immune cells, this study shows that, when expressed by keratinocytes, Sema4a plays a homeostatic role whose loss leads to the development of psoriasis-like skin changes. This has important implications for targeting Sema4a pharmacologically. It may also yield a novel mouse model to study mechanisms of psoriasis development separate from the commonly used IMQ model. The included experiments are well-controlled and rigorously executed.

      Weaknesses:

      A weakness of the study is the lack of tissue-specific Sema4a knockout mice (e.g., in keratinocytes only). The authors did use bone marrow chimeras, but only in one experiment. This work implies that psoriasis may represent a Sema4a-deficient state in the epidermal cells, while the same might not be true for immune cells. Indeed, in their analysis of non-lesional psoriasis skin, Sema4a was not significantly decreased compared to control skin, possibly due to compensatory increases in Sema4a from other cell types. Unbiased RNA-seq of Sema4a KO mouse skin for comparison with non-lesional skin might identify other similarities besides mTOR signaling. Indeed, targeting mTOR with rapamycin reverses some of the skin changes in Sema4a KO mice, but not skin thickness, so other pathways affected by Sema4a may be better targets if they could be identified. Utilizing WT→KO chimeras in addition to global KO mice in the experiments in Figures 6-8 would more strongly implicate the separate roles of Sema4a in skin vs. immune cell populations and might more closely mimic non-lesional psoriasis skin.

      We sincerely appreciate your summary and your pointing out the strengths and weaknesses of our study. Although we were unfortunately unable to perform all of these experiments due to limited resources, we fully agree with the importance of studying tissue-specific Sema4A KO mice. As an alternative, we compared the IL-17A-producing potential of skin T cells between WT→KO mice and KO→KO mice following 4 consecutive days of IMQ treatment using flow cytometry. The results were comparable between the two groups. Additionally, we performed RNA-seq on the epidermis of WT and Sema4A KO mice. While we did not find similarities between Sema4A KO skin and non-lesional psoriasis skin except for S100a8 expression, we will further investigate the mechanisms by which Sema4A KO skin mimics non-lesional psoriasis skin in a future project.

      Although targeting mTOR with rapamycin did not reverse the epidermal thickness in Sema4A KO mice, rapamycin was effective in reducing epidermal thickness in a murine psoriasis model induced by IMQ in Sema4A KO mice. These results suggest potential clinical relevance for treating active, lesional psoriatic skin changes, which would be of interest to clinicians. Thank you once again for your valuable insights.

      Reviewer #2 (Public Review):

      Summary:

      Kume et al. found for the first time that Semaphorin 4A (Sema4A) was downregulated at both the mRNA and protein levels in lesional (L) and non-lesional (NL) keratinocytes of psoriasis patients compared with control keratinocytes. They also found that Sema4A is not only expressed in keratinocytes but is also upregulated in hematopoietic cells, such as lymphocytes and monocytes, in the peripheral blood of psoriasis patients. They investigated how the downregulation of Sema4A expression in psoriatic epidermal cells affects the immunological inflammation of psoriasis by using a psoriasis mouse model in which Sema4A KO mice were treated with IMQ. Kume et al. hypothesized that downregulation of Sema4A expression in keratinocytes might be responsible for the augmentation of psoriasis inflammation. Using bone marrow chimeric mice, Kume et al. showed that KO of Sema4A in non-hematopoietic cells was responsible for the enhanced inflammation in psoriasis. The expression of CCL20, TNF, IL-17, and mTOR was upregulated in the Sema4A KO epidermis compared to the WT epidermis, and the infiltration of IL-17-producing T cells was also enhanced.

      Strengths:

      Decreased Sema4A expression may be involved in psoriasis exacerbation through epidermal proliferation and enhanced infiltration of Th17 cells, a finding that helps us understand psoriasis immunopathogenesis.

      Weaknesses:

      The mechanism by which decreased Sema4A expression may exacerbate psoriasis remains unclear.

      We greatly appreciate your summary and thoughtful feedback on the strengths and weaknesses of our study. In response, we have included the results of additional experiments on IL-23-mediated psoriasis-like dermatitis, which showed that epidermal thickness was significantly greater in KO mice than in WT mice. When we analyzed the T cells infiltrating the ears by flow cytometry, the proportion of IL-17A-producing Vγ2 and DNγδ T cells within the CD3 fraction of the epidermis was significantly higher in Sema4A KO mice, consistent with the results from IMQ-induced psoriasis-like dermatitis. Furthermore, we examined STAT3 expression in the epidermis of WT and Sema4A KO mice by Western blot analysis, and the results were comparable between the two groups. Nevertheless, the mechanism by which decreased Sema4A expression may exacerbate psoriasis remains unclear, and we have added possible explanations and hypotheses to the limitations section. Thank you once again for your valuable insights.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 1C

      What statistics were used? The supplemental notes report adjusted P values; what correction for multiple comparisons was used? Could the authors instead show the logFC for the DEGs between Ctl and L in each cluster? This might be best demonstrated with a volcano plot highlighting SEMA4A and other genes known to be DE in psoriasis.

      We apologize for not including the detailed analysis methods in the original submission. We analyzed the scRNA-seq data in Cellxgene VIP using Welch’s t-test and corrected for multiple comparisons with the Benjamini-Hochberg procedure, setting the false discovery rate (FDR) at 0.05. These details are now described in the MATERIALS AND METHODS section of the resubmitted manuscript. We also added a volcano plot (log2FC versus -log10 p-value) of the DEGs in keratinocytes between Ctl and L as Figure 1-figure supplement 1D. The log2FC values for SEMA4A in keratinocytes, dendritic cells, and macrophages were -0.07, 0.00, and -0.05, respectively. Although the log2FC in keratinocytes is small, the adjusted p-value (padj) for SEMA4A is 2.83 × 10^-39, indicating a statistically significant difference.

      Page 8 Line 111 in the resubmitted manuscript:

      “The adjusted p-value (padj) for SEMA4A in keratinocytes between Ctl and L was 2.83 × 10^-39, indicating a statistically significant difference despite not being visually prominent in the volcano plot, which shows comprehensive differential gene expression in keratinocytes (Figure 1C; Figure 1-figure supplement 1D).”

      Page 54: In the Figure legend of Figure 1-figure supplement 1D in the resubmitted manuscript:

      “(D) The volcano plot displays changes in gene expression in psoriatic L compared to Ctl.”

      Page 30 Line 481 in the resubmitted manuscript: In the “Data processing of single-cell RNA-sequencing and bulk RNA-sequencing” section.

      “The data were integrated into an h5ad file, which can be visualized in Cellxgene VIP (K. Li et al., 2022). We then performed differential analysis between two groups of cells to identify differentially expressed genes using Welch’s t-test. Multiple comparisons were controlled using the Benjamini-Hochberg procedure, with the false discovery rate set at 0.05 and significance defined as padj < 0.05.”
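      The Benjamini-Hochberg adjustment described above can be illustrated with a minimal plain-Python sketch. The function names and the example p-values are hypothetical and are not taken from the study; the authors report using Cellxgene VIP, and whether its internal code matches this sketch is not stated. The Welch's t-test that produces the raw p-values is not reproduced here (in SciPy it corresponds to `scipy.stats.ttest_ind(a, b, equal_var=False)`).

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity:
    # padj(rank) = min over ranks >= rank of p * m / rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(padj: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag tests whose adjusted p-value falls below the FDR threshold."""
    return [p < fdr for p in padj]


# Hypothetical raw p-values for four genes (illustration only).
raw = [0.001, 0.02, 0.3, 0.04]
padj = benjamini_hochberg(raw)
print([round(p, 3) for p in padj])  # [0.004, 0.04, 0.3, 0.053]
print(significant(padj))            # [True, True, False, False]
```

      Note that a raw p-value of 0.04 is no longer significant after adjustment in this toy example, which is the point of controlling the FDR across many genes.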

      Figure 2B

      The results narrative notes WT->WT is comparable to KO->WT. No statistics are given for this comparison. It appears the difference is less than the other comparisons, but still may be significant. Also, in the supplemental for Figure 2B, there appear to be missing columns for the 4 BM chimera groups (columns for WT and KO, but not 4 columns for each donor: recipient pair).

      We sincerely apologize for any confusion. We presented the results of the chimeric mice in Figure 3, and Figure 3-source data 1 shows the 4 BM chimera groups. In Figure 3B, the p-value for the comparison between WT->WT mice and KO->WT mice was 0.7988, as indicated in Figure 3-source data 1.

      Figure 3B

      While ear skin is not easily obtainable at day 0 for comparison, why not also include back skin at Wk 8? If the back skin epidermis is thicker like the ear skin, it supports the ear skin conclusion and adds a more consistent comparison. If the back skin epidermis is not thicker, what would be the authors' explanation as to why only the ear skin epidermis is thicker in KO mice at 8 weeks?

      We appreciate and completely agree with the reviewer’s insightful comment. We have added images and dot plots of the back skin at Week 8 in Figure 4B. Since the back skin epidermis is thicker, similar to the ear skin, these results support the conclusion drawn from the ear skin data. Regarding Figure 4C, which shows the expression of Sema4a in the epidermis and dermis of 8-week-old WT mouse ear, we have modified the sentence in the manuscript to ‘the epidermis of WT ear at Week 8’ for clarification.

      Page 12 Line 180 in the resubmitted manuscript:

      “While the epidermal thickness of back skin was comparable at birth (Figure 4B), at Week 8 the epidermis of Sema4AKO back and ear skin was notably thicker than that of WT mice (Figure 4B), suggesting that acanthosis in Sema4AKO mice is accentuated after birth.”

      Page 47: In the Figure legend of Figure 4B in the resubmitted manuscript:

      “(B) Left: representative Hematoxylin and eosin staining of Day 0 back and Wk 8 back and ear. Scale bar = 50 μm. Right: Epi and Derm thickness in Day 0 back (n = 5) and Wk 8 back (n = 5) and ear (n = 8).”

      Figures 3C&D, Figures 4 D-F

      The figures might be easier to read if some of the data is moved to supplemental, especially in Figure 4, which has 36 panels just in D-F. Conversely, the dLN data is important in establishing the skin microenvironment as important in the accumulation of γδ cells and IL-17 production in the setting of Sema4a KO, so this might be more impactful if moved to the main figure.

      We appreciate and agree with your comments. As recommended, we have moved data from Figure 3C and 4D-F to the supplemental section. The dLN data have been moved to the main figure as Figure 4E. This has improved the readability of the figures.

      Figure 5 and Figure 6 might work better if combined. The differences in keratinocytes in psoriasis are well-known, so the novelty is how Sema4a KO skin appears to share similar differences. This would be easier to see if compared side-by-side in the same figure. Also, there is an opportunity to show this more rigorously by performing RNA-seq on WT vs Sema4a KO skin. Showing a larger set of DEGs that trend similarly between Ctl/NL psoriasis and WT/Sema4a KO skin in a heatmap would bolster the conclusion that Sema4a deficiency contributes to a psoriasis-like skin defect.

      We appreciate your valuable suggestion. Following your recommendation, we have combined Figures 5 and 6 to allow a side-by-side comparison, which highlights the similarities between Sema4AKO skin and psoriasis and makes the differences in keratinocytes easier to observe. Additionally, we performed RNA-seq on WT and Sema4AKO epidermis (n = 3 per group). We analyzed the raw count data using iDEP 2.0 (Ge S.X., BMC Bioinformatics, 2018), keeping genes with at least 0.5 counts per million in at least one library. Differential gene expression analysis was conducted using DESeq2 with an FDR cutoff of 0.1 and a minimum fold change of 2. We identified 46 upregulated and 70 downregulated genes in Sema4AKO mice compared with WT mice (see the volcano plot and heat map). However, except for S100a8, we did not observe significant expression changes in non-lesional psoriasis-related genes between WT and Sema4AKO mice. In the future, we aim to identify subtle stimuli that could cause gene expression changes between these groups and to perform additional RNA-seq experiments.

      Author response image 1.

      Author response image 2.
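      The filtering and thresholding criteria described in the response above (at least 0.5 counts per million in at least one library; FDR below 0.1; fold change of at least 2, i.e. |log2FC| ≥ 1) can be sketched as follows. This is a hypothetical plain-Python illustration, not iDEP's or DESeq2's code: the gene names and numbers are invented, the log2 fold changes and adjusted p-values are assumed to come from a DESeq2 run, and treating the FDR cutoff as strict (<) is an assumption.

```python
import math
from typing import Dict, List, Tuple


def filter_low_counts(counts: Dict[str, List[int]],
                      min_cpm: float = 0.5) -> Dict[str, List[int]]:
    """Keep genes whose counts per million reach min_cpm in at least one library."""
    if not counts:
        return {}
    n_libs = len(next(iter(counts.values())))
    # Library size = total raw counts per sample (column sums).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]
    return {
        gene: row
        for gene, row in counts.items()
        if any(row[j] / lib_sizes[j] * 1e6 >= min_cpm for j in range(n_libs))
    }


def call_degs(results: Dict[str, Tuple[float, float]],
              fdr: float = 0.1,
              min_fold_change: float = 2.0) -> Tuple[List[str], List[str]]:
    """Split genes into up- and downregulated lists.

    `results` maps gene -> (log2 fold change, adjusted p-value).
    """
    min_lfc = math.log2(min_fold_change)  # fold change 2 -> |log2FC| >= 1
    up, down = [], []
    for gene, (lfc, padj) in results.items():
        if padj < fdr and abs(lfc) >= min_lfc:
            (up if lfc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Invented toy data: two libraries of 1,000,000 and 2,000,000 reads.
counts = {
    "geneA": [500, 600],
    "geneB": [0, 0],
    "geneC": [999_499, 1_999_400],
    "geneD": [1, 0],
}
print(sorted(filter_low_counts(counts)))  # ['geneA', 'geneC', 'geneD']

results = {
    "geneA": (1.5, 0.01),   # passes both cutoffs -> up
    "geneC": (-2.0, 0.05),  # passes both cutoffs -> down
    "geneD": (0.5, 0.001),  # fold change below 2
    "geneE": (3.0, 0.2),    # FDR above 0.1
}
print(call_degs(results))  # (['geneA'], ['geneC'])
```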

      Page 48: The Figure title of Figure 5 in the resubmitted manuscript:

      “Figure 5: Sema4AKO skin shares the features of human psoriatic NL.”

      SEMA4A is not significantly DE between Ctl and NL in the psoriasis RNA-seq data. If a lower expression of SEMA4A in psoriasis skin is a driving part of the phenotype, why is this not observed in the RNA-seq data? Presumably, this could be explained by infiltration of immune cells with increased SEMA4A expression, like in the scRNA-seq data in Figure 1. If so, might it be useful to analyze WT->KO chimera mice similarly to global KO mice in Figures 6-8? This might more accurately reflect what is happening in psoriasis, if epidermal SEMA4A expression is low, but immune expression is not. The KO data on their own nicely show a skin phenotype, but these additional experiments might more closely mimic psoriatic disease and increase the rigor and impact of the study.

      We really appreciate your insightful comments. Due to the limitations of the animal experimentation facility, we regret that we are unable to create additional chimeric mice. Although our analysis is limited, we compared IL-17A production from T cells of WT→KO mice and KO→KO mice following 4 consecutive days of IMQ treatment using flow cytometry (see Author response image 3 below; n = 6 for WT→KO, n = 4 for KO→KO). This comparison revealed that IL-17A production from T cells was comparable, regardless of whether they were derived from WT or Sema4AKO mice, when the skin constituent cells were derived from Sema4AKO. We appreciate the value of your advice, and agree that investigating keratinocyte differentiation and mTOR signaling in the epidermis, using either WT→KO chimeric mice or keratinocyte-specific Sema4A-deficient mice, is a crucial next step in our research.

      Author response image 3.

      Figure 8

      Rapamycin was able to partially reverse the psoriasis-like skin phenotype in Sema4a KO mice. Would rapamycin also be effective in the more severe disease induced by IMQ in Sema4a KO mice? While partially reducing the effect of Sema4a KO on steady-state skin with rapamycin strengthens the link to mTOR dysregulation, it did not change skin thickness. It's unclear if this would be useful clinically for patients with well-controlled psoriasis (NL skin). Would it be useful to reverse active, lesional psoriatic skin changes? Testing this might yield results more relevant to clinicians and patients.

      We are grateful for your valuable feedback. Rapamycin reduced epidermal thickness in the IMQ-induced murine psoriasis model in Sema4AKO mice and downregulated the expression of Krt10, Krt14, and Krt16. We have included these results in Figure 7-figure supplement 2. These results suggest potential clinical relevance for treating active, lesional psoriatic skin and may be of interest to clinicians and patients.

      Page 17 Line 269 in the resubmitted manuscript:

      “Next, we investigated whether intraperitoneal rapamycin treatment effectively downregulates inflammation in the IMQ-induced murine model of psoriasis in Sema4AKO mice (Figure 7-figure supplement 2A). Rapamycin significantly reduced epidermal thickness compared to vehicle treatment (Figure 7-figure supplement 2B). Additionally, rapamycin treatment downregulated the expression of Krt10, Krt14, and Krt16 (Figure 7-figure supplement 2C). While the upregulation of Il17a in the Sema4AKO epidermis in the IMQ model was not clearly modified by rapamycin (Figure 7-figure supplement 2C), immunofluorescence revealed a decrease in the number of CD3 T cells in the Sema4AKO epidermis after rapamycin treatment (Figure 7-figure supplement 2D). In the naive state, mTORC1 primarily regulates keratinocyte proliferation, whereas mTORC2 is mainly involved in keratinocyte differentiation through Sema4A-related signaling pathways. Conversely, in the psoriatic dermatitis state, rapamycin downregulated both keratinocyte differentiation and proliferation markers. The observed similarities in Il17a expression following treatment with rapamycin and JR-AB2-011, regardless of additional IMQ treatment, suggest that Il17a production is not significantly dependent on Sema4A-related mTOR signaling.”

      Page 29 Line 461 in the resubmitted manuscript: In the “Inhibition of mTOR” section.

      “To analyze the preventive effectiveness of rapamycin in an IMQ-induced murine model of psoriatic dermatitis, Sema4AKO mice were administered either vehicle or rapamycin intraperitoneally from Day 0 to Day 17, and IMQ was topically applied to both ears for 4 days starting on Day 14. Then, on Day 18, ears were collected for further analysis.”

      Page 71: Figure 7-figure supplement 2 in the resubmitted manuscript:

      “Figure 7-figure supplement 2: Rapamycin treatment reduced the epidermal swelling observed in IMQ-treated Sema4AKO mice.

      (A) Experimental scheme. (B) The Epi thickness on Day 18. (n = 10 for Ctl, n = 12 for Rapamycin). (C) Relative expression of keratinocyte differentiation markers and Il17a in Sema4AKO Epi (n = 10 for Ctl, n = 12 for Rapamycin). (D) The number of T cells in the Epi (left) and Derm (right), under Ctl or rapamycin and IMQ treatments (n = 10 for Ctl, n = 12 for Rapamycin). Each dot represents the sum of numbers from 10 unit areas across 3 specimens. A-C: *p < 0.05, **p < 0.01. NS, not significant.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To know whether the decrease of Sema4A in the epidermis of psoriasis patients is a result or a cause of psoriasis, it is necessary to show how the expression of Sema4A in epidermal cells is regulated. Shouldn't the degree of change in the expression of essential molecules (which is the cause of psoriasis) be more pronounced in L than in NL?

      We surveyed transcription factors of human Sema4A using GeneCards and found that NF-κB is the transcription factor most frequently associated with psoriasis. Wang et al. (Arthritis Res Ther. 2015) reported NF-κB-dependent modulation of Sema4A expression in synovial fibroblasts of rheumatoid arthritis. However, because NF-κB expression is reportedly upregulated in psoriatic lesions, where Sema4A is downregulated, other transcription factors may function as key modulators of Sema4A expression in the epidermis.

      Although the molecules causing psoriasis remain to be elucidated, we investigated the correlation between the expression of psoriasis-related essential molecules in keratinocytes—such as S100A7A, S100A7, S100A8, S100A9, and S100A12—and SEMA4A expression in L and NL samples using qRT-PCR. We could not identify a correlation between these molecules and SEMA4A expression. We added a note to the limitations section to acknowledge that we were not able to reveal how Sema4A expression is regulated and that we could not determine the relationships between Sema4A expression and the essential molecules upregulated in psoriatic keratinocytes.

      Page 21 Line 328 in the resubmitted manuscript:

      “We were not able to reveal how Sema4A expression is regulated. Although we showed that downregulation of Sema4A is related to the abnormal cytokeratin expression observed in psoriasis, we could not determine the relationships between Sema4A expression and the essential molecules upregulated in psoriatic keratinocytes.”

      (2) Using bone marrow chimeric mice, it has already been reported that hematopoietic cells contain keratinocyte stem cells. Therefore, their interpretation is not supported by the results of their bone marrow chimeric mice experiment, and it is essential to generate keratinocyte-specific Sema4A knockout mice and perform similar experiments to support their interpretation.

      We value the reviewer’s insightful comment. We have assessed the expression of Sema4a in the epidermis of WT→KO chimeric mice using qRT-PCR. Our findings indicate that Sema4a expression levels in the epidermis of these mice are minimal (cycle threshold values of Sema4a ranged from 31.9 to undetectable in WT→KO chimeric mice, whereas they ranged from 24.5 to 26.2 in WT→WT mice). Consequently, we believe that the impact of keratinocyte stem cells derived from WT hematopoietic cells is limited in this model. We appreciate this opportunity to clarify our results and will consider generating keratinocyte-specific Sema4A knockout mice in future experiments to further substantiate our interpretation.

      Page 11 Line 159 in the resubmitted manuscript:

      “Since it has already been reported that bone marrow cells contain keratinocyte stem cells (Harris et al., 2004; Wu, Zhao, & Tredget, 2010), we confirmed that epidermis of mice deficient in non-hematopoietic Sema4A (WT→KO) showed no obvious detection of Sema4a, thereby ruling out the impact of donor-derived keratinocyte stem cells infiltrating the host epidermis (Figure 3-figure supplement 1A).”

      Page 60: In the Figure legend of Figure 3-figure supplement 1A in the resubmitted manuscript:

      “(A) Sema4a expression in the Epi of WT→WT mice and WT→KO mice (n = 8 for WT→WT, n = 7 for WT→KO).”
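      To put the cycle threshold (Ct) values quoted in the response to point (2) in perspective, a common back-of-the-envelope conversion is fold difference ≈ (1 + E)^ΔCt, where E is the amplification efficiency (E = 1 for perfect doubling each cycle). The sketch below assumes E = 1 and ignores reference-gene normalization, which the response does not report, so the numbers are only a rough illustration: under these assumptions, the lowest detectable WT→KO Ct (31.9) lies 5.7 to 7.4 cycles after the WT→WT range (24.5 to 26.2), i.e. roughly 50- to 170-fold less Sema4a template. The function name is hypothetical.

```python
def approx_fold_difference(ct_sample: float, ct_reference: float,
                           efficiency: float = 1.0) -> float:
    """Approximate fold difference in starting template implied by a Ct gap.

    Assumes constant amplification efficiency (1.0 = perfect doubling per cycle)
    and no normalization to a reference gene.
    """
    return (1.0 + efficiency) ** (ct_sample - ct_reference)


# Lowest detectable WT->KO Ct versus the WT->WT range quoted in the response.
low = approx_fold_difference(31.9, 26.2)   # vs. highest WT->WT Ct
high = approx_fold_difference(31.9, 24.5)  # vs. lowest WT->WT Ct
print(round(low), round(high))  # 52 169
```

      Samples in which Sema4a was not detected at all would correspond to even larger differences.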

      (3) Since Sema4A KO mice already have immunological and epidermal cell characteristics similar to psoriasis, albeit weak, it is possible that the nonspecific stimulus of simply topical IMQ may have appeared to exacerbate psoriasis. It is advisable to confirm whether a more psoriasis-specific stimulus, IL-23 administration, would produce similar results.

      Thank you for your suggestion. Following your advice, we have analyzed IL-23-mediated psoriasis-like dermatitis. To induce the model, 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 was injected intradermally into both ears for 4 consecutive days. Unlike with IMQ application, there was no significant difference in ear thickness. However, H&E staining revealed that the epidermal thickness was significantly greater in KO mice than in WT mice. Although a longer period of IL-23 induction might result in more pronounced ear swelling, we conducted this experiment over the same duration as the IMQ application experiment to maintain consistency. When we analyzed the T cells infiltrating the ears by flow cytometry, the proportion of IL-17A-producing Vγ2 and DNγδ T cells in the CD3 fraction of the epidermis was significantly higher in Sema4A KO mice, consistent with the results from IMQ-induced psoriasis-like dermatitis.

      The lack of a significant difference in ear thickness after IL-23 administration might be because exogenous IL-23 does not reflect the events upstream of IL-23 production.

      We consider that in psoriasis, the expression of Sema4A in keratinocytes is likely more important than in T cells. Therefore, it makes sense that the phenotype difference was more pronounced with IMQ, which likely has a greater effect on keratinocytes compared to IL-23.

      Page 9 Line 137 in the resubmitted manuscript:

      “Though the imiquimod model is a well-established and valuable murine psoriatic model (van der Fits et al., 2009), the vehicle of imiquimod cream can activate skin inflammation independently of toll-like receptor 7, for example through inflammasome activation, keratinocyte death, and interleukin-1 production (Walter et al., 2013). This suggests that the imiquimod model involves complex pathways. Therefore, we subsequently induced IL-23-mediated psoriasis-like dermatitis (Figure 2-figure supplement 2A), a much simpler murine psoriatic model, because IL-23 is thought to play a central role in psoriasis pathogenesis (Krueger et al., 2007; Lee et al., 2004). Although ear swelling on Day 4 was comparable between WT and Sema4AKO mice (Figure 2-figure supplement 2B), the epidermis, but not the dermis, was significantly thicker in Sema4AKO mice than in WT mice (Figure 2-figure supplement 2C). The proportion of CD4 T cells among T cells was significantly higher in Sema4AKO mice than in WT mice, whereas the proportions of Vγ2 and DNγδ T cells among T cells were comparable (Figure 2-figure supplement 2D). On the other hand, the proportion of IL-17A-producing Vγ2 and DNγδ T cells in the CD3 fraction of the epidermis was significantly higher in Sema4AKO mice, consistent with the results from imiquimod-induced psoriasis-like dermatitis (Figure 2-figure supplement 2E).”

      Page 24 Line 363 in the resubmitted manuscript: In the “Mice” section.

      “To induce IL-23-mediated psoriasis-like dermatitis, 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 (BioLegend, San Diego, CA) was injected intradermally into both ears of anesthetized mice using a 29-gauge needle for 4 consecutive days.”

      Page 58: In the Figure legend of Figure 2-figure supplement 2 in the resubmitted manuscript:

      “IL-23-mediated psoriasis-like dermatitis is augmented in Sema4AKO mice.

      (A) Experimental scheme: 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 was injected intradermally into both ears of WT mice and KO mice for 4 consecutive days. Samples for the following analyses were collected on Day 4. (B and C) Ear thickness (B) and Epi and Derm thickness (C) of WT mice and KO mice on Day 4 (n = 12 per group). (D and E) The percentages of Vγ3, Vγ2, DNγδ, CD4, and CD8 T cells (D) and those with IL-17A production (E) in the CD3 fraction in the Epi (top) and Derm (bottom) of WT and KO ears (n = 5 per group). Each dot represents the average of 4 ear specimens. B-E: *p < 0.05, **p < 0.01. NS, not significant.”

      (4) How is STAT3 expression in the epidermis crucial in the pathogenesis of psoriasis in Sem4AKO mice?

      We appreciate your insightful comment. Given the established role of activated STAT3 in psoriasis, we investigated both total STAT3 and phosphorylated STAT3 (p-STAT3) levels in the naive epidermis of WT and Sema4AKO mice (Author response image 4). Our findings indicate that STAT3 activation does not occur in the epidermis of Sema4AKO mice. Therefore, we speculate that the hyperkeratosis observed in Sema4AKO mice is due to aberrant mTOR signaling rather than STAT3 activation. STAT3 may be involved in other pathways independent of Sema4A signaling, or it may function in a complex with other molecules in Sema4A signaling.

      Author response image 4.

    1. eLife Assessment

      This paper presents the important discovery that lipid metabolic imbalance caused by Snail, an EMT-related transcription factor, contributes to the acquisition of chemoresistance in cancer cells. However, support for the authors' claims is incomplete owing to concerns about the causal relationship and a lack of sufficient quantitative analysis. With strengthened evidence, this work would be of broad interest to researchers in the fields of cancer biology, lipid metabolism, and cell biology.

    2. Reviewer #1 (Public review):

      The authors focus on the molecular mechanisms by which EMT confers drug resistance on cancer cells. The authors use a wide range of methods to reveal that overexpression of Snail in EMT cells induces a cholesterol/sphingomyelin imbalance via transcriptional repression of biosynthetic enzymes involved in sphingomyelin synthesis. The study also revealed that ABCA1 is important for cholesterol efflux and thus for counterbalancing the excess intracellular free cholesterol in these Snail-EMT cells. Inhibition of ACAT, an enzyme catalyzing cholesterol esterification, also seems essential to inhibit the growth of Snail-expressing cancer cells.

      However, it seems important to analyze the localization of ABCA1, as it is possible that in the event of a cholesterol/sphingomyelin imbalance, for example, the intracellular trafficking of the pump may be altered. The authors should also analyze ACAT levels and/or activity in Snail-EMT cells, which would be expected to be increased. Overall, the provided data are important to better understand cancer biology.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors discovered that the chemoresistance in RCC cell lines correlates with the expression levels of the drug transporter ABCA1 and the EMT-related transcription factor Snail. They demonstrate that Snail induces ABCA1 expression and chemoresistance, and that ABCA1 inhibitors can counteract this resistance. The study also suggests that Snail disrupts the cholesterol-sphingomyelin (Chol/SM) balance by repressing the expression of enzymes involved in very long-chain fatty acid-sphingomyelin synthesis, leading to excess free cholesterol. This imbalance activates the cholesterol-LXR pathway, inducing ABCA1 expression. Moreover, inhibiting cholesterol esterification suppresses Snail-positive cancer cell growth, providing potential lipid-targeting strategies for invasive cancer therapy.

      Strengths:

      This research presents a novel mechanism by which the EMT-related transcription factor Snail confers drug resistance by altering the Chol/SM balance, introducing a previously unrecognized role of lipid metabolism in the chemoresistance of cancer cells. The focus on lipid balance, rather than individual lipid levels, is a particularly insightful approach. The potential for targeting cholesterol detoxification pathways in Snail-positive cancer cells is also a significant therapeutic implication.

      Weaknesses:

      The study's claim that Snail-induced ABCA1 is crucial for chemoresistance relies only on pharmacological inhibition of ABCA1, lacking additional validation. The causal relationship between the disrupted Chol/SM balance and ABCA1 expression or chemoresistance is not directly supported by data. Some data lack quantitative analysis.

    4. Author response:

      Response to Reviewer 1

      We will investigate the intracellular localization of ABCA1 in both EpH4 and EpH4-Snail cells. We will also examine the changes in ACAT expression levels within these cell lines.

      Response to Reviewer 2

      We will first investigate whether the chemoresistance exhibited by EpH4-Snail cells can be abolished not only through pharmacological inhibition of ABCA1 but also by knocking out the ABCA1 gene. Regarding causality, as demonstrated in Figure 2, we have already shown that reducing cholesterol levels in EpH4-Snail cells decreases ABCA1 expression. To further explore this relationship, we will assess whether increasing sphingomyelin levels by adding ceramide to the culture medium, thereby correcting the sphingomyelin-to-cholesterol ratio, would reduce ABCA1 expression. Furthermore, we will evaluate whether lowering cholesterol levels in EpH4-Snail cells via simvastatin treatment, along with normalization of the sphingomyelin-to-cholesterol ratio, attenuates their resistance to the anticancer drug nitidine chloride. Additionally, we will incorporate quantitative analyses for several experiments, as suggested in the reviewers’ comments, to enhance the robustness of our findings.

    1. eLife Assessment

      The authors use deep mutational scanning to assess the effect of all possible protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. They develop new, more precise approaches, enabling them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor. In this important work, the authors provide convincing evidence that variants impact signaling through MC4R in different ways, that some defective variants are amenable to a corrector drug and that deep mutational scanning data could guide compound optimization.

    2. Reviewer #1 (Public review):

      Summary:

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths:

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses:

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells. Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      (2) Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      (3) It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      (4) Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      (5) To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      (6) Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      (7) As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?
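      The scenario in point (3) can be made concrete with a toy tally. The sketch below uses hypothetical (cell, barcode) read records that mirror the reviewer's example; it is not the authors' pipeline, and it omits steps such as UMI collapsing that real data would need. The function name is illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarize_barcodes(reads: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map barcode -> (total reads, number of distinct cells carrying it)."""
    read_counts: Dict[str, int] = defaultdict(int)
    cells: Dict[str, Set[str]] = defaultdict(set)
    for cell_id, barcode in reads:
        read_counts[barcode] += 1
        cells[barcode].add(cell_id)
    return {bc: (read_counts[bc], len(cells[bc])) for bc in read_counts}


# Reviewer's example: barcode_a1 seen once in each of 6 cells,
# barcode_b1 seen 7 times in a single cell.
reads = [(f"cell{i}", "barcode_a1") for i in range(1, 7)]
reads += [("cell7", "barcode_b1")] * 7
print(summarize_barcodes(reads))
# {'barcode_a1': (6, 6), 'barcode_b1': (7, 1)}
```

      Read totals alone (6 vs. 7) suggest similar activity, whereas the per-cell counts (6 vs. 1) expose the imbalance the reviewer describes.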

    3. Reviewer #2 (Public review):

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain-of-function and loss-of-function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.

      The authors use the deep mutational scanning data they collect to map how different variants affect the activation of MC4R signaling by small molecule agonists. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    4. Author response:

      We thank the reviewers for their support of this work and insightful recommendations for how to improve it. We have provided specific responses to each reviewer comment below. To summarize how we intend to address the requested revisions:

      Many of the reviewers’ comments requested additional technical or quality details about the DMS libraries or assays (e.g., number of cells tested, number of sequencing reads, assay replication, assay sensitivity, library balance), and we provide additional information and analyses that we can incorporate into the relevant portions of the text, supplementary tables, and supplementary figures to address these questions.

      Some comments asked to clarify nomenclature/wording or provide additional labels to images, and we will make these changes as requested.

      A few questions would require additional experimental data to address. Where experiments have already been performed, we will incorporate those results or cite relevant work previously reported in the literature.

      Reviewer 1:

      Summary

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells.

      Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Are variants evenly represented in the library?

      We strive to achieve as evenly balanced a library as possible at every stage of the DMS process (e.g., from initial cloning in E. coli through integration into human cells). Below is a representative plot showing the number of barcodes per amino acid variant at each position in a given ~60 amino acid subregion of MC4R, which highlights how evenly variants are represented at the E. coli cloning stage.

      Author response image 1.

      We also make similar measurements after the library is integrated into HEK293T cell lines, and see similarly even coverage across all variants, as shown in the plot below.

      Author response image 2.

      Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct?

      We agree long-read sequencing would be an excellent way to confirm that our constructs contain a single intended variant. However, we elected for an alternate method (outlined in more detail in Jones et al. 2020) that leverages multiple layers of validation. First, the oligo chip-synthesized portions of the protein containing the variants are cloned into a sequence-verified plasmid backbone, which greatly decreases the chances of spuriously generating a mutation in a different portion of the protein. We then sequence both the oligo portion and random barcode using overlapping paired-end reads during barcode mapping to avoid sequencing errors and to help detect DNA synthesis errors. At this stage, we computationally reject any constructs that have more than one variant. Given this, the vast majority of remaining unintended variants would come from spontaneous mutations introduced during E. coli cloning or replication, which should be low frequency. We have used our in-house full plasmid sequencing method, OCTOPUS, to sample and spot check this for several other DMS libraries we have generated using the same cloning methods. We have found variants in the plasmid backbone in only ~1% of plasmids in these libraries. Our statistical model also helps correct for this by accounting for barcode-specific variation. Finally, we believe this provides further motivation for having multiple barcodes per variant, which dilutes the effect of any unintended additional variants.

      Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Certainly! In general, the Gs reporter had higher correlation between replicates than the Gq system (r ~ 0.5 vs r ~ 0.4). The plots below show two representative correlations of barcode read counts at the RNA-seq stage between the low α-MSH conditions. One important advantage of our statistical model is that it is able to leverage information from barcodes regardless of the number of replicates they appear in.

      Author response image 3.
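      As an illustration of the kind of replicate correlation quoted above, here is a minimal sketch computing a Pearson correlation between two replicates' barcode read counts. It assumes log1p-transformed counts restricted to barcodes observed in both replicates; the response does not state which transform or correlation measure was used, and the barcode sequences and counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def replicate_correlation(rep1, rep2):
    """Correlate log1p read counts for barcodes seen in both replicates.

    rep1, rep2: dicts mapping a (hypothetical) barcode sequence -> read count.
    Barcodes present in only one replicate are dropped by this simple approach.
    """
    shared = sorted(set(rep1) & set(rep2))
    x = [math.log1p(rep1[bc]) for bc in shared]
    y = [math.log1p(rep2[bc]) for bc in shared]
    return pearson(x, y)


# Hypothetical counts: "CATG" and "AAAA" each appear in only one replicate.
rep1 = {"ACGT": 8, "TTGA": 3, "GGCA": 15, "CATG": 1}
rep2 = {"ACGT": 10, "TTGA": 2, "GGCA": 12, "AAAA": 5}
r = replicate_correlation(rep1, rep2)
```

      A pairwise correlation like this can only use barcodes shared between replicates, which is one reason a model that pools information from barcodes seen in any number of replicates can be more data-efficient.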

      Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      This will be addressed by incorporating the following details into the manuscript:

      We seeded 17 million cells per replicate at the start of each assay and, with ~1.5-fold expansion over the course of the assay, harvested ~25.5 million cells per replicate for RNA extraction and sequencing. We found this sufficient to get at least ~30-60x cellular coverage per amino acid variant.

      Total mapped reads per replicate at RNA-seq stage

      - Gs/CRE: 9.1-18.2 million mapped reads, median=12.3

      - Gq/UAS: 8.6-24.1 million mapped reads, median=14.5

      - Gs/CRE+Chaperone: 6.4-9.5 million mapped reads, median=7.5

      Reads per barcode distribution

      - Median read counts of 8, 10, and 6 reads per sample per barcode for Gs/CRE, Gq/UAS, and Gs/CRE+Chaperone assays, respectively.

      Barcodes per variant distribution

      - As reported, the median number of barcodes per variant across samples (the “median of medians”) is 56 for Gs/CRE and 28 for Gq/UAS

      - Additionally, it is 44 for Gs/CRE+Chaperone

      It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      We account for this heterogeneity in several ways. First, as shown above (Response to Reviewer 1, Question 1), we aim to have even representation of variants within our libraries. Second, we use compositional control conditions, such as forskolin-treated or unstimulated conditions, to obtain treatment-independent measurements of barcode abundance and, consequently, of mutant-vs-WT effects that are due to compositional rather than biological variability. We expect that variability observed under these controls is due to subtle effects of molecular cloning, gene expression, and stochasticity. Using these controls, we observe that mutant-vs-WT effects are generally close to zero in these normalization conditions (e.g., in untreated Gq, see Supplementary Figure 3) compared with drug-treated conditions. For example, premature stop variants behave similarly to WT in normalization conditions. This indicates that mutant abundance is relatively homogeneous. Where there are barcode-dependent effects on abundance, we can use information from these conditions to normalize that effect. Finally, our mixed-effects model accounts for barcode-specific deviations from the expected mutant effect (e.g., a “high count” barcode consistently being high relative to the mean).
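      To make the role of the control conditions concrete, here is a minimal sketch of a simple ratio-of-ratios normalization: each variant's treated/control log ratio is compared with WT's, so a variant that is merely more abundant (present in more cells) but responds like WT scores near zero. This is a simplified illustration, not the authors' mixed-effects model, and the pseudocount value and all counts are hypothetical.

```python
import math


def log2_ratio(treated, control, pseudocount=0.5):
    """log2 of treated/control read counts; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def mutant_vs_wt(mut_treated, mut_control, wt_treated, wt_control, pseudocount=0.5):
    """Mutant-vs-WT effect: the mutant's treated/control log ratio minus WT's.

    Dividing by control-condition counts removes abundance differences
    (e.g. a variant carried by more cells), leaving the treatment response.
    Per-sample sequencing depth also cancels, because mutant and WT counts
    come from the same samples (exactly so when pseudocount=0).
    """
    return (log2_ratio(mut_treated, mut_control, pseudocount)
            - log2_ratio(wt_treated, wt_control, pseudocount))


# A variant 3x overrepresented in every condition but responding like WT scores 0 ...
overrepresented = mutant_vs_wt(300, 150, 100, 50, pseudocount=0)
# ... whereas a loss-of-function variant with WT-like abundance scores negative.
loss_of_function = mutant_vs_wt(25, 50, 100, 50, pseudocount=0)
```

      In this toy case the overrepresented variant's extra reads appear in both the control and the treated samples, so they cancel, which is the reviewer's abundance concern addressed in its simplest form.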

      Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      Figure 2D shows DMS scores (variant effect on Gs signaling) relative to human population frequency for all MC4R variants reported in gnomAD as of January 8, 2024.

      To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      The full Gq reporter uses an NFAT response element from the IL-2 promoter to regulate the expression of the GAL4-VPR relay. In this system, the activation of Gq signaling results in the activation of the NFAT response element, and this signal is then amplified by the GAL4-VPR relay. The NFAT response element has been previously well-validated to respond to the activation of Gq signaling (e.g., PMID: 8631834). We will add this reference to the text to further support the use of the Gq assay.

      Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      While we do not directly measure whether Ipsen-17 has effects on other signaling processes, previous work has shown that Ipsen-17 treatment does not indirectly alter signaling kinetics such as receptor internalization (Wang et al., 2014). Furthermore, our analysis methods inherently account for this by normalizing variant effects to WT signaling levels. Any observed rescue of a given variant inherently means that the variant is specifically more responsive to Ipsen-17 than WT, and the fact that different variants exhibit different levels of rescue is reassuring that the mechanism is on target to MC4R. Lastly, Ipsen-17 is known to be an antagonist of alpha-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al., 2014).

      As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?

      We agree this would be an excellent line of inquiry, but due to changes in company priorities we unfortunately do not have any plans for additional research on these variants.

      Reviewer 2:

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain-of-function and loss-of-function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.

      The authors use the deep mutational scanning data they collect to map how different variants affect the activation of MC4R signaling by small molecule agonists. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    1. eLife Assessment

      This important study demonstrates the therapeutic potential of recombinant human PDGF-AB/BB proteins in reducing the senescent signatures of primary human intervertebral disc cells. The study represents a significant advance in the treatment of intervertebral disc degeneration and can be applied broadly to other degenerative musculoskeletal tissues through the suppression of senescence. Solid evidence, primarily based on transcriptomic analysis and direct protein measurements from relatively homogeneous cell populations, supports the therapeutic potential of PDGF, but more experimental details would be needed to make the evidence stronger.

    2. Reviewer #1 (Public review):

      The authors, Zhang et al., demonstrate the beneficial effects of treating degenerate human primary intervertebral disc (IVD) cells with recombinant human PDGF-AB/BB on the senescence transcriptomic signatures. Utilizing a combination of degenerate cells from elderly humans and experimentally induced senescence in young, healthy IVD cells, the authors show the therapeutic effects on mRNA transcription as well as cellular processes through informatics approaches.

      One notable strength of this study is the use of human primary cells and recombinant forms of human PDGF-AB/BB proteins, which increases the translational potential of these in vitro studies. The manuscript is well-written, and the informatics analyses are thorough and clearly presented.

      However, in its current form, the study does not provide sufficient experimental details, and clarifications are needed. These are as follows:

      (1) The source of PDGF-AB/BB proteins is not detailed.

      (2) The irradiation parameters are not adequately reported; the authors should consult PMCID: PMC5495460 for the parameters that should be reported.

      (3) The criteria for young and old patient donors are not explicitly described; from the table, one presumes the cut-off for young is 27 years old.

      (4) What is the rationale for using different concentrations of PDGF-AB/BB in the degenerate cell and irradiation experiments?

      There are also a number of other issues the authors could consider. First, in the title and throughout the manuscript, the effects of PDGF-AB/BB are described as protective, yet in all the experiments, PDGF-AB/BB appears to be administered following either in vivo degeneration or in vitro irradiation, where protective effects (e.g., administration prior to insult) were not tested. Therefore, the effects of PDGF-AB/BB may be more accurately described as mitigating or therapeutic rather than protective.

      The authors state that the focus on NP (nucleus pulposus) cell studies is due to NP being the first site impacted during degeneration. However, this reviewer believes that this is because changes in the NP are more clinically evident (by imaging methods), despite degeneration often initiating from the AF (annulus fibrosus), e.g., through tears/microtears.

      A prior study has examined the effects of X-ray irradiation on NF-kB signaling in young and aged IVDs (PMCID: PMC5495460), and the authors may wish to consider this work.

    3. Reviewer #2 (Public review):

      Summary:

      This work highlights a novel role for platelet-derived growth factor (PDGF) in mitigating cellular senescence associated with age-related and painful intervertebral disc degeneration. Prior literature has demonstrated the importance of the accumulation of senescent cells in mediating many of the pathological effects associated with the degenerate disc joint, such as inflammation and tissue breakdown. In this study, the authors treat clinically relevant human nucleus pulposus and annulus fibrosus cells from patients undergoing discectomy with recombinant PDGF-AB/BB for 5 days and then deeply phenotype the outcomes using bulk RNA sequencing. In addition, they irradiated healthy human disc cells, which they subsequently treated with PDGF-AB/BB, and examined the expression of SASP-related markers as well as PDGFRA receptor gene expression. Overall, PDGF was able to down-regulate many senescence-associated pathways and the degenerate phenotype in IVD cells. Altered pathways were associated with neurogenesis, mechanical stimuli, metabolism, cell cycle, reactive oxygen species, and mitochondrial dysfunction. Overall, the authors achieved their aims, and the results by and large support their conclusions, although improvements could be made to enhance the rigor of the study and findings.

      Strengths:

      A major strength of this study is the use of human cells from patients undergoing discectomy for disc herniation as well as access to healthy human cells. The role of PDGF in cellular senescence in the degenerate disc joint is a novel and underexplored area of research, and this work is a significant contribution to the spine field. This study highlights a potential target for addressing cellular senescence, where most of the prior focus has been on senolytic drugs. Such studies have broad implications for other age-related diseases where senescence plays a major role. The use of transcriptomics, and therefore an unbiased approach to investigating the role of PDGF, is also considered a strength, as are the follow-up studies involving irradiating healthy human disc cells and treating these cells with PDGF. The combined assessment of both nucleus pulposus and annulus fibrosus cells in the context of these studies adds to the impact.

      Weaknesses:

      A weakness of these studies lies in the lack of experimental details provided in the methodology, including the rationale for such methods/conditions. Many details, such as the specific culture models utilized, substrates, cell density, and media components, are missing, which impacts rigor. Such details would strengthen the manuscript and the ability to replicate and build on such work/findings. An additional weakness relates to the qualitative data presented, such as the β-galactosidase assay and immunofluorescence of senescence-associated proteins such as p21 and p16. Quantification of such data sets would greatly strengthen the studies and lend further support to the hypotheses. The study in its current form could be strengthened by the inclusion of mechanistic studies probing the downstream PDGF receptor-associated pathways, for example by specifically targeting or modulating the activity of the PDGF receptor PDGFRA, including validation of the PDGFRA gene data with protein-level data to determine whether the transcripts are being translated into protein. The claim that in annulus fibrosus cells PDGF does not mediate its effects via PDGFRA does not appear to be supported by the current data, as only gene expression for the receptor was assessed, with no functional or mechanistic studies being performed. Further discussion, interpretation, and direct comparison of the nucleus pulposus and annulus fibrosus data sets would be helpful for readers. The magnitude of the changes in S-phase between control and PDGF-BB-treated groups of irradiated NP and AF cells seems small, making interpretation of these findings challenging.

    1. eLife Assessment

      In this manuscript, Yao et al. present solid evidence to show that MmMYL3 may serve as a receptor for NNV via macropinocytosis. This manuscript is valuable for understanding the molecular mechanisms of NNV cell entry. However, the manuscript would benefit from a discussion of the broader implications of these findings for cell entry of other viruses.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors discovered MYL3 of marine medaka (Oryzias melastigma) as a novel NNV entry receptor, elucidating its facilitation of RGNNV entry into host cells through macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 pathway.

      Strengths:

      In this manuscript, the authors have performed in vitro and in vivo experiments to prove that MmMYL3 may serve as a receptor for NNV via the macropinocytosis pathway. These experiments, using different methods, include Co-IP, RNAi, pulldown, SPR, flow cytometry, immunofluorescence assays, and so on. In general, the results are clearly presented in the manuscript.

      Weaknesses:

      In the introduction and discussion sections, Yao et al. mainly focus on viral pathogens and fish in aquaculture; as a result, the significance and novelty of the results presented in this manuscript appear limited and not broadly relevant to biology. The authors should better articulate the likely impact of their work on the viral infection field, and perhaps also on the evolutionary field using the fish model.

      (1) Myosin is a big family; why did the authors choose MYL3 as a candidate receptor for NNV?

      (2) What is the relationship between MmMYL3 and MmHSP90ab1 and other known NNV receptors? Why does NNV have so many receptors? Which one is supposed to serve as the key entry receptor?

      (3) In vivo knockout of MYL3 using CRISPR-Cas9 should be conducted to verify whether the absence of MYL3 really inhibits NNV infection. Although it might be difficult to do it in marine medaka as stated by the authors, the introduction of zebrafish is highly recommended, since it has already been reported that zebrafish could serve as a vertebrate model to study NNV (doi: 10.3389/fimmu.2022.863096).

      (4) The results shown in Figure 6 are not enough to support the conclusion that "RGNNV triggers macropinocytosis mediated by MmMYL3". Additional electron microscopy of macropinosomes (sizes, morphological characteristics, etc.) will be more direct evidence.

      (5) MYL3 is "predominantly found in muscle tissues, particularly the heart and skeletal muscles". However, NNV is a virus that mainly causes necrosis of nervous tissues (brain and retina). If MYL3 really acts as a receptor for NNV, how does it balance this difference so that nervous tissues, rather than muscle tissues, have the highest viral titers?

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript offers an important contribution to the field of virology, especially concerning NNV entry mechanisms. The major strength of the study lies in the identification of MmMYL3 as a functional receptor for RGNNV and its role in macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 signaling axis. This represents a significant advance in understanding NNV entry mechanisms beyond previously known receptors such as HSP90ab1 and HSC70. The data, supported by comprehensive in vitro and in vivo experiments, strongly justify the authors' claims about MYL3's role in NNV infection in marine medaka.

      Strengths:

      (1) The identification of MmMYL3 as a functional receptor for RGNNV is a significant contribution to the field. The study fills a crucial gap in understanding the molecular mechanisms governing NNV entry into host cells.

      (2) The work highlights the involvement of IGF1R in macropinocytosis-mediated NNV entry and downstream Rac1/Cdc42 activation, thus providing a thorough mechanistic understanding of NNV internalization process. This could pave the way for further exploration of antiviral targets.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript presents a detailed study on the role of MmMYL3 in the viral entry of NNV, focusing on its function as a receptor that mediates viral internalization through the macropinocytosis pathway. The use of both in vitro assays (e.g., Co-IP, SPR, and GST pull-down) and in vivo experiments (such as infection assays in marine medaka) adds robustness to the evidence for MmMYL3 as a novel receptor for RGNNV. The findings have important implications for understanding NNV infection mechanisms, which could pave the way for new antiviral strategies in aquaculture.

      Strengths:

      The authors show that MmMYL3 directly binds the viral capsid protein, facilitates NNV entry via the IGF1R-Rac1/Cdc42 pathway, and can render otherwise resistant cells susceptible to infection. This multifaceted approach effectively demonstrates the central role of MmMYL3 in NNV entry.

    1. eLife Assessment

      This important work develops a new protocol to experimentally perturb target genes across a quantitative range of expression levels in cell lines. The evidence supporting their new perturbation approach is compelling, and the computational analyses to better understand dosage response relationships between genes are convincing. The study will be of broad interest to scientists in the fields of functional genomics and biotechnology. However, the evidence supporting the conclusions can be further improved.

    2. Reviewer #1 (Public review):

      In this manuscript, Domingo et al. present a novel perturbation-based approach to experimentally modulate the dosage of genes in cell lines. Their approach is capable of gradually increasing and decreasing gene expression. The authors then use their approach to perturb three key transcription factors and measure the downstream effects on gene expression. Their analysis of the dosage response curve of downstream genes reveals marked non-linearity.

      One of the strengths of this study is that many of the perturbations fall within the physiological range for each cis gene. This range is presumably between a single-copy state of heterozygous loss-of-function (log2 fold change of -1) and a three-copy state (log2 fold change of ~0.6). This contrasts with CRISPRi or CRISPRa studies that attempt to maximize the effect of the perturbation, which may result in downstream effects that are not representative of physiological responses.
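      The bounds quoted for the physiological range follow from base-2 logarithms of copy-number ratios relative to the diploid state. A minimal check, assuming expression scales linearly with gene copy number (an idealization, not a claim made by the study):

```python
import math

def copy_number_log2fc(copies: int, baseline: int = 2) -> float:
    """log2 fold change expected if expression scales linearly with gene copy number.

    Assumption: a diploid baseline and strictly proportional dosage.
    """
    return math.log2(copies / baseline)

het_loss = copy_number_log2fc(1)  # heterozygous loss-of-function: exactly -1
gain = copy_number_log2fc(3)      # single-copy gain: about 0.585
print(f"one copy: {het_loss:.2f}, three copies: {gain:.2f}")
```

      Real dosage compensation or feedback would move these values, so they are best read as reference points rather than hard limits.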

      Another strength of the study is that various points along the dosage-response curve were assayed for each perturbed gene. This allowed the authors to effectively characterize the degree of linearity and monotonicity of each dosage-response relationship. Ultimately, the study revealed that many of these relationships are non-linear, and that the response to activation can be dramatically different than the response to inhibition.

      To test their ability to gradually modulate dosage, the authors chose to measure three transcription factors and around 80 known downstream targets. As the authors themselves point out in their discussion about MYB, this biased sample of genes makes it unclear how this approach would generalize genome-wide. In addition, the data generated from this small sample of genes may not represent genome-wide patterns of dosage response. Nevertheless, this unique data set and approach represent a first step in understanding dosage-response relationships between genes.

      Another point of general concern in such screens is the use of the immortalized K562 cell line. It is unclear how the biology of these cell lines translates to the in vivo biology of primary cells. However, the authors do follow up with cell-type-specific analyses (Figures 4B, 4C, and 5A) to draw a correspondence between their perturbation results and the relevant biology in primary cells and complex diseases.

      The conclusions of the study are generally well supported with statistical analysis throughout the manuscript. As an example, the authors utilize well-known model selection methods to identify when there was evidence for non-linear dosage response relationships.
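      The specific model-selection procedure is not detailed in this review. As one illustrative possibility (an assumption, not necessarily the authors' method), linear and quadratic fits of a trans gene's log2 fold change against the cis gene's log2 fold change can be compared by the Akaike information criterion (AIC), with the quadratic term standing in for a generic non-linear alternative. The function names and data values below are hypothetical.

```python
import numpy as np

def gaussian_aic(y, y_hat, n_params):
    """AIC of a least-squares fit, assuming i.i.d. Gaussian residuals (additive constant dropped)."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * n_params

def compare_dose_response(x, y):
    """AIC of polynomial fits of degree 1 (linear) and degree 2 (curved) to y ~ x."""
    scores = {}
    for degree, name in [(1, "linear"), (2, "quadratic")]:
        coeffs = np.polyfit(x, y, degree)
        y_hat = np.polyval(coeffs, x)
        # Free parameters: degree + 1 coefficients plus the residual variance.
        scores[name] = gaussian_aic(y, y_hat, degree + 2)
    return scores

# Hypothetical data: cis-gene log2FC (x) and a trans-gene log2FC (y) with a curved response.
x = np.array([-1.0, -0.6, -0.3, 0.0, 0.2, 0.4, 0.6])
y = np.array([0.32, -0.022, -0.068, -0.02, 0.142, 0.318, 0.608])
scores = compare_dose_response(x, y)
best = min(scores, key=scores.get)  # lower AIC is preferred
print(scores, "preferred:", best)
```

      A lower AIC favors the corresponding model; a real analysis would likely also consider more flexible alternatives (for example sigmoidal fits) and the uncertainty of each pseudo-bulk estimate.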

      Gradual modulation of gene dosage is a useful approach to model physiological variation in dosage. Experimental perturbation screens that use CRISPR inhibition or activation often use guide RNAs targeting the transcription start site to maximize their effect on gene expression. Generating a physiological range of variation will allow others to better model physiological conditions.

      There is broad interest in the field to identify gene regulatory networks using experimental perturbation approaches. The data from this study provides a good resource for such analytical approaches, especially since both inhibition and activation were tested. In addition, these data provide a nuanced, continuous representation of the relationship between effectors and downstream targets, which may play a role in the development of more rigorous regulatory networks.

      Human geneticists often focus on loss-of-function variants, which represent natural knock-down experiments, to determine the role of a gene in the biology of a trait. This study demonstrates that dosage response relationships are often non-linear, meaning that the effect of a loss-of-function variant may not necessarily carry information about increases in gene dosage. For the field, this implies that others should continue to focus on both inhibition and activation to fully characterize the relationship between gene and trait.

    3. Reviewer #2 (Public review):

      Summary:

      This work investigates transcriptional responses to varying levels of transcription factors (TFs). The authors aim for gradual up- and down-regulation of three transcription factors, GFI1B, NFE2, and MYB, in K562 cells, using a CRISPRa line and a CRISPRi line together with sgRNAs of varying potency. Targeted single-cell RNA sequencing is then used to measure the expression of a set of 90 genes, which were previously shown to be downstream of GFI1B and NFE2 regulation. This is followed by an extensive computational analysis of the scRNA-seq dataset. By grouping cells with the same perturbations, the authors obtain groups of cells with varying average TF expression levels. The achieved perturbations are generally subtle, not reaching half or double doses for most samples, and up-regulation is generally weak, below 1.5-fold in most cases. Even in this small range, many target genes exhibit a non-linear response. Since this is rather unexpected, it is crucial to rule out technical reasons for these observations.

      Strengths:

      The work showcases how a single dataset of CRISPRi/a perturbations with scRNA-seq readout and an extended computational analysis can be used to estimate transcriptome dose responses, a general approach that likely can be built upon in the future.

      Weaknesses:

      (1) The experiment was only performed in a single replicate. In the absence of an independent validation of the main findings, the robustness of the observations remains unclear.

      (2) The analysis is based on the calculation of log-fold changes between groups of single cells with non-targeting controls and those carrying a guide RNA driving a specific knockdown. How the fold changes were calculated exactly remains unclear, since it is only stated that the FindMarkers function from the Seurat package was used, which is likely not optimal for quantitative estimates. Furthermore, differential gene expression analysis of scRNA-seq data can suffer from data distortion and mis-estimations (Heumos et al. 2023 (https://doi.org/10.1038/s41576-023-00586-w), Nguyen et al. 2023 (https://doi.org/10.1038/s41467-023-37126-3)). In general, the pseudo-bulk approach used is suitable, but the correct treatment of drop-outs in the scRNA-seq analysis is essential.

      (3) Two different cell lines are used to construct dose-response curves, where a CRISPRi line allows gene down-regulation and the CRISPRa line allows gene upregulation. Although both lines are derived from the same parental line (K562) the expression analysis of Tet2, which is absent in the CRISPRi line, but expressed in the CRISPRa line (Figure S3A) suggests substantial clonal differences between the two lines. Similarly, the PCA in S4A suggests strong batch effects between the two lines. These might confound this analysis.

      (4) The study uses pseudo-bulk analysis to estimate the relationship between TF dose and target gene expression. This requires a system that allows quantitative changes in TF expression. The data provided does not convincingly show that this condition is met, which however is an essential prerequisite for the presented conclusions. Specifically, the data shown in Figure S3A shows that upon stronger knock-down, a subpopulation of cells appears, where the targeted TF is not detected anymore (drop-outs). Also Figure 3B (top) suggests that the knock-down is either subtle (similar to NTCs) or strong, but intermediate knock-down (log2-FC of 0.5-1) does not occur. Although the authors argue that this is a technical effect of the scRNA-seq protocol, it is also possible that this represents a binary behavior of the CRISPRi system. Previous work has shown that CRISPRi systems with the KRAB domain largely result in binary repression and not in gradual down-regulation as suggested in this study (Bintu et al. 2016 (https://doi.org/10.1126/science.aab2956), Noviello et al. 2023 (https://doi.org/10.1038/s41467-023-38909-4)).

      (5) One of the major conclusions of the study is that non-linear behavior is common. This is not surprising for gene up-regulation, since gene expression will reach a plateau at some point, but it is surprising to observe it for many genes upon TF down-regulation. Specifically, here the target gene responds to a small reduction of TF dose but shows the same response to a stronger knock-down. It would be essential to show that this observation does not arise from the technical concerns described in the previous point, which would require independent experimental validation.

      (6) One of the conclusions of the study is that guide tiling is superior to other methods such as sgRNA mismatches. However, the comparison is unfair, since different numbers of guides are used in the different approaches. Relatedly, the authors point out that tiling sometimes surpassed the effects of TSS-targeting sgRNAs; however, this was the least fair comparison (2 TSS vs 10 tiling guides) and additionally depends on the accurate annotation of TSSs in the relevant cell line.

      (7) Did the authors achieve their aims? Do the results support the conclusions?: Some of the most important conclusions are not well supported because they rely on accurately determining the quantitative responses of trans genes, which suffers from the previously mentioned concerns.

      (8) Discussion of the likely impact of the work on the field, and the utility of the methods and data to the community: Together with other recent publications, this work emphasizes the need to study transcription factor function with quantitative perturbations. Missing documentation of the computational code repository significantly reduces the utility of the methods and data.
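      The concerns in points (2) and (4) can be illustrated with a toy calculation. The sketch below assumes one simple pseudo-bulk scheme (raw counts summed per group of cells, normalized to counts per million, log2 ratio with a pseudocount); it is not the authors' pipeline and not Seurat's FindMarkers. With hypothetical counts, a graded knock-down (every cell at half dose) and a binary one (half of the cells fully silenced, half untouched) give the same pseudo-bulk log2 fold change, and only the per-cell fraction of zero counts tells them apart.

```python
import numpy as np

def pseudobulk_log2fc(counts_perturbed, counts_control, pseudocount=1.0):
    """Per-gene log2 fold change between two cell groups, computed on pseudo-bulk CPM.

    counts_* are (cells x genes) raw count matrices; cells are summed per group
    before library-size normalization to counts per million.
    """
    def pseudobulk_cpm(counts):
        summed = counts.sum(axis=0).astype(float)  # aggregate cells into one profile
        return summed / summed.sum() * 1e6
    cpm_p = pseudobulk_cpm(counts_perturbed)
    cpm_c = pseudobulk_cpm(counts_control)
    return np.log2((cpm_p + pseudocount) / (cpm_c + pseudocount))

def zero_fraction(counts, gene):
    """Fraction of cells with zero counts for one gene (information lost in pseudo-bulk)."""
    return float(np.mean(counts[:, gene] == 0))

TF = 0  # column 0: targeted TF; column 1: an unperturbed reference gene (hypothetical)
control = np.tile([10, 100], (100, 1))
graded = np.tile([5, 100], (100, 1))                # every cell at half dose
binary = np.vstack([np.tile([10, 100], (50, 1)),    # half the cells untouched
                    np.tile([0, 100], (50, 1))])    # half the cells fully silenced

for name, cells in [("graded", graded), ("binary", binary)]:
    lfc = pseudobulk_log2fc(cells, control)[TF]
    print(f"{name}: TF log2FC = {lfc:.2f}, zero fraction = {zero_fraction(cells, TF):.2f}")
```

      In real scRNA-seq data, technical drop-outs also contribute zeros, so the zero fraction alone may not settle whether repression is graded or binary.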

    1. eLife Assessment

      This study presents valuable data on the identification and function of a protein complex present at the Maurer's cleft organelles of Plasmodium falciparum-infected red blood cells. The evidence supporting the findings is solid, but would benefit from greater rigor in presentation and analysis.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Blancke Soares and Stäcker et al serendipitously identify a domain of the Plasmodium falciparum protein MSRP6 that mediates both export from the parasite into the infected red blood cell and association with the Maurer's cleft organelles found in the infected cell. The authors use this domain to identify a putative complex of proteins at the Maurer's cleft via proximity biotinylation. Six members of the complex are confirmed to interact with MSRP6 by co-immunoprecipitation.

      The functions of select proteins of this complex are further investigated with regard to the formation of Maurer's clefts. Disruption of PeMP2, PIESP2, and Pf332 resulted in morphological changes to the Maurer's clefts and prevented the anchoring of the Maurer's clefts to the infected red blood cell plasma membrane that normally occurs in the trophozoite stage. Curiously, disruption of MSRP6, the central member of the complex, did not affect Maurer's cleft anchoring. Mechanistically, how this complex affects Maurer's cleft structure and anchoring remains unclear.

      Finally, the authors show that the loss of Maurer's cleft anchoring observed upon disruption of PIESP2 or Pf332 does not affect cytoadherence of infected red blood cells via PfEMP1, arguing against a prior assumption that cleft tethering is required for the presentation of parasite-exported proteins on the infected red blood cell surface.

      Strengths:

      Maurer's clefts are enigmatic organelles found in red blood cells infected by Plasmodium falciparum that are presumed to play a role in trafficking exported parasite proteins to the surface of the red blood cells, though little is known about their biogenesis and function. The authors here convincingly identify a protein complex present at the Maurer's clefts using multiple orthogonal tools, and carry out assays that indicate this protein complex has a role in shaping and anchoring the Maurer's clefts at their final location at the red blood cell membrane. The data indicating that Maurer's cleft anchoring is dispensable for trafficking of P. falciparum exported proteins to the infected red blood cell membrane has implications for understanding the function of this organelle.

      Weaknesses:

      In many instances, the data lack appropriate controls that would be desirable for the highest level of rigor. Many, if not most, fluorescence microscopy assays lack untagged/parental controls (prepared in parallel and captured with the same settings) that are necessary to determine the validity of the data - that the observed signal is specific to the protein of interest and not due to autofluorescence or bleed-through from other channels. In other cases, wild-type controls are missing where data from disruption mutants are presented. Additionally, while some phenotypes are quantified, others are only qualitatively described where a more thorough quantitative investigation would be valuable. Finally, where phenotypes have been quantified, in many instances it is not clear that the analyses have included biological replicates as would be expected.

    3. Reviewer #2 (Public review):

      Summary:

      Soares et al characterize several P. falciparum exported proteins that localize to Maurer's Clefts (MCs), membrane structures formed in the host erythrocyte cytosol. MCs are thought to act as sorting stations that mediate the trafficking of effector proteins to the erythrocyte membrane, such as the surface adhesin and major virulence factor PfEMP1. While initially mobile within the host cytosol, MCs become anchored at the erythrocyte periphery around the time PfEMP1 appears on the RBC surface. While MC immobilization is thought to be important for the delivery of PfEMP1 onto the erythrocyte surface, this hypothesis has remained untested due to the lack of mutants that prevent anchoring. The study begins by determining the sequence features able to mediate the export of PF3D7_0830300 and MSRP6, both PEXEL-Negative Exported Proteins (PNEPs) with signal peptides. The authors show that in both proteins, a region downstream of the signal peptide is sufficient to mediate export, indicating the mature N-terminus is also important for the translocation of this type of PNEP, similar to other classes of exported proteins. Surprisingly, an additional C-terminal region of MSRP6 is also sufficient to mediate export when placed downstream of the signal peptide in the absence of other MSRP6 features. This region also mediates recruitment to MCs and was used as BioID bait to identify proximal MC proteins, several of which form a complex with MSRP6. Strikingly, disruption of certain MSRP6 interacting proteins (PeMP2, PIESP2, and Pf332) abolishes MC anchoring and in some cases also results in major changes in MC morphology. Surprisingly, neither PfEMP1 surface display nor cytoadhesion of infected RBCs is impacted in these mutants. This study features an impressive array of genetically modified parasites and will be of broad interest in providing the first functional analysis of MC anchoring, challenging the prevailing model for PfEMP1 trafficking within the infected RBC.

      Strengths:

      (1) The first section of the paper presents an in-depth dissection of the features that enable the export of signal peptide-containing PNEPs, confirming the mature N-terminus is sufficient for export across all known types of exported proteins. While it remains unknown how these features enable export, the results reinforce the universal importance of the mature N-terminus, whether generated by signal peptidase or Plasmepsin 5.

      (2) The discovery that a C-terminal region of MSRP6 (MAD) is also sufficient for export is novel. The authors suggest this may be the result of piggybacking on another exported protein, although the discussion acknowledges there are challenges with this model since unfolding by PTEX would be expected to disrupt these interactions. An alternative might be considered: the related protein MSRP7 is also exported but consists essentially of a signal peptide and MSP7-like domain without the large N-terminal region found in MSRP6. Presumably, the mature N-terminus of MSRP7 mediates export. If MSRP6 is derived from an exported predecessor composed only of the MSP7-like domain (like MSRP7), the MAD domain might retain the ancestral export information near the beginning of the MSP7-like domain. If this were the case, then the MAD domain (3cd region) should only be sufficient to mediate export when positioned immediately after the signal peptide as in the experiment in Fig 3C (SP-3cd-GFP). It would be interesting to determine if an SP-GFP-3cd construct is exported.

      (3) Disruption of PeMP2, PIESP2 or Pf332 is found to prevent MC anchoring. This is the most exciting part of the study as it provides the first set of mutants that interfere with anchoring, enabling the surprising observation that MC immobilization is not important for PfEMP1 surface display or cytoadhesion. The MC movement assay is a nice way to visualize anchoring and would be strengthened by a quantitative measure of colocalization between the time-lapse images (i.e., a Pearson correlation coefficient) to enable a statistical test. The use of SLI to specifically activate a var gene of choice is an exciting new approach that will be of great use to the PfEMP1 field, together with the semi-automated binding assay that helps to increase throughput and reduce bias.
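      A minimal sketch of the quantitative readout suggested in point (3), assuming registered, background-subtracted time-lapse frames of a Maurer's cleft marker stored as 2D arrays: the pixelwise Pearson correlation of each frame with the first stays near 1 when clefts are anchored and drops when they move. The function names and synthetic frames are hypothetical; real movies would also require correction for whole-cell movement and photobleaching.

```python
import numpy as np

def frame_pearson(frame_a, frame_b):
    """Pixelwise Pearson correlation between two equally sized 2D frames."""
    a = np.asarray(frame_a, dtype=float).ravel()
    b = np.asarray(frame_b, dtype=float).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def anchoring_scores(stack):
    """Correlation of every later frame with the first; stays near 1 if puncta do not move."""
    return [frame_pearson(stack[0], frame) for frame in stack[1:]]

def make_frame(spots, size=20):
    """Hypothetical synthetic frame with single bright pixels at the given (row, col) spots."""
    frame = np.zeros((size, size))
    for r, c in spots:
        frame[r, c] = 1.0
    return frame

spots = [(3, 4), (10, 12), (15, 6)]
anchored = [make_frame(spots) for _ in range(4)]                              # fixed clefts
mobile = [make_frame([(r + t, c + t) for r, c in spots]) for t in range(4)]  # drifting clefts

print("anchored:", np.round(anchoring_scores(anchored), 2))
print("mobile:  ", np.round(anchoring_scores(mobile), 2))
```

      Per-cell scores from parental and mutant lines could then be compared with a standard two-sample test.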

      Weaknesses:

      (1) At least two of the MSRP6 complex members were found to depend on other complex members for MC trafficking: PeMP3 depends on MSRP6 and Pf332 depends on PIESP2 (previously shown by Zhang et al 2018 and confirmed in the present study). While the authors disrupted all seven MSRP6 complex members, the impact on the trafficking of the other complex members was not systematically investigated. It would be particularly interesting to know which (if any) complex members are required for MC recruitment of PeMP2 since this protein is also needed for MC anchoring.

      (2) Some images of exported puncta are interpreted as localization to the MCs without a co-marker. Since other compartments have been identified in the RBC cytosol in addition to MCs (i.e., J dots), an MC co-marker would help to verify these actually correspond to MCs. For example, in Figure 5B, GEXP18 gives an exported punctate appearance, but the lack of co-localization with SBP1 in Fig S2B shows that this does not correspond to MCs.

      (3) The authors show MAHRP2 localization is not impacted in their PIESP2 and Pf332 mutants and this is interpreted to indicate the tether structures are not disrupted. However, this conclusion requires actual analysis of the tether structures by electron microscopy since MAHRP2 association to MCs may not require tether integrity and could persist even if the tethers are altered or disrupted. Otherwise, this statement should be adjusted. Additionally, since T2A skipping efficiency can vary between constructs, it would be a good idea to perform a western blot to ensure that the SBP1-GFP and MAHRP2-mScarlet signals in Figure 8D,F reflect separated proteins.

      (4) The trypsin assays to monitor PfEMP1 surface display would benefit from a more detailed explanation of how the results were interpreted. For instance, though perhaps less intense than in the PIESP2, Pf332, and MSRP6 mutants, a Var01-protected fragment is also seen in the SBP1 mutant. Additionally, a protected fragment is indicated for most of the SBP1N controls (asterisk). As per the authors' experimental design (lines 956-957), does this indicate that the RBC membrane was partially compromised during the experiment? In line 505, the trypsin assay data in the mutants are interpreted relative to the parent IT4var01-HA line, but no data are shown for the parent.

    4. Reviewer #3 (Public review):

      Summary:

      Malaria is caused by Plasmodium falciparum parasites that infect, grow, and reproduce inside red blood cells. The parasites extensively modify the blood cells they infect, by exporting hundreds of proteins into the red blood cell compartment. One of the most important modifications made by the parasite is to display adhesive proteins on the blood cell surface which attach the infected cells to walls of small blood vessels. This can lead to organ damage resulting in serious disease complications and there is great interest in blocking the adhesive process to reduce disease. This study investigates the function of an atypical, exported protein that along with other proteins maintains the integrity of membranous sacs formed by the parasite in the blood cell compartment. These sacs are widely believed to help organise the display of the adhesive proteins on the infected blood cell surface. This study challenges this dogma by showing that disruption of the sacs does not prevent the display of the adhesive proteins suggesting alternative pathways are likely involved in adhesive protein display.

      Strengths:

      The conclusions are supported by a beautiful series of live parasite images.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

    1. eLife Assessment

      Morphological characteristics and phenotypes of mutations in key developmental genes suggest that head, trunk, and tail development are regulated by discernible modules. Gdf11 signalling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos. This important study presents convincing evidence that Tgfbr1 acts upstream of Isl1 (a pivotal effector of Gdf11 signalling) and regulates blood vessels, the lateral plate mesoderm, and the endoderm associated with the trunk-to-tail transition. Together with the previous studies, this work identifies a key signal that acts as the pivot of the trunk-to-tail transition.

    2. Joint Public Review:

      Previously, this group showed that Tgfbr1 regulates the reorganization of the epiblast and primitive streak into the chordo-neural hinge and tailbud during the trunk-to-tail transition. Gdf11 signaling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos, including the reallocation of axial progenitors into the tailbud, and Tgfbr1 plays a key role in mediating its signaling activity. Progenitors that contribute to the extension of the neural tube and paraxial mesoderm into the tail are located in this region. In this work, the authors show that Tgfbr1 also regulates the reorganization of the posterior primitive streak/base of the allantois and of the endoderm.

      By analyzing the morphological phenotypes and marker gene expression in Tgfbr1 mutant mouse embryos, they show that it regulates the merger of somatic and splanchnic layers of the lateral plate mesoderm, the posterior streak derivative. They also present evidence suggesting that Tgfbr1 acts upstream of Isl1 (a key effector of Gdf11 signaling controlling the differentiation of lateral mesoderm progenitors) and regulates the remodelling of the major blood vessels, the lateral plate mesoderm and endoderm associated with the trunk-to-tail transition. Through a detailed phenotypic analysis, the authors observed that, similarly to Isl1 mutants, the lack of Tgfbr1 in mouse embryos hinders the activation of hindlimb and external genitalia marker genes and results in a failure of lateral plate mesoderm layers to converge during tail development. As a result, they interpret that the ventral lateral mesoderm, which generates the pericloacal mesenchyme and genital tuberculum, fails to specify.

      They also show defects in the morphogenesis of the dorsal aorta at the trunk/tail juncture, resulting in an aberrant embryonic/extraembryonic vascular connection. Endoderm reorganization defects following abnormal morphogenesis of the gut tube in the Tgfbr1 mutants cause failure of tailgut formation and cloacal enlargement. Thus, Tgfbr1 activity regulates the morphogenesis of the trunk/tail junction and the morphogenetic switch in all germ layers required for continuing post-anal tail development. Taken together with the previous studies, this work places Gdf11/8 - Tgfbr1 signaling at the pivot of trunk-to-tail transition and the authors speculate that critical signaling through Tgfbr1 occurs in the posterior-most part of the caudal epiblast, close to the allantois.

      The data shown is solid with excellent embryology/developmental biology. This work demonstrates meticulous execution and is presented in a comprehensive and coherent manner. Although not completely novel, the results/conclusions add to the known function of Gdf11 signaling during the trunk-to-tail transition.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Previously, this group showed that Tgfbr1 regulates the reorganization of the epiblast and primitive streak into the chordo-neural hinge and tailbud during the trunk-to-tail transition. Gdf11 signaling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos, including the reallocation of axial progenitors into the tailbud, and Tgfbr1 plays a key role in mediating its signaling activity. Progenitors that contribute to the extension of the neural tube and paraxial mesoderm into the tail are located in this region. In this work, the authors show that Tgfbr1 also regulates the reorganization of the posterior primitive streak/base of the allantois and of the endoderm.

      By analyzing the morphological phenotypes and marker gene expression in Tgfbr1 mutant mouse embryos, they show that it regulates the merger of somatic and splanchnic layers of the lateral plate mesoderm, the posterior streak derivative. They also present evidence suggesting that Tgfbr1 acts upstream of Isl1 (a key effector of Gdf11 signaling controlling the differentiation of lateral mesoderm progenitors) and regulates the remodelling of the major blood vessels, the lateral plate mesoderm and endoderm associated with the trunk-to-tail transition. Through a detailed phenotypic analysis, the authors observed that, similarly to Isl1 mutants, the lack of Tgfbr1 in mouse embryos hinders the activation of hindlimb and external genitalia marker genes and results in a failure of lateral plate mesoderm layers to converge during tail development. As a result, they interpret that the ventral lateral mesoderm, which generates the pericloacal mesenchyme and genital tuberculum, fails to specify.

      They also show defects in the morphogenesis of the dorsal aorta at the trunk/tail juncture, resulting in an aberrant embryonic/extraembryonic vascular connection. Endoderm reorganization defects following abnormal morphogenesis of the gut tube in the Tgfbr1 mutants cause failure of tailgut formation and cloacal enlargement. Thus, Tgfbr1 activity regulates the morphogenesis of the trunk/tail junction and the morphogenetic switch in all germ layers required for continuing post-anal tail development. Taken together with the previous studies, this work places Gdf11/8 - Tgfbr1 signaling at the pivot of trunk-to-tail transition and the authors speculate that critical signaling through Tgfbr1 occurs in the posterior-most part of the caudal epiblast, close to the allantois. 

      Strengths: 

      The data shown is solid with excellent embryology/developmental biology. This work demonstrates meticulous execution and is presented in a comprehensive and coherent manner. Although not completely novel, the results/conclusions add to the known function of Gdf11 signaling during the trunk-to-tail transition. 

      Weaknesses: 

      The authors rely on the expression of a small number of key regulatory genes to interpret the developmental defects. The alternative possibilities remain to be ruled out thoroughly. The manuscript is also quite descriptive and would benefit from more focused highlighting of the novelty regarding the absence of Tgfbr1 in the mouse embryo. They should also strengthen some of their conclusions with more details in the results.

      Although we used a limited number of key regulatory genes to interpret the phenotype, these genes were carefully chosen to focus on specific processes involving the lateral mesoderm, its derivatives, and the endoderm. In addition to these markers, we included references to other relevant markers that were previously analyzed and initially led us to examine the lateral plate mesoderm and tail gut in Tgfbr1 mutants. To strengthen our analysis, we have now incorporated additional data to clarify specific phenotypes. For instance, in situ hybridization (ISH) for Shh further confirms abnormalities at the caudal end of the endoderm in mutant embryos, while no endodermal defects are observed in the trunk region. We also included an analysis of the intermediate mesoderm, which shows abnormalities at the same level as those found in the lateral plate mesoderm and endoderm of Tgfbr1 mutants.

      It’s important to note that using additional markers to assess the epiblast/primitive streak of Tgfbr1 mutants at E7.5–E8.5, as suggested by a reviewer, is unlikely to yield new insights. At these early stages, Tgfbr1 mutant embryos do not display observable phenotypes in the main body axis. Data in this manuscript already demonstrate the absence of abnormalities at this stage, as shown in Figure 3 and Supplementary Figure 6. Additionally, genes whose expression becomes abnormal when the embryo enters tail development remain normally expressed in the trunk, indicating that trunk extension is not significantly impacted by Tgfbr1 deficiency. While transcriptomic analysis of these Tgfbr1 mutants could provide interesting insights, it would be more appropriate to focus on later developmental stages, which is beyond the scope of the current study.

      The second major critique was that the manuscript is primarily descriptive. We disagree with this assessment. Several hypotheses were rigorously tested using genetic approaches, including Isl1 knockout experiments, cell tracing from the primitive streak with a newly generated Cre driver to activate a reporter from the ROSA26 locus, and assessment of extraembryonic endoderm fate in Tgfbr1 mutants by introducing the Afp-GFP transgene into the Tgfbr1 mutant background. Additionally, we conducted tracing analyses of tail bud cell contributions to the tail gut via DiI injection and embryo incubation. To address potential concerns regarding this experiment, we have included data showing the DiI position immediately after injection to confirm that it does not contact the tail gut. We also considered and accounted for potential DiI leakage into neuromesodermal progenitors to clarify the endodermal results.

      Our genetic and DiI experiments were specifically designed to differentiate between alternative hypotheses and to confirm hypotheses generated from other analyses. Additionally, improvements in some of the imaging data have helped address remaining concerns.

      Reviewer #1 (Recommendations For The Authors): 

      I have listed my suggestions as queries. The authors may perform experiments or clarify by editing the text to address them. 

      The authors state on Page 11 and elsewhere that the ventral lateral mesoderm is absent in the Tgfbr1 mutant. What is the basis for this conclusion? Are there specific markers for PCM or GT primordium? 

      The specific marker of PCM and GT primordium is Isl1. The absence of this marker in the Tgfbr1 mutants is shown in Dias et al. (2020). This reference is now cited in the manuscript.

      A schematic illustrating the VLM and the expression patterns of Tgfbr1, Gdf11, etc., would be helpful. 

      The characterization of Gdf11 expression has been previously reported (e.g. McPherron et al. 1999, cited in our manuscript). It is expressed in the region containing the axial progenitors before the trunk-to-tail transition and is not expressed in the VLM. Tgfbr1 expression, in contrast, is hard to detect, likely because it is ubiquitously expressed at a low level. We include in this document some pictures of an ISH, including a control using Tgfbr1 mutants, to illustrate that the staining resembling background actually represents Tgfbr1 expression. If the reviewers find it important, we can also incorporate these data into the manuscript. Under these circumstances, we feel that a schematic might not be very informative.

      Author response image 1.

      Image showing an example of an ISH procedure with a probe against Tgfbr1, showing widespread and low expression. The lower picture shows a ventral view of a stained wild type E10.5 embryo.

      Do Foxf1+ cells in the 'extended LPM' of Tgfbr1 mutants suggest a fate transformation, or do they indicate the misexpression of a marker gene otherwise suppressed by Tgfbr1 activity? The authors suggest that Foxf1+ cells are VLM progenitors from the posterior PS trapped in the extended LPM. Do they continue to express PS markers? 

      The observation that, in both wild type and Tgfbr1 mutant embryos, Foxf1 expression in the trunk is restricted to the splanchnic LPM indicates that the absence of this marker from the somatic LPM is not the result of suppression by Tgfbr1. In wild type embryos, Foxf1 is also expressed in the posterior PS, where it is regulated independently of its expression in the LPM (i.e. Shh-independent), and later in the pericloacal mesoderm (our Supplementary Figure 2). As Foxf1 expression in the posterior PS was not suppressed in the Tgfbr1 mutants, and given the absence of pericloacal mesoderm, we interpret the Foxf1-positive cells in the two layers around the extended celomic cavity at the posterior end of the mutant embryos as derived from the posterior PS, reflecting their failure to progress normally through the embryonic tissues.

      We did not detect expression of PS markers associated with paraxial mesoderm formation, such as Tbxt, further suggesting that these cells could derive from the restricted set of posterior PS cells that contribute to the pericloacal mesoderm.

      For example, the misexpression of Apela is interpreted as mis-localized endoderm cells. They show scattered Keratin 8 misexpression to support the interpretation. It would be more convincing if the authors tested the expression of other endoderm markers. 

      As indicated in the manuscript (p. 13), we suggest that these cells are endoderm progenitors, like those present at the posterior end of the gut tube at E9.5 and E10.5, that are unable to incorporate into the gut tube. Apela is not a general endodermal marker: it is expressed in the foregut pocket and in the nascent cells of the hindgut/tail gut, and is downregulated as cells acquire typical endodermal signatures. Ectopic Apela expression in the extended LPM of the mutant embryos might therefore indicate progenitors that failed to downregulate Apela because they did not undergo differentiation. This would also imply the absence of definitive endodermal markers in these cells.

      The Nodal signaling pathway in the anterior PS drives endoderm development. It acts through Alk7. Does Tgfbr1 (Alk5) mutation impact endoderm development, in general? It isn't easy to assess this from the Foxa2 in situ RNA hybridization shown in Figures 6A and B. It would be helpful for the readers if the authors clarified this point. 

      Figure 7D-D’ already shows that the endoderm is mostly preserved up to the region of the trunk-to-tail transition. A largely normal endoderm in the embryonic trunk can also be seen with Shh, in a figure added as Supplementary Fig. 5.

      Reviewer #2 (Recommendations For The Authors): 

      The authors mention two interesting novel points which they should develop in the discussion, and probably also in the results. 

      (1) The authors speculate about the possible involvement of the posterior PS as a mediator of Gdf11/Tgfbr1 signaling activity. However, as mentioned in the manuscript, their experiments do not allow regional sublocalization within the PS... Here it would be important to assess/discuss in more detail which progenitors respond to this signaling activity and when they do it. At the very least, the authors should provide high-resolution spatiotemporal data of the expression of Tgfbr1 in the PS. 

      Tgfbr1 expression at this embryonic stage does not show clear differential patterns. The data reported for this expression in Andersson et al. 2006 are of very low quality, and we have not been able to reproduce the reported pattern. On the contrary, all our efforts over the years produced a very general staining that could even be interpreted as background. When we included Tgfbr1 mutants as controls, it became clear that the ubiquitous, low-level signal observed in wild type embryos indeed represents the Tgfbr1 expression pattern: low level and ubiquitous. We are attaching a figure to this document illustrating these observations. If required, this can also be included in the manuscript as a supplementary figure. 

      Also, the work of Wymeersch et al., 2019 regarding the lateral plate mesoderm progenitors (LPMPs) should be referred to and discussed here. 

      This was now added in the results (page 11) and in discussion (page 16). 

      For instance, are the LPMP transcriptomic differences detected between E7.5 and E8.5 caused by Tgfbr1 signaling activity? This question could be easily answered through a comparative bulk RNAseq analysis of the posterior-most region of the PS of mutant and WT embryos. The possible colocalization of Tgfb1 (Wymeersch et al., 2019) and Tgfbr1 in the LPMPs should also be addressed. 

      We agree that RNA-seq of the posterior PS of WT and mutant embryos might be informative. However, it is very likely that within the proposed timeframe (E7.5 to E8.5) there are no significant differences between wild type and Tgfbr1 mutant embryos, because Tgfbr1 mutant embryos show no apparent axial phenotype before the trunk-to-tail transition. Therefore, we think that this experiment is beyond the scope of the present manuscript. 

      (2) The activity of Tgfbr1 during the trunk-to-tail transition is critical for the development of tail endodermal tissues. Here the authors suggest again the involvement of the posterior PS/allantois region, but a similar phenotype can also be observed for instance in the absence of Snai1 in the caudal epiblast (Dias et al., 2020)... It would be important to assess/discuss the origin of those morphogenetic problems in the gut. Is it due to the reallocation of NMC cells into the CNH? The tailbud-EMT process? LPMPs specification?... Regional mutations or gain of functions of Snai1 or Tgfbr1 in the caudal epiblast would help answer the question.  

      The endodermal phenotype of the Snai1 mutants differs from that observed in the Tgfbr1 mutants. As can be observed in Figures 3, 4 and 5 of Dias et al., the tail bud is replaced by a structure that extends the epiblast. As a consequence, the endoderm ends at the base of that structure, even expanding to form a structure resembling the cloaca, which is different from what is seen in the Tgfbr1 mutants. In this case, the lack of tail gut is likely to result either from the failure to form the progenitors of the gut endoderm or from the dissociation of what would be the tail bud from the LPM. Indeed, hindlimb/pericloacal mesoderm markers, like Tbx4, are preserved in the Snai1 mutant. As for the Snai1 gain-of-function experiment, also reported in Dias et al. 2020, the fate of these cells is not clear. The ISH for Foxa2 showed extra signals, but as Foxa2 is not an exclusive endodermal marker, it is not possible to know whether any of these signals correspond to endodermal tissues.

      Regarding the development of tail endodermal tissues, the authors suggest that it occurs from a structure derived from the PS that is located posteriorly, in the tailbud, beyond the tip of the growing gut. This is an important and novel point, as it suggests that the primordium of the endoderm is not wholly specified during gastrulation, so the observation should be well supported. How can Anastasiia et al. distinguish such a "structure" from the actual developing gut? Does it have a distinct molecular signature or any morphological landmark that enables its separation from the actual gut? The data suggest that the region highlighted in Supplementary Figure 4Ab contains part of the actual gut tube (the same is suggested in Figure 5B). If the authors think otherwise, they must characterize that region of the tailbud by a thorough morphological and gene/protein expression analysis and assess its potency via transplantation experiments. Also, the authors' claim mostly relies on the DiI experiments, which have three problems: (1) Anastasiia et al. assess "tail" endodermal growth at E9.5, whereas the correct stage to do so is after E10.5 (after tailbud formation). (2) There are incongruencies, a low number of embryos (only three), and diversity in the results shown in Figure 8 and Supplementary Figure 4. For instance, despite similar staining at 0h, the extent and amount of DiI present in the gut tube after 20h vary significantly among the differently labeled embryos. A possible explanation lies in abnormal leakiness of the DiI labelings, which is confirmed by the observations shown in Supplementary Figure 4M-O; the same applies to Supplementary Figure 4G, which shows a substantial amount of DiI in the neural tube. (3) The authors must provide high-quality data showing which tissues/regions were labelled at 0h, including transverse and sagittal sections as they did for the 20h time point. 

      Additionally, it is important to re-orient the sagittal optical sections to a position that also shows the neural tube (such as a mid-sagittal section) and to include information on the AP/DV axis, as well as the location of the transverse optical sections in the sagittal image. 

      As described in the reply to Reviewer 1, Apela is expressed in the nascent tail gut endoderm but not in more anterior areas, except for the foregut pocket, and becomes downregulated as the tube acquires endodermal signatures. Therefore, the structure to which the reviewer refers might indeed represent a group of progenitors that extend the tail gut. The observation that this property is found only in the growing tail gut already separates this region of the gut, which ultimately does not contribute to mature organs, from more anterior areas of the endoderm (essentially anterior to the cloaca) that will become relevant tissues of the intestinal organs. Our DiI labelling experiment was designed to test whether this pool of cells contributes to the gut, but it does not allow us to determine the nature of those cells, a question that will require further research (discussed on p. 17) and that we consider beyond the scope of the present manuscript.

      Regarding labelling at E9.5 rather than E10.5, we agree that at E9.5 the tail bud is not completely formed in terms of NMCs; for example, the neuropore is not yet closed. However, we are more interested in the regression of the epiblast, which is complete by E9.5. Injecting at E9.5 also has technical advantages for us: first, in our hands earlier embryos grow better in culture, and second, it is easier to inject into the tail bud at E9.5 because it is slightly bigger than at E10.5. Therefore, injecting at E9.5 is less prone to technical artifacts due to injection inaccuracy and compromised growth in culture.

      We agree that the injected DiI could also leak into NMPs, which might be located in the same area. However, while this could result in labeling of the neural tube, it would not affect the interpretation of labeled cells in the tail gut: the presence of the label in the gut epithelium indicates that the injected region contains tail gut progenitors. We have added some considerations on this possible leakage to the Results section of the manuscript (p. 15). We thank the reviewer for drawing our attention to this issue. 

      We also now provide high quality data showing labelled tissue at 0h in Supplementary figure 8A-c’, higher magnification images in Fig. 8, and reoriented optical sections in Fig.6 and in Supplementary Fig. 7, including axis and location of the sections as suggested by the reviewer.

      Minor concerns/comments: 

      (1) The abstract is quite long, though this might be fine for this journal. 

      (2) In relation to the comment on the abstract, the manuscript needs an initial Figure describing the events that are described in the introduction. Otherwise, the manuscript will only be accessible to mouse embryologists.

      We have a figure summarizing the results at the end of the manuscript; we think that including a similar figure at the beginning might be redundant. If required, we could include this type of schematic as a graphical abstract.

      (3) The authors need to clarify what they mean when they use the following expressions "PS fate" and "fate of the posterior PS".

      I do not think that we have used such expressions. Indeed, they did not come up when we ran a “find” search in the Word document. However, they would refer to the tissues that derive from them at later developmental stages.

      (4) The assessment of Isl1 expression in Tgfbr1 mutant and transgenic mouse embryos would be better indicative of their molecular relationship than a comparative phenotypic analysis. 

      These data have been reported in Dias et al 2020 and Jurberg et al 2013, both cited in the manuscript.  

      (5) The authors should explain or discuss what the upregulation of Foxa2 in the posterior end of Tgfbr1 mutants means.

      While an upregulation is apparent in the figure, looking at other pictures we cannot be sure that it represents a significant, quantifiable upregulation. We have therefore removed the statement from the text.

      (6) What happens to the intermediate mesoderm during the trunk-to-tail transition? Is Tgfbr1 involved in the regulation of its development?

      We have tested this using Pax2, added the relevant data to Supplementary Fig. 1, and described them in the Results.

      (7) The term "potential" should not be used during the description of DiI labeling experiments as this technique only assesses cell fate.

      Corrected

      (8) Some figures lack AP/DV axis information (e.g. Figure 6C and D).

      Corrected

    1. eLife Assessment

      This study provides a methodological report on a modified adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), and its application to G protein-coupled receptors (GPCRs), which are the most abundant membrane proteins and key targets for drugs. The mwSuMD approach assists in sampling complex binding processes, leading to useful findings for GPCR activity, although results may be considered incomplete, given the high RMSD values reported and lack of validation using experimental data. The manuscript also needs corrected descriptions of high-resolution PDB structures and better relation to existing computational literature.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

    3. Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original, thanks in part to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, atomic-level knowledge of binding sites/interfaces and conformational states is needed to define the supervised metrics; the higher the resolution of such metrics, the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

    4. Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through, for instance, an in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of the previously reported structures.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

      Weaknesses:

      Cholesterol may play a fundamental role in GPCR dimerization (as cited by the authors, Prasanna et al, "Cholesterol-Dependent Conformational Plasticity in GPCR Dimers"). Yet they do not use cholesterol in their simulations of the dimerization.

      We thank Reviewer #1 for the positive comment on mwSuMD.

      In the revised version of the manuscript, the section about the A<sub>2A</sub>/D2 receptor dimerization has been removed because it was largely speculative. We agree that the lack of cholesterol in those simulations added uncertainty to the presented results.

      Reviewer #2 (Public Review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      (1) Binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      (2) Molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      (3) Molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      (4) The whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron;

      (5) The heterodimerization of D2 dopamine and A2A adenosine receptors (D2R and A2AR, respectively) and binding to a bi-valent ligand.

      The mwSuMD method is solid and valuable, has wide applicability, and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, atomic-level knowledge of binding sites/interfaces and conformational states is needed to define the supervised metrics; the higher the resolution of such metrics, the more accurate the outcome is expected to be. The definition of the metrics is a user- and system-dependent process.

      The too many and ambitious case-studies undermine the accuracy of the output and reduce the important details needed for a methodological report. In some cases, the available CryoEM structures could have been exploited better.

      The most consistent example concerns AVP binding/unbinding to V2R. The consistency with CryoEM data decreases with an increase in the complexity of the simulated process and involved molecular systems (e.g. receptor recognition by membrane-anchored G protein and the process of nucleotide exchange starting from agonist recognition by an inactive-state receptor). The last example, GPCR hetero-dimerization, and binding to a bi-valent ligand, is the most speculative one as it does not rely on high-resolution structural data for metrics supervision.

      We thank Reviewer #2 for the detailed comments on the manuscript. In this revised version, the hetero-dimerization between A<sub>2A</sub>R and D<sub>2</sub>R has been removed. Also, results on GPCR case studies other than GLP-1R have been reduced and downgraded in importance to focus on the key points of the adaptive sampling method. We agree that the consistency with cryoEM data tends to decrease with increasing complexity of the simulated process and of the molecular systems involved. While it is possible to approximate cryoEM results, our unbiased adaptive sampling technique finds its most interesting application in mechanistically unknown out-of-equilibrium processes rather than in perfectly reproducing known experimental data. The simulated case studies we present showcase the versatility, speed and consistency of our adaptive method in exploring energetically unbiased transitions.

      Reviewer #3 (Public Review):

      Summary:

      In the present work, Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between loops of GPCR and G proteins, which are not resolved experimentally, or the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      In its current form, the manuscript seems immature and in particular, the described results grasp only the surface of the complex molecular mechanisms underlying GPCR activation. No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are a reproduction of the previously reported structures.

      We thank Reviewer #3 for the positive comment on the work. The revised manuscript focuses more on the GLP-1R and Gs case studies. We believe it addresses the weaknesses raised by showing the behaviour of key structural motifs and providing new hypotheses about GDP release.  

      Reviewer #2 (Recommendations For The Authors):

      In this methodological report, Deganutti and co-workers propose an improved version of supervised molecular dynamics (SuMD), named multiple walker SuMD (mwSuMD). This adaptive sampling method was challenged with simulations of complex transitions involving GPCRs, which are out of reach of classical MD.

      Although less energy-biased than other enhanced sampling methods, mwSuMD requires knowledge of the atomic detail of the ligand-protein or protein-protein binding site/interfaces and the structural hallmarks of the states whose conversion the method is going to address. Such knowledge is, indeed, necessary to define the supervised metrics (e.g. distances, RMSD, etc), which is a user- and system-dependent process.

      We classify mwSuMD as an adaptive, rather than enhanced, sampling method as it does not use any energy bias. We agree with the Reviewer that some knowledge of the system is required to set up the simulations productively, but this is the case for almost any advanced MD method.  

      The text requires improvement in the essential methodological details and removal of those parts that are not instrumental to method validation.

      While attempting to prove the widest possible applicability of the method, the authors exaggerated the number of examples, which, in spite of their increasing complexity, were only summarily described. Please limit the case studies to AVP binding/unbinding to V2R and the whole process of GDP release from membrane-anchored Gs following activation of GLP1R by danuglipron. The latter case, indeed, involves small-ligand binding (danuglipron), small-ligand dissociation (GDP), receptor activation, binding of the activated receptor to the membrane-anchored G protein, and the G protein conformational transition instrumental to nucleotide depletion, which is already a lot. In this framework, the cases of Gs-β2AR and Gi-A1R recognition are redundant. Most importantly, the case of D2R-A2AR heterodimerization and binding to a bi-valent ligand must be eliminated. The reason is that the case is not entirely based on mwSuMD and the biased protein-protein interface does not rely on high-resolution data (i.e. no structural model of the D2R-A2AR dimer has been determined so far). Last but not least, the high intrinsic flexibility of the bi-valent ligand adds further indeterminacy to the computational experiment. Being too speculative, the case study does not serve model validation.

      We thank the Reviewer for the suggestion. In the current revised form, the manuscript focuses on AVP binding/unbinding to V2R and the GLP-1R activation, Gs recognition and GDP release.

      While eliminating the three case studies mentioned above, the remaining ones should be described more extensively and clearly, highlighting the most productive setup for each system. Incidentally, listing the performance parameters (e.g. distribution mode and minimum RMSD) of each simulation setting in Table S1 is worth doing.

      More accuracy in the methodological description is needed.

      As for the supervised metrics, the rationale behind the choice of a particular index and whether it is the outcome of a number of trials must be declared and the selected indices must be better defined. Here there are a few examples.

      AVP-V2R case. It is not clear why the AVP centroids were computed on residues C1-Q4 (I suppose the Cα-atoms) and not on the Cα-atoms of the whole cyclic part (C1-C6). Along the same line, the choice of the Cα-atoms of four amino acid residues to compute the receptor binding-site centroids requires justification.

      We have amended the text to clarify that all the heavy atoms of AVP residues C1-Q4, which are anticipated to bind deep into V<sub>2</sub>R, were considered alongside the Cα atoms of the V<sub>2</sub>R residues forming the peptide binding site. In our experience, whether side chains are included in the definition of centroids usually does not affect the supervision output. It should only affect the output of RMSD-based mwSuMD simulations, because the RMSD is computed against the specific atom positions of the reference. However, benchmarking the differences produced by divergent selections is beyond the scope of the present work.

      GLP1R case. The statement: "Since the opening of TM1-ECL1 was observed in two replicas out of four, we placed the ligand in a favorable position for crossing that region of GLP-1R" is rather weak as a strategy to manually (?) define the input position of the ligand.

      As stated in the manuscript, placing the agonist in that position was driven by 8 μs of preliminary classical MD simulations that pointed to a possible binding path. We agree with the Reviewer that some arbitrariness remains, and for this reason we have not presented structural details of the PF06882961 binding path.

      As for the supervised metrics, what is meant by "the distance between the ligand and GLP-1R TM7 residues L379<sup>7.34</sup>-F381<sup>7.36</sup>"? Was the distance computed between the ligand and L379-F381 centroids? Also: "In the supervised stages, the distance between residues M386-L394 Gαs of helix 5 (α5) and the GLP-1R intracellular residues R176<sup>2.46</sup>, R348<sup>6.37</sup>, S352<sup>6.41</sup>, and N405<sup>7.60</sup> was monitored": was it an inter-centroid distance? Furthermore, "supervising the distance between AHD residues G70-R199 Gαs and K300-L394 Gαs": was it the distance between the centroid of the AHD and the centroid of the C-terminal half of the Ras-like domain? In general, when more than two atoms are involved in a distance calculation, please specify whether the distance is inter-centroid.

      Also: "During the third phase, the RMSD of PF06882961, as well as the RMSD of ECL3 (residues A368<sup>6.57</sup>-T378<sup>7.33</sup>, Cα atoms), were supervised": was the RMSD computed without superimposing the ligand, to estimate its roto-translations?

      We have added details about the selections used for computing centroids throughout the Methods section. For example, all the heavy atoms of PF06882961 and the Cα atoms of L379-F381 were considered. RMSD values during GLP-1R activation were computed after superimposition on the Cα atoms of TM2, ECL1, and TM3 (residues 170-240). This has now been specified in the text.
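      The questions above turn on two different kinds of supervised metric: an inter-centroid distance between two atom selections, and a ligand RMSD computed after superimposing each frame on a separate receptor selection, so that the ligand's own roto-translation is not removed. The following minimal NumPy sketch of both is offered for illustration only and is not the mwSuMD implementation. It assumes unweighted (not mass-weighted) centroids and a Kabsch superposition; the toy coordinates and the selection names `fit_idx` and `lig_idx` are hypothetical stand-ins for, e.g., the Cα atoms of GLP-1R residues 170-240 and the heavy atoms of PF06882961.

```python
import numpy as np


def centroid(coords: np.ndarray) -> np.ndarray:
    """Unweighted geometric centroid of an (N, 3) coordinate array (assumption: no mass weighting)."""
    return coords.mean(axis=0)


def intercentroid_distance(sel_a: np.ndarray, sel_b: np.ndarray) -> float:
    """Euclidean distance between the centroids of two atom selections."""
    return float(np.linalg.norm(centroid(sel_a) - centroid(sel_b)))


def kabsch_fit(mobile: np.ndarray, reference: np.ndarray):
    """Return (rotation, translation) that best superimpose `mobile` on `reference`.

    Standard Kabsch algorithm; coordinates are row vectors, so a point x
    is mapped to x @ rotation.T + translation.
    """
    mob_c, ref_c = centroid(mobile), centroid(reference)
    h = (mobile - mob_c).T @ (reference - ref_c)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = ref_c - mob_c @ rotation.T
    return rotation, translation


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two matched (N, 3) arrays, with no fitting."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def ligand_rmsd_after_fit(frame, reference, fit_idx, lig_idx) -> float:
    """Superimpose the whole frame on the reference using only the `fit_idx` atoms,
    then measure the ligand RMSD without refitting the ligand, so its
    roto-translation relative to the fitted selection is retained."""
    rotation, translation = kabsch_fit(frame[fit_idx], reference[fit_idx])
    aligned = frame @ rotation.T + translation
    return rmsd(aligned[lig_idx], reference[lig_idx])


# Hypothetical toy system: 20 "receptor" atoms used for fitting, 10 "ligand" atoms.
rng = np.random.default_rng(0)
reference = rng.normal(scale=5.0, size=(30, 3))
fit_idx = np.arange(0, 20)
lig_idx = np.arange(20, 30)

# A frame that is a rigid-body copy of the reference (rotated 40 degrees about z, then translated).
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
frame = reference @ rot_z.T + np.array([3.0, -1.0, 2.0])

# The same frame, but with only the ligand displaced by 2 units along x.
moved_frame = frame.copy()
moved_frame[lig_idx] += np.array([2.0, 0.0, 0.0])

rigid = ligand_rmsd_after_fit(frame, reference, fit_idx, lig_idx)
shifted = ligand_rmsd_after_fit(moved_frame, reference, fit_idx, lig_idx)
dist = intercentroid_distance(reference[lig_idx], reference[fit_idx])
print(f"rigid copy: {rigid:.3f}, ligand shifted: {shifted:.3f}, centroid distance: {dist:.3f}")
```

      For the rigid-body copy, the ligand RMSD after fitting is essentially zero; displacing only the ligand by 2 units gives a ligand RMSD of 2, because the fit on the receptor selection does not absorb the ligand's movement. Fitting on the ligand atoms themselves would instead remove that displacement, which is the distinction behind the question of whether the ligand was superimposed.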

      The authors considered the 7LCJ GLP1R-danuglipron complex as a fully active reference state instead of considering the receptor from a ternary complex with Gs. The ternary complex (7LCI) was indeed considered as a reference only in simulations of receptor-G protein recognition. 

      7LCJ and 7LCI are both fully active states. The main difference is that Gs coordinates were not deposited for 7LCJ. Indeed, the RMSDs between the two structures computed on the TMD Cα atoms and on PF06882961 are 0.63 Å and 0.54 Å, respectively.

      Most importantly, the ternary complex chosen by the authors is not adequate as a reference for simulating the "opening" of the AHD because it bears a miniGs, hence, missing the AHD. In that framework, such an opening is rather vague and was not properly supervised by mwSuMD. The authors must repeat metrics supervisions by using, as a reference, the 6X1A ternary complex, which bears a displaced AHD. This would likely lead to a different path of GDP release.

      To the best of our knowledge, there is no evidence that a specific open conformation of the AHD is linked to GDP release. In support of this, we note that the AHD is usually not modelled in GPCR ternary complexes because of its high flexibility. The only evidence we are aware of is that the AHD must open to allow GDP release. For this reason, we decided to supervise the distance between the AHD and the Ras domain without using a reference.

      In the statement: "The AHD opening was simulated starting from the best GLP-1R:Gs binding mwSuMD replica" the definition "best binding" requires clarification.

      This has been amended, specifying that Replica 2 was considered the “best replica” because it showed the smallest deviation from the cryo-EM structure.

      As for the case study on β2-AR-Gs recognition, I strongly suggest eliminating it. However, I'd like to make some comments. The sentence: "the adrenergic β2 receptor (β<sub>2</sub>-AR) in an intermediate active state was downloaded from GPCRdb (https://gpcrdb.org/)" is vague as it does not indicate what intermediate active state structure was used. Since the goal of the case study was to probe the method in simulating receptor-G protein binding, it would have been better to start with a fully active state of the receptor like the 4LDO structure, employed by the authors only to extract epinephrine.

      mwSuMD is designed to provide insights into structural transitions. We started from an intermediate active state of β2-AR in complex with adrenaline because it resembles the most populated state stabilised by a full agonist according to NMR studies (DOI:10.1016/j.cell.2015.08.045); the fully active β2-AR conformation is stabilised only after Gs binding. However, following the Reviewer’s suggestion, we have reduced the results presented for β2-AR-Gs recognition.

      Also in this case, it is not clear if the supervised receptor-G protein distance is between the centroid of the whole 7-helix bundle and the centroid of Gs α5. It is not clear why the TM6 RMSD concerned only the cytosolic end of the helix and did not include the kink region. With that selection, to estimate the outward displacement, RMSD should have been computed without superimposing the considered portion (once all remaining Cα-atoms of the receptors are superimposed).

      As the Reviewer pointed out above, some knowledge of the system is required to set up mwSuMD. Using more generic metrics, as we did in this case (e.g., the distance between the whole TMD and Gs α5), is a general approach applicable to other GPCRs that should allow orthogonal metrics to evolve independently of the supervision.

      As now specified in the text, superimposition for the RMSD calculation was performed on the Cα atoms of residues 40 to 140, hence excluding TM6.
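      The distinction the Reviewer draws, between RMSD computed after superimposing the measured atoms themselves and RMSD computed after superimposing a different, more rigid region, can be made concrete with a least-squares (Kabsch) fit. The sketch below is a minimal illustration assuming NumPy and (N, 3) coordinate arrays with matching atom order; the fit selection stands in for something like the Cα atoms of residues 40 to 140, the measured selection for the cytosolic end of TM6, and all function names are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile_fit: np.ndarray, ref_fit: np.ndarray):
    """Least-squares rotation R and translation t mapping mobile_fit onto ref_fit.

    Both arrays are (N, 3); coordinates are transformed as x @ R.T + t.
    """
    mob_c = mobile_fit.mean(axis=0)
    ref_c = ref_fit.mean(axis=0)
    P = mobile_fit - mob_c
    Q = ref_fit - ref_c
    H = P.T @ Q                                  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # avoid an improper rotation (reflection)
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = ref_c - mob_c @ R.T
    return R, t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two (N, 3) arrays, without fitting."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def rmsd_after_fit(mobile, reference, fit_idx, measure_idx) -> float:
    """Superimpose on the fit_idx atoms, then measure RMSD on the measure_idx atoms.

    When measure_idx is disjoint from fit_idx, the measured atoms are not
    superimposed onto themselves, so their rigid displacement relative to
    the fitted frame stays in the value.
    """
    R, t = kabsch_fit(mobile[fit_idx], reference[fit_idx])
    moved = mobile @ R.T + t
    return rmsd(moved[measure_idx], reference[measure_idx])


if __name__ == "__main__":
    ref = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1],
                    [2, 2, 0], [3, 0, 1]], dtype=float)
    rz90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
    mob = ref @ rz90.T + np.array([5.0, -2.0, 3.0])  # rigid move of every atom
    mob[5:] += np.array([1.0, 0.0, 0.0])             # plus a 1 Å shift of the "TM6" atoms
    fit, measure = slice(0, 5), slice(5, 7)
    print(round(rmsd_after_fit(mob, ref, fit, measure), 6))  # 1.0
    print(rmsd(mob[measure], ref[measure]) > 1.0)             # True
```

      Because the measured atoms are excluded from the fit, a rigid outward swing of TM6 contributes fully to the value; superimposing TM6 onto itself would largely remove it.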

      As for the A1R-Gi recognition, as already stated, I strongly suggest eliminating it. However, I'd like to add some comments. I would discourage the use of an AlphaFold model for simulations intended for model validation in general and, in particular, when high-resolution structures are available. In this case, the authors could have used the 1GP2 structure of heterotrimeric Gi, even though it is from rat.

      Following the Reviewer’s suggestion, we have substantially reduced the results presented for A1R-Gi recognition. We considered 1GP2 for the simulations, but its H5 lacks the six C-terminal residues, so some modelling was still necessary. However, we take the Reviewer’s comment on board and will consider it in future work.

      Also, the palmitoylation and geranylgeranylation process is quite tortuous and it is not clear why the NVT ensemble was employed in the second stage of equilibration. This is reflected also on the GLP1R case study.

      We have amended the text to clarify this passage. The second NVT stage is required to stabilise the G protein and its orientation in the simulation box. The figure below shows that, during the NVT step, the Cα RMSD reached a plateau after 700 ns for both Gi (black) and Gs (orange).

      Author response image 1.

      Here, it is not clear if the RMSD of α5 of Gi was computed with or without superposition.

      The RMSD of α5 was computed after superimposing on the Cα atoms of A<sub>1</sub>R residues 40-140 (the least flexible region of the receptor). We have now amended the text to report this information.

      Reviewer #3 (Recommendations For The Authors):  

      Points to address:

      (1) Root Mean Square Deviation (RMSD) data are often reported as minimum values. It would be useful to provide the average value along the stable part of the trajectories. From the plots in Figure 2ab, it seems that the minimum values reported in the paper are very far from the average ones and thus represent special cases that are seldom reached during simulation. The authors should clarify this point;

      For the revised manuscript, we moved Figure 2 to the Supplementary Material and added average RMSD values for the most notable replicas in Figures 4e and S8a,b. As a reference, the text now reports RMSDs from our previous classic MD simulations (https://doi.org/10.1038/s41467-021-27760-0) of the Gs:GLP-1R cryo-EM structure (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å), which show how flexible GPCR-bound G proteins are and give better context to the RMSD values measured during mwSuMD simulations.
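      Reviewer point (1) contrasts minimum RMSD values with averages over the stable part of a trajectory. A minimal sketch of reporting both is shown below, assuming the RMSD time series is a list of floats and that the "stable part" is approximated by discarding an initial fraction of frames; that cut-off, the use of the sample standard deviation, and the function name are assumptions, not the procedure used in the manuscript.

```python
import statistics
from typing import Dict, Sequence


def summarize_rmsd(values: Sequence[float], discard_fraction: float = 0.0) -> Dict[str, float]:
    """Minimum, mean and sample standard deviation of an RMSD time series.

    The first `discard_fraction` of frames is dropped as a simple stand-in
    for excluding the non-stable part of the trajectory (assumption).
    """
    if not 0.0 <= discard_fraction < 1.0:
        raise ValueError("discard_fraction must be in [0, 1)")
    start = int(len(values) * discard_fraction)
    stable = list(values[start:])
    if len(stable) < 2:
        raise ValueError("need at least two frames to estimate a standard deviation")
    return {
        "min": min(stable),
        "mean": statistics.mean(stable),
        "sd": statistics.stdev(stable),  # sample SD (n - 1 denominator)
    }


if __name__ == "__main__":
    trace = [9.0, 8.0, 6.0, 7.0, 5.0, 7.0]  # toy RMSD values (Å), one per frame
    s = summarize_rmsd(trace, discard_fraction=0.5)
    print(f"min = {s['min']:.2f} Å; mean ± SD = {s['mean']:.2f} ± {s['sd']:.2f} Å")
    # min = 5.00 Å; mean ± SD = 6.33 ± 1.15 Å
```

      Reporting the minimum next to mean ± SD makes it visible when the best frame is far from what the trajectory typically samples, which is the Reviewer's concern.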

      (2) The RMSD values reported in the paper always refer to single molecules or proteins. It would be useful to also report the RMSD computed over the whole complexes (ligand/GPCR or GPCR/G protein). It would provide a better metric for understanding the general distance between the results and the reference experimental structures;

      We have now removed the results sections for A<sub>1</sub>R and β<sub>2</sub> AR to focus on GLP-1R, whose RMSD is analyzed in detail in Figures 2, 3 and 4.

      (3) A number of computational works investigated the GPCR/G protein interaction and these studies should be cited and discussed. Examples are the works from Mafi et al. 2023 (doi: 10.1038/s41557-023-01238-6), Fleetwood et al. 2020 (doi: 10.1021/acs.biochem.9b00842), Calderon et al. 2023 and 2024 (doi: 10.1021/acs.jcim.3c00805 and doi: 10.1021/acs.jcim.3c01574), Maria-Solano and Choi 2023 (doi: 10.7554/eLife.90773.1), Mitrovic et al. 2023 (doi: 10.1021/acs.jpcb.3c04897), and D'Amore et al. 2023 (doi: 10.1101/2023.09.14.557711). Many of these works focused on the activation of B2AR and the interaction with its G protein. In addition, Maria-Solano and Choi 2023 and D'Amore et al. 2023 also characterized the rotation of TM6 during the A1R and A2AR activation. Therefore, the claim "To the best of our knowledge, this is the first time an MD simulation captures the TM6 rotation upon receptor activation as results reported so far are largely limited to the TM6 opening and kinking55." is untimely;

      We thank the Reviewer for the suggested references. We have added them to the Introduction as examples of energy-biased (Calderon et al. 2023 and 2024, Maria-Solano and Choi, Mitrovic et al., D'Amore et al.) or adaptive sampling (Fleetwood et al.) approaches to GPCRs. Since these articles focus on β<sub>2</sub>-AR and A<sub>1</sub>R, we do not discuss them in detail, because the results sections for A<sub>1</sub>R and β<sub>2</sub>-AR have been drastically reduced in the manuscript.

      We note that, among the suggested references, only Mafi et al. report a simulated G protein (in a pre-formed complex), and none of these works sampled TM6 rotation without an input of energy. However, we have removed the claim from the text.

      (4) In the discussion section, the authors claim that a distance-based approach can be employed when the structural data of the endpoints is limited. However, the results obtained from the distance-based protocol during the validation of the approach, which was done using V2R as a reference, are unsatisfying, as acknowledged by the authors themselves. For instance, the RMSD mode value reported for the AVP C alpha atoms with respect to 7DW9 is high, 0.7 nm, whereas the minimum value is 0.38 nm. In addition, some side chains are not oriented in the experimental conformation and might have a different interaction pattern with the receptor if compared with the experimental structure. Considering that in this case the endpoint is known, it is plausible that the performance of the method would degrade even further when data about the target structure is limited. In a real-case scenario, the ligand binding mode is unknown and in such a case no RMSD metric can be used. This represents the major concern of this study that is no prediction is provided, but only - rather inaccurate - reproduction of the known structural data;

      The goal of the first part of the work was to compare mwSuMD to SuMD to justify its application to ligand binding using a challenging case study like vasopressin. The general validation of the parent method SuMD as a predictive tool for ligand binding modes has been extensively reported over the years (a few examples: https://doi.org/10.1021/ci400766b ; https://doi.org/10.1021/acs.jcim.5b00702 ; https://doi.org/10.1038/s41598-020-77700-z) and falls beyond the scope of this work.

      (5) In the discussion, the authors write "A complete characterization of the possible interfaces between GPCR monomers, which falls beyond the goal of the present work, should be achieved by preparing different initial unbound states characterized by divergent relative orientations between monomers to dynamically dock." It would be useful for the reader to refer to and cite here advanced computational approaches that allow a comprehensive sampling of GPCR dimerization independently from the starting conformation of the receptors. One example is coarse-grained metadynamics as shown in doi: 10.1038/s41467-023-42082-z;

      The A<sub>2A</sub>/D<sub>2</sub> receptor dimerization has been removed from the manuscript.

      (6) In many cases, it is not reported how residues missing from the experimental structures used to model the proteins were reconstructed. This information is important, considering that the authors comment on the results of their calculations on addressing these regions, such as in the case of B2AR. Furthermore, the authors did not report how their initial models were validated. The authors should also explain why they did not model the IC loops of A2AR and D2R;

      In the current version of the manuscript, for V2R ECL2 and GLP-1R, we specify that we produced 10 solutions with Modeller and considered the best one in terms of the DOPE score. 

      The only receptor model used, β<sub>2</sub>-AR, is now presented as preliminary data focusing on Gs, without any structural detail of Gs recognition.

      As reported above the A2A-D2 dimerization has been removed from the manuscript.

      (7) In several cases, the authors state that residues never investigated before play an important role in the interaction between different proteins. An example is provided on page 6 for the B2AR/G protein association. Since this claim is quite significant, it would benefit from validation, at least through further calculations such as in silico mutagenesis studies. Another example is at the end of page 10 where the authors report a hidden interaction between D344 and R385 that is pivotal for Gs coupling by GLP-1R. Is there other evidence supporting this result (previously reported literature data, conservation rate of these residues, etc.)?;

      We have removed the supplementary table reporting B2AR/G protein interactions to reduce speculations and added a reference that reports GLP-1 EC50 reduction upon mutation of position 344 to Ala (https://doi.org/10.1021/acscentsci.3c00063).

      (8) The authors should provide a deeper discussion about the conformational rearrangement of GPCR and G protein during the coupling. In detail, the conformational changes of microswitch amino acids of GPCR (e.g., PIF, NPxxY, inactivating ionic lock) and alpha helix 5 of G proteins should be discussed in relation to the literature data and experimental structures;

      We have removed the A1R and β2-AR results to focus on GLP-1R. Key structural motifs in the polar central network and the TM6 kink are now analysed in more detail in Figure 3.

      (9) The chronology of the conformational changes of GLP-1R is arbitrarily chosen. During the simulation, the RMSD values reported in Fig. 3 are high and do not demonstrate the full accomplishment of the simulation of the activation process of the receptor;

      We agree with the Reviewer that the GLP-1R inactive-to-active transition was not fully accomplished, compared to other work on class A GPCRs. Unlike class A, class B GPCRs are a challenging system to work with in silico because inactive starting conformations (e.g., 6LN2) are extremely distant from active ones (e.g., 7LCJ, 7LCI, or 6X18), as shown in Figure S6 for GLP-1R. Here we report the first attempt to model a class B GPCR activation mechanism starting from the inactive state; even if the transition was not fully achieved, we believe these simulations represent the state of the art for this class of receptors.

      (10) It would be helpful for the reader not familiar with the employed technique that the authors explain in one sentence in the main text the pros and cons of using multiple walkers instead of single walker SuMD;

      We thank the Reviewer for the excellent suggestion. In the Discussion, we have now commented that: “more extensive sampling obtainable by seeding multiple parallel short simulations instead of a single simulation for batch”, while in the Methods we explain that “mwSuMD is designed to increase the sampling from a specific configuration by seeding user-decided parallel replicas (walkers) rather than one short simulation as per SuMD. Since one replica for each batch of walkers is always considered productive, mwSuMD gives more control than SuMD on the total wall-clock time used for a simulation. On the flip side, mwSuMD requires multiple GPUs to be the most effective, although any multi-threaded GPU can run more walkers on the same hardware keeping the sampling variety.”.
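      The batch-and-select idea quoted above (several short walkers seeded from the same configuration, with one walker per batch always kept as productive) can be illustrated with a toy model. The sketch below is not the mwSuMD implementation: the "simulation" is a one-dimensional random walk of a supervised distance, the selection rule (keep the walker with the smallest final distance) is a simplification standing in for a real scoring function, and all names are hypothetical.

```python
import random
from typing import List


def run_walker(start: float, rng: random.Random, steps: int = 10) -> float:
    """Hypothetical stand-in for one short MD walker: a 1-D random walk of a
    supervised distance (e.g. ligand to binding site), returning its final value."""
    x = start
    for _ in range(steps):
        x = max(0.0, x + rng.gauss(0.0, 1.0))  # a distance cannot go negative
    return x


def supervised_batches(start: float, n_walkers: int, n_batches: int,
                       seed: int = 0) -> List[float]:
    """Seed n_walkers walkers from the current state in every batch and keep
    the one with the smallest final distance, so each batch always advances
    the trajectory by exactly one productive walker."""
    if n_walkers < 1:
        raise ValueError("need at least one walker per batch")
    rng = random.Random(seed)
    state = start
    history = [state]
    for _ in range(n_batches):
        finals = [run_walker(state, rng) for _ in range(n_walkers)]
        state = min(finals)  # the "best" walker seeds the next batch
        history.append(state)
    return history


if __name__ == "__main__":
    for walkers in (1, 4):
        path = supervised_batches(start=20.0, n_walkers=walkers, n_batches=8, seed=1)
        print(walkers, [round(d, 1) for d in path])
```

      Since every batch advances the trajectory by exactly one walker, the number of batches fixes the total wall-clock time when walkers run in parallel, which mirrors the control over wall-clock time described in the Methods excerpt; more walkers per batch widen the sampling at each step at the cost of more GPUs.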

      Minor points to address:

      (11) Page 3: the following sentence is duplicated (also found on page 2) "GPCRs preferentially couple to very few G proteins out of 23 possible counterparts";

      (12) Page 20: Figure S13 refers to the QM validation of PF06882961 torsional angle, not to the image of the receptor conformational changes, which is instead Figure S14 (please correct figure caption).

      We thank the Reviewer for the accurate reading of the manuscript. These typos have been corrected.

    1. eLife Assessment

      This study describes a novel mechanism for how collagen fibrils are formed. The authors present compelling evidence that collagen-I fibrillogenesis relies on a functional endocytic system for recycling collagen-I, with circadian-regulated VPS33b and integrin-α11 being critical for fibril assembly. This is an important study for the understanding of the pathophysiology of collagen fibrillogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe that the endocytic pathway is crucial for ColI fibrillogenesis. ColI is endocytosed by fibroblasts, prior to exocytosis and formation of fibrils, which can include a mixture of endogenous/nascent ColI chains and exogenous ColI. ColI uptake and fibrillogenesis are regulated by circadian rhythm as described by the authors in 2020, thanks to the dependence of this pathway on circadian-clock-regulated protein VPS33B. Cells are capable of forming fibrils with recently endocytosed ColI when nascent chains are not available. Previously identified VPS33B is demonstrated not to have a role in endocytosis of ColI, but to play a role in fibril formation, which the authors demonstrate by showing the loss of fibril formation in VPS33B KO, and an excess of insoluble fibrils - alongside a decrease in soluble ColI secretion - in VPS33B overexpression conditions. A VPS33B binding protein VIPAS39 is also shown to be required for fibrillogenesis and to colocalise with ColI. The authors thus conclude that ColI is internalised into endosomal structures within the cell, and that ColI, VPS33B and VIPA39 are co-trafficked to the site of fibrillogenesis, where along with ITGA11, which by mass spectrometric analysis is shown to be regulated by VPS33B levels, ColI fibrils are formed. Interestingly, in involved human skin sections from idiopathic pulmonary fibrosis (IPF) patients, ITGA11 and VPS33B expression is increased compared to healthy tissue, while in patient-derived fibroblasts, uptake of fluorescently-labelled ColI is also increased. This suggests that there may be a significant contribution of endocytosis-dependent fibrillogenesis in the formation of fibrotic and chronic wound-healing diseases in humans.

      Strengths:

      This is an interesting paper that contributes an exciting novel understanding of the formation of fibrotic disease, which despite its high occurrence, still has no robust therapeutic options. The precise mechanisms of fibrillogenesis are also not well understood, so a study devoted to this complex and key mechanism is well appreciated. The dependence of fibrillogenesis on VPS33B and VIPA39 is convincing and robust, while the distinction between soluble ColI secretion and insoluble fibrillar ColI is interesting and informative.

      Weaknesses:

      There are a number of limitations to this study in its current state. Inhibition of ColI uptake is performed using Dyngo4a, which although proposed as an inhibitor of Clathrin-dependent endocytosis is known to be quite un-specific. This may not be a problem however, as the endocytic mechanism for ColI also does not seem to be well defined in the literature, in fact, the principal mechanism described in the papers referred to by the authors is that of phagocytosis. It would be interesting to explore this important part of the mechanism further, especially in relation to the intracellular destination of ColI. The circadian regulation does not appear as robust as in the authors' last paper, however, there could be a larger lag between endocytosis of ColI and realisation of fibrils. The authors state that the endocytic pathway is the mechanism of trafficking and that they show ColI, VPS33B and VIPA39 are co-trafficked. However, the only link that is put forward to the endosomes is rather tenuously through VPS33B/VIPA39. There is no direct demonstration of ColI localisation to endosomes (ie. immunofluorescence), and this is overstated throughout the text. Demonstrating the intracellular trafficking and localisation of ColI, and its actual relationship to VPS33B and VIPA39, followed by ITGA11, would broaden the relevance of this paper significantly to incorporate the field of protein trafficking. Finally, the "self-formation" of ColI fibrils is discussed in relation to the literature and the concentration of fluorescently-tagged ColI, however as the key message of the paper is the fibrillogenesis from exocytosed ColI, I do not feel like it is demonstrated to leave no doubt. Specific inhibition of intracellular trafficking steps, or following the progressive formation of ColI fibrils over time by immunofluorescence would demonstrate without any further doubt that ColI must be endocytosed first, to form fibrils as a secondary step, rather than externally-added ColI being incorporated directly to fibrils, independent of cellular uptake.

    3. Reviewer #3 (Public review):

      Summary:

      Chang et al. investigated the mechanisms governing collagen fibrillogenesis, firstly demonstrating that cells within tail tendons are able to take up exogenous collagen and use this to synthesize new collagen-1 fibrils. Using an endocytic inhibitor, the authors next showed that endocytosis was required for collagen fibrillogenesis and that this process occurs in a circadian rhythmic manner. Using knockdown and overexpression assays, it was then demonstrated that collagen fibril formation is controlled by vacuolar protein sorting 33b (VPS33b), and this VPS33b-dependent fibrillogenesis is mediated via Integrin alpha-11 (ITGA11). The authors also demonstrated increased expression of VPS33b and ITGA11 at the gene level in fibroblasts from patients with idiopathic pulmonary fibrosis (IPF), and greater expression of these proteins in both lung samples from IPF patients and in chronic skin wounds, indicating that endocytic recycling is disrupted in fibrotic diseases. Finally, the authors performed knockdown assays in patient-derived IPF fibroblasts to confirm that silencing of VPS33b and ITGA11 results in a decrease in recycling of exogenous collagen-1.

      Strengths:

      The authors have performed a comprehensive functional analysis of the regulators of endocytic recycling of collagen, providing compelling evidence that VPS33b and ITGA11 are crucial regulators of this process, and that this endocytic recycling becomes disrupted in fibrotic diseases.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Overall authors’ response

      We would like to thank the three reviewers for a thorough critique of our manuscript and for acknowledging the novelty and importance of our studies, in particular their relevance to collagen-related pathologies such as idiopathic pulmonary fibrosis and chronic skin wounds. We appreciate that there are shortcomings in these studies, as highlighted by the reviewers; we have rewritten parts of our manuscript to clarify any misunderstandings and conducted additional experiments to address the concerns raised (see our response below each comment), which have been incorporated into our revised manuscript (modified text highlighted in yellow in the revised manuscript). We believe that the revision has made our manuscript stronger in support of our original conclusions.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors describe that the endocytic pathway is crucial for ColI fibrillogenesis. ColI is endocytosed by fibroblasts, prior to exocytosis and formation of fibrils, which can include a mixture of endogenous/nascent ColI chains and exogenous ColI. ColI uptake and fibrillogenesis are regulated by circadian rhythm as described by the authors in 2020, thanks to the dependence of this pathway on circadian-clock-regulated protein VPS33B. Cells are capable of forming fibrils with recently endocytosed ColI when nascent chains are not available. Previously identified VPS33B is demonstrated not to have a role in endocytosis of ColI, but to play a role in fibril formation, which the authors demonstrate by showing the loss of fibril formation in VPS33B KO, and an excess of insoluble fibrils - along-side a decrease in soluble ColI secretion - in VPS33B overexpression conditions. A VPS33B binding protein VIPAS39 is also shown to be required for fibrillogenesis and to colocalise with ColI. The authors thus conclude that ColI is internalised into endosomal structures within the cell, and that ColI, VPS33B, and VIPA39 are co-trafficked to the site of fibrillogenesis, where along with ITGA11, which by mass spectrometric analysis is shown to be regulated by VPS33B levels, ColI fibrils are formed. Interestingly, in involved human skin sections from idiopathic pulmonary fibrosis (IPF) patients, ITGA11 and VPS33B expression is increased compared to healthy tissue, while in patient-derived fibroblasts, uptake of fluorescently-labelled ColI is also increased. This suggests that there may be a significant contribution of endocytosis-dependent fibrillogenesis in the formation of fibrotic and chronic wound-healing diseases in humans. 

      Strengths: 

      This is an interesting paper that contributes an exciting novel understanding of the formation of fibrotic disease, which despite its high occurrence, still has no robust therapeutic options. The precise mechanisms of fibrillogenesis are also not well understood, so a study devoted to this complex and key mechanism is well appreciated. The dependence of fibrillogenesis on VPS33B and VIPA39 is convincing and robust, while the distinction between soluble ColI secretion and insoluble fibrillar ColI is interesting and informative. 

      Weaknesses: 

      There are a number of limitations to this study in its current state. Inhibition of ColI uptake is performed using Dyngo4a, which although proposed as an inhibitor of Clathrin-dependent endocytosis is known to be quite un-specific. This may not be a problem however, as the endocytic mechanism for ColI also does not seem to be well defined in the literature, in fact, the principal mechanism described in the papers referred to by the authors is that of phagocytosis.

      We thank the reviewer for pointing this out. Macropinocytosis or phagocytosis can be modelled using high-molecular-weight dextran; we used fluorescently labelled dextran to test for co-localisation with exogenous collagen, and thus for the involvement of these mechanisms in addition to endocytosis, and observed very little co-localisation (revised Figure S2B, lines 123-126). Further, we performed a competition experiment in which an excess of unlabelled collagen was added at the same time as labelled collagen, and showed that excess unlabelled collagen led to retention of labelled collagen at the cell periphery (revised Figure S2C, lines 126-129). This suggests that collagen-I uptake uses a different pathway from dextran (i.e. fluid-phase endocytosis) and is a receptor-mediated process.

      It would be interesting to explore this important part of the mechanism further, especially in relation to the intracellular destination of ColI.

      We agree with the reviewer that the intracellular destination of ColI is very interesting; this is what the Chang lab is currently investigating, although we believe these findings fall outside the scope of the revised manuscript. However, we have included additional immunofluorescence data, using GFP-tagged Rab5 constructs, to support that collagen is indeed taken up into endosomal compartments (revised Figure 1D, Figure S6A).

      The circadian regulation does not appear as robust as the authors' last paper, however, there could be a larger lag between endocytosis of ColI and realisation of fibrils.

      The authors state that the endocytic pathway is the mechanism of trafficking and that they show ColI, VPS33B, and VIPA39 are co-trafficked. However, the only link that is put forward to the endosomes is rather tenuously through VPS33B/VIPA39.

      We would like to clarify that we meant the post-Golgi compartment. We did not intend VPS33b/VIPAS39 as endosome markers; however, because we see collagen entering the cell in intracellular compartments and subsequently being recycled, we take it, by convention, that the endosome is involved. This is further supported by the partial colocalisation we observe with the classic endosome marker Rab5.

      There is no direct demonstration of ColI localisation to endosomes (ie. immunofluorescence), and this is overstated throughout the text.

      We appreciate the comment and have modified overstatements in the revised manuscript as appropriate. As stated above, we have included additional immunofluorescence data to support that collagen is indeed taken up into endosomal compartments.

      Demonstrating the intracellular trafficking and localisation of ColI, and its actual relationship to VPS33B and VIPA39, followed by ITGA11, would broaden the relevance of this paper significantly to incorporate the field of protein trafficking. Finally, the "self-formation" of ColI fibrils is discussed in relation to the literature and the concentration of fluorescently-tagged ColI, however as the key message of the paper is the fibrillogenesis from exocytosed colI, I do not feel like it is demonstrated to leave no doubt. Specific inhibition of intracellular trafficking steps, or following the progressive formation of ColI fibrils over time by immunofluorescence would demonstrate without any further doubt that ColI must be endocytosed first, to form fibrils as a secondary step, rather than externally-added ColI being incorporated directly to fibrils, independent of cellular uptake.

      We appreciate the concern raised here. This is precisely why we trypsinised and replated cells as part of the workflow: to make sure that residual exogenous collagen that has not been endocytosed is not incorporated onto pre-existing fibrils. We have new flow imaging data showing that cells that do not endocytose exogenous collagen accumulate it at the cell periphery, and that this accumulation is greatly reduced after trypsinisation. These new data are part of a more detailed methodology-based study in preparation, which will allow future studies to further dissect the collagen intracellular trafficking process, and are therefore not included in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors describe a mechanism by which fluorescently-labelled collagen type I is taken up by cells via endocytosis and then incorporated into newly synthesized fibers via an ITGA11- and VPS33B-dependent mechanism. The authors claim the existence of this collagen recycling mechanism and link it to fibrotic diseases such as IPF and chronic wounds.

      Strengths: 

      The manuscript is well written and experimentally contains a broad variety of assays to support its conclusions. The authors also added data from IPF patient-derived fibroblasts, patient-derived lung samples, and patient-derived samples of chronic wounds that highlight a potential in vivo disease correlation of their findings.

      The authors were also analyzing the membrane topology of VPS33B and could unravel a likely 'hairpin' like conformation in the ER membrane. 

      Weaknesses: 

      Experimental evidence is missing that supports the non-degradative endocytosis of the labeled collagen.

      We thank the reviewer for raising this. We would like to clarify that we do not think that all endocytosed collagen-I is recycled; rather, it is sorted in the endosome, which determines the fate of endocytosed collagen. Interestingly, results from Kadler’s group have shown that blocking lysosome function (with chloroquine and bafilomycin) significantly reduced endogenous collagen fibril formation (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), suggesting a non-degradative role for the lysosome in fibrillogenesis.

      The authors show and mention in the text that the endocytosis inhibitor Dyngo®4a shows an effect on collagen secretion. It is not clear to me how specific this readout is if the inhibitor affects more than endocytosis. This issue was unfortunately not further discussed.

      We thank the reviewer for this comment and have included a discussion of the specificity of Dyngo4a in the revised manuscript (lines 383-392). The Ponceau stain suggests that Dyngo4a treatment did not affect global secretion, and thus the effects are specific to collagen-I (Fig 2B).

      The authors use commercial rat tail collagen, it is unclear to me which state the collagen is in when it's endocytosed. Is it fully assembled as collagen fiber or are those single heterotrimers or homotrimers?

      We apologise for the confusion and have clarified this in our revision. These are single helical trimers from acid-extracted rat tail collagen. We have performed additional light scattering and CD spectroscopy to confirm the molecular weight and helicity, and to confirm that adding fluorescent tags did not alter these readouts. We have included this in the revised manuscript (revised Figure S1A-C, manuscript lines 82-86).    

      The Cy-labeled collagen is clearly incorporated into new fibers, but I'm not sure whether the collagen is needed to be endocytosed to be incorporated into the fibers or if that is happening in the extracellular space mediated by the cells.

      We appreciate the concern raised here, which was also raised by Reviewer 1. As answered above, this is why we trypsinised and replated cells as part of the workflow, to make sure that no residual exogenous collagen is incorporated onto pre-existing fibrils. We also have new flow imaging data showing that cells that do not endocytose exogenous collagen accumulate it at the cell periphery, and that this accumulation is greatly reduced after trypsinisation. These new data are part of a methodology-based manuscript under preparation and thus are not included in the revised manuscript.  

      In general for the collagen blots, due to the lack of molecular weight markers, what chain/form of collagen type I are you showing here?

      Apologies for the lack of molecular weight markers; this was an oversight on our part, and markers have now been included in the revised figures.  

      Besides the VPS33B siRNA transfected cells the authors also use CRISPR/Cas9-generated KO. The KO cells do not seem to be a clean system, as there is still a lot of mRNA produced. Were the clones sequenced to verify the KO on a genomic level?

      Yes, the clones were verified and used in our previous paper on circadian control of collagen homeostasis. In some instances, mRNA persists despite knockout at the protein level; however, these transcripts are likely directed to degradation through nonsense-mediated mRNA decay. Fully understanding this mechanism is beyond the scope of this paper. 

      For the siRNA transfection, a control blot for efficiency would be great to estimate the effect size. To me it is not clear where the endocytosed collagen and VPS33B eventually meet in the cells and whether they interact. Or is ITGA11 required to mediate this process, in case VPS33B is not reaching the lumen?

      This is an interesting question. In the revised paper we have conducted experiments in which Col1-GFP11-containing conditioned media were incubated with VPS33b-barrel; these showed that the two interact within the cell and not at the cell periphery (revised Figure 6G, lines 293-296), again highlighting that VPS33b is not involved in the endocytosis step but interacts with endocytosed collagen-I intracellularly. We attempted colocalisation studies using the split GFP approach with VPS33B and ITGA11 to investigate where they interact, but as the ITGA11 construct we used did not localise to the cell surface as expected, we are not confident that this system is appropriate for investigating how or whether VPS33B interacts with ITGA11. Moreover, there are simply no good antibodies for VPS33B staining. 

      The authors show an upregulation of ITGA11 and VPS33B in IPF patients-derived fibroblasts, which can be correlated to an increased level of ColI uptake, however, it is not clear whether this increased uptake in those cells is due to the elevated levels of VPS33B and/or ITGA11.

      We would like to clarify here that we do not think collagen-I uptake is due to VPS33B and/or ITGA11, as siRNA against ITGA11 and VPS33B in fibroblasts produced no consistent changes in uptake as determined by flow cytometry, which was included in the original manuscript (now revised Figure 6H, 7I). VPS33B and ITGA11 are involved in the ‘outward’ arm of recycled collagen-I, i.e. directing it to the fibrillogenesis route. We agree that additional functional studies using IPF patient-derived fibroblasts would add to the manuscript, and have performed siRNA against VPS33B and ITGA11 in IPF fibroblasts, and demonstrated a late of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).  

      Reviewer #3 (Public Review): 

      Summary: 

      Chang et al. investigated the mechanisms governing collagen fibrillogenesis, firstly demonstrating that cells within tail tendons are able to uptake exogenous collagen and use this to synthesize new collagen-1 fibrils. Using an endocytic inhibitor, the authors next showed that endocytosis was required for collagen fibrillogenesis and that this process occurs in a circadian rhythmic manner. Using knockdown and overexpression assays, it was then demonstrated that collagen fibril formation is controlled by vacuolar protein sorting 33b (VPS33b), and this VPS33b-dependent fibrillogenesis is mediated via Integrin alpha-11 (ITGA11). Finally, the authors demonstrated increased expression of VPS33b and ITGA11 at the gene level in fibroblasts from patients with idiopathic pulmonary fibrosis (IPF), and greater expression of these proteins in both lung samples from IPF patients and in chronic skin wounds, indicating that endocytic recycling is disrupted in fibrotic diseases. 

      Strengths: 

      The authors have performed a comprehensive functional analysis of the regulators of endocytic recycling of collagen, providing compelling evidence that VPS33b and ITGA11 are crucial regulators of this process. 

      Weaknesses: 

      Throughout the study, several different cell types have been used (immortalised tail tendon fibroblasts, NIH3T3 cells, and HEK293T cells). In general, it is not clear which cells have been used for a particular experiment, and the rationale for using these different cell types is not explained. In addition, some experimental details are missing from the methods.

      We thank the reviewer for pointing out the lack of clarity, and have filled in missing information in the methods. HEK293T cells were used for virus production for the VPSoe system, and we have clarified the cell types used in figure legends (predominantly iTTF). We have also provided justification when NIH3T3 cells were used (revised lines 290-291).    

      There is also a lack of functional studies in patient-derived IPF fibroblasts which means the link between endocytic recycling of collagen and the role of VPS33b and ITGA11 cannot be fully established.

      We thank the reviewer for this comment, which was also raised by Reviewer 2 above. We agree that additional functional studies using IPF patient-derived fibroblasts would add to the manuscript, and have performed siRNA against VPS33B and ITGA11 in IPF fibroblasts, and demonstrated a late of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The authors inhibit Clathrin-dependent endocytosis with dyngo4a. It is well known that this inhibitor is not highly specific for this pathway. It is also not explained why the authors only inhibit the Clathrin uptake pathway, and not pinocytosis or Clathrin-independent endocytosis too. The authors refer to papers that describe pinocytosis for collagen endocytosis.

      We thank the reviewer for raising this question. Based on the fact that inhibition of clathrin-dependent pathway does not completely abrogate endocytosis of collagen-I, we anticipate that other pathways are involved in mediating collagen-I uptake, although additional data suggested this is unlikely through fluid-phase endocytosis, and is receptor mediated (revised Figure S2B, C).  

      Where does the ColI go in the cell? Depending on the uptake pathway, it is likely to pass through endocytic carriers to endosomes, where it may be recycled to the PM or degraded. From the start, the authors describe the ColI as being in vesicular structures, however, the imaging data that this is based on is not co-labelled with anything to determine the potential structure/localisation. This is not done at any point in the paper, until IF is shown of ColI with VIPA39, however without the relevant controls, this IF is unconvincing, as the general pattern of ColI and VIPA39 as an endosomal marker are not classically recognisable. Additionally, VPS33B is described as a late endosome/lysosome marker, which would have different connotations on ColI trafficking or destination than other types of endosomes.

      We thank the reviewer for pointing out the weaknesses in our original IF. We have included new confocal images showing labelled collagen co-localisation with GFP-tagged Rab5 through transient transfection, which is a more traditional endosome marker (revised Figure 1D, Figure S6A).  

      We are currently characterising the compartments to which ColI is trafficked, as part of a methodology-based manuscript in preparation. We believe that this characterisation would be too detailed to include in a revised version of this manuscript. The Kadler lab also has data suggesting that the lysosome is involved in collagen fibrillogenesis rather than in its canonical degradative function, which is in another submitted manuscript (https://www.researchsquare.com/article/rs-1336021/v1). These data were not included in this manuscript given our focus on endocytic recycling.   

      In Figure 5H, the pattern of Cy5-ColI staining looks like it could even be ER/Golgi in the VPSKO zoom panel, but in the absence of co-labelling, we cannot conclude anything. In order for the authors to conclude that ColI is within the endosomes, co-labelled IF should be performed to demonstrate ColI-endosomal colocalization. Likewise for the role of VPS33B in ColI fibrillogenesis: dependence of the process is demonstrated, but the relationship is not defined. This could be clarified using IF. This would also support the authors' statements of co-trafficking between ColI, VPS33B, and VIPA39, which as the paper stands, is not demonstrated.

      We would like to clarify that our hypothesis is that the endosome controls how collagen is deposited outside the cell, i.e. whether by protomeric secretion or fibrillogenesis, and that the decision of whether an endocytosed collagen is recycled or degraded lies in this compartment. The reviewer is correct that endocytosed collagen-I may not end up only in the endosome, as we have new data suggesting the involvement of other intracellular compartments, although the detailed mechanism is beyond the scope of this manuscript. Nonetheless, we have included new data showing co-localisation of endocytosed collagen with Rab5 in this revised manuscript (revised Figure 1D, Figure S6A).  

      The basis of this paper is that endocytosis of ColI must occur before re-exocytosis as fibrillar ColI. The authors show this through pulse-chase experiments, with a trypsinisation step to remove any externally bound ColI. The authors also show nice time progression by flow cytometry, but it would truly demonstrate this point if they showed IF at a 0 or early time point to show progressive lengthening of ColI fibrils. This is used early on in Figure 1D, although the presentation here is not very clear. This is especially important as the authors address the self-seeding capabilities of Collagen in cell-free conditions in Figure 1F.

      We would like to thank the reviewer for this suggestion. From previous endogenously tagged collagen data, we know that collagen fibrils appear rather rapidly; thus it may not be a gradual lengthening as expected, but rather a depletion of endocytosed collagen in the initial seeding/growth step (please see https://www.researchsquare.com/article/rs-1336021/v1). We have included an image of replated fibroblasts after 18 hours showing no extracellular collagen, endogenous or otherwise (revised Figure S2A, line 110).  

      Finally, although the involvement of ITGA11 is interesting, it is not well described, and its role is not well demonstrated. This could likely be clarified by an additional introduction to ITGA11 and its role in collagen exocytosis/fibrillogenesis.

      We would like to thank the reviewer for pointing this out and have included additional sentences to specifically introduce ITGA11 and its role in fibrillogenesis (see lines 320, 321; 446-450).  

      Specific points: 

      Line 73: You haven't compared reuse vs production, so you can't say that reuse is central rather than production. They may be both as important or production still may be the most crucial, maybe it depends on cell/collagen type. Using the ColI KD or CHX to block nascent synthesis, you could directly compare the impact of both.

      We would like to clarify that we are not referring to reuse/recycling here. We meant that the production of collagen (i.e. single hetero/homotrimer molecules within the cell) is not as crucial as the utilisation of these building blocks by the cells (i.e. whether they are secreted as protomers or assembled into fibrils). This is supported by our finding that collagen production (as indicated by mRNA levels) in IPF fibroblasts is similar to that in control fibroblasts (now revised Figure 8A). We used ColI siRNA to block nascent synthesis in the original manuscript and showed that fibroblasts can efficiently make new fibrils by recycling exogenous collagen (Figure 3B, C), although we appreciate that siRNA may not completely inhibit endogenous production. Thus, we have also included new data using collagen-I knockout cells to support our hypothesis that without endogenous production, fibroblasts can still effectively make collagen fibrils if they can reuse what is available in the extracellular space (revised Figure 4, Figure S3C, D; lines 178-199).  

      Lines 83-87: The rationale for this experiment is not clear. Cy3-ColI is added, taken up into cells, and incorporated into fibrils coming from cells. 5FAM-ColI is added at a later stage, then at 2 days (when incorporation is demonstrated in Fig 1B), it is also incorporated into cells as expected. Why does this comment on ColI not being degraded any more than Cy3-ColI alone?

      We believe that the pulse chase experiment using the differently tagged collagen demonstrated a dimension of dynamics that is not demonstrated with Cy3-ColI alone. In this case, Cy3-ColI was initially added, and removed after 3 days; 5FAM-ColI is then added and incubated for 2 more days. Thus after 5 days since the initial pulse, the Cy3-ColI persisted and was not degraded. We would like to apologise for causing this confusion, and have clarified in the revised manuscript (lines 542-549; Figure S1D figure legend).  

      Figure 1A: I would like to see a negative control: either dark colI or no Cy3-Col, or timescale. Is B quantified from these images?

      We thank the reviewer for this comment. We have added the no-collagen control image in our revision (revised Figure S1D). 1B is not quantified from the ex vivo tendon experiments, but rather from the in vitro cell culture experiments (i.e. those from 1D-1F, although they are all from independent experiments).  

      Figure 1B: in iTTF cells (immortalised tendon cells) Corrected to max: What does that mean?

      As there are variations between individual experiments (e.g. changes in the amount of collagen added due to pipetting), we have normalised to the maximum value obtained in each individual experiment so that we can display all biological repeats within the same graph.  
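      To make the "corrected to max" step concrete, here is a minimal sketch in Python of per-experiment max normalisation. It is an illustration only, not the authors' analysis code: the function name and the example values are hypothetical, and it assumes each biological repeat is stored as its own list of measurements.

```python
from typing import Dict, List


def normalise_to_max(experiments: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide every value in each experiment by that experiment's own maximum.

    Hypothetical helper: each key is one biological repeat (N), each list
    holds that repeat's measurements (e.g. one value per time point).
    """
    normalised = {}
    for name, values in experiments.items():
        peak = max(values)
        if peak <= 0:
            # A repeat with no positive signal cannot be scaled to a maximum of 1.
            raise ValueError(f"experiment {name!r} has no positive values to normalise against")
        normalised[name] = [v / peak for v in values]
    return normalised


# Hypothetical repeats with different absolute signal levels.
repeats = {"N1": [10.0, 40.0, 20.0], "N2": [5.0, 25.0, 12.5]}
print(normalise_to_max(repeats))
```

      After scaling, each repeat peaks at 1.0, so repeats with different absolute signal (for instance from pipetting differences) can share one axis while the relative shape within each repeat is preserved.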

      Figure 1C: You can't say ColI is in vesicular structures from this, they are spots, yes, but that could also be in Golgi/ER (unlikely to be cytosolic but not impossible).

      We appreciate this comment and have changed the wording accordingly, calling them intracellular/punctate structures.

      Figure 1D: Not the best presentation: The cell mask has structures: what are these? It's not clear if this is a single cell; it would be better with a defined marker (endocytic marker, lysosome etc). Instead of a low-resolution 3D view, it would be clearer with normal confocal XY and zooms of "vesicular structures" using appropriate markers. As for the 3D reconstructions, I think they could be removed.

      This is a single cell and the cell mask is staining the plasma membrane. We did not use a defined marker as we wanted to visualise the whole intracellular compartment. We appreciate that further proof is needed to verify the location of the endocytosed collagen, and have included additional confocal imaging data to support the localisation of collagen into Rab5-positive intracellular compartments (revised Figure 1D, Figure S6B).  

      Figure 1 E/F: Cy3 is only visible in extracellular structure, not also intracellular. Why? Would be useful to see the time points of incorporation at the end of the pulse, then at an early point into the chase, to demonstrate 1) Cy3-ColI uptake into cells and progressive incorporation rather than potential direct binding of ColI-Cy3 to ECM, or other non-specific factors. Showing the image at 0t would demonstrate an absence of external labelled colI and therefore its appearance later could be presumed that it had been internalised before.

      As the cells were trypsinised and replated after one hour of feeding with labelled collagen, to ensure we are only tracking endocytosed collagen, t=0 in this case would be cells that are unattached. We have instead included images at t=18hr post-replating to show the baseline level of collagen (revised Figure S2A, line 110).

      Figure S1A: yellow box: doesn't show only Cy3-ColI, there is red and yellow in the central cell, and large yellow blobs in the cell above. These images do not support this claim, including the Fiber Zoom box. They should also be shown in single channels to demonstrate the authors' points better.

      Apologies for the confusion: this is to show that newly added 5FAM-ColI also co-localises with previously endocytosed Cy3-ColI, i.e. the Cy3-ColI is persisting rather than being degraded.  

      Line 92: endocytosed into distinct structures: These images are very vague, but I don't think you can call them distinct structures, all you can say from this is that they are spots.

      We have changed the wording to ‘distinct puncta’.  

      It is not clear why the authors use Cy3, Cy5, and 5FAM labelled colI. A brief explanation would be useful.

      Apologies for the confusion. We initially included our justification (to show that the fluorescent labels do not change the way collagen is internalised) but removed it from the final manuscript due to length. We have added the justification back (revised lines 101-102).   

      Figure 1F: It would be useful to see a quantification of the Cy3 channel here. I agree with the conclusions, and actually find the 0.5 ug/ml condition more convincing than 0.1: although there is some faint Cy3 in cell-free samples, there seems to be quite a big increase in the presence of cells, and this would look more convincing if quantified.

      We thank the reviewer for this suggestion and have included quantification in the revised manuscript (revised Figure 1G-I).  

      Figure 2B: Dyng is not an abbreviation of Dyng. Standardise Dyng/Dyngo/Dyngo4a. WB is soluble colI and represents little (if any) insoluble col. IF is more or less the other way round. How do they compare this?

      Thank you for pointing out the inconsistencies, we have corrected this in the revised manuscript. We took the conditioned media from the same experiment where cells are fixed for IF and carried out Western blot analyses. The IF showed some collagen still present, albeit significantly reduced. This is in agreement with the western blot results (i.e. Dyng4a inhibits both soluble and insoluble forms of collagen deposition).  

      Figure 2C: not an image series. Quant: no cells/independent exps and STATS?

      Apologies for the missing experimental details in the figure legends; it should say ‘representative of N=3 experiments’. We are not sure what the reviewer meant by Figure 2C not being an image series, as we intended it to be an image series of the individual fluorescence channels. We have changed this terminology to avoid confusion, and have included the statistical analyses in the methods section. The statistical analyses of the fibril quantification are shown next to the fluorescence images.  

      Figures 2D/E: The authors show that internalised ColI peaks at 20h and decreases to 60h, and fibers peak at 40h. How is this measured? Is the ECM removed? Why would there be less in the cells: degradation? What's the synchronisation?

      We apologise for omitting the synchronisation method from the methods section, and have included it in our revised manuscript (revised lines 542-544). Synchronisation was achieved by standard dexamethasone addition (and removal after 1hr incubation). Internalised Col-I is measured using Cy3-ColI, so the cells would contain both nascent and external collagen. Total intracellular collagen at the different time points would therefore likely be higher than represented, but here we are demonstrating that internalisation is a rhythmic event using the externally labelled collagen. Fibers are measured using standard IF followed by fibril counting.  

      Please note that we are only overlaying the two graphs to form our hypothesis that endocytosis may be used to accumulate collagen protomers, which then allows efficient fibrillogenesis. They are not directly comparable as the quantifications are of different things (internalised Cy3-ColI, total collagen fibrils). We have clarified this in our discussion (revised lines 399-401).  

      Discussion: Where does the ColI go? Solubilised? Degraded? Taken up by other cells? 

      The inverse correlation is not very tight. In fact, at 38h where fiber count peaks, Cy3-ColI also peaks (esp in normalised data, Figure S2D).

      We thank the reviewer for this comment and have reworded our main text to reflect this, and included additional discussion in our revised manuscript (revised lines 401-404).  

      Line 123: What is the turnover rate of Fibrils? Don't know for how long the transcription has been done, or when this would affect the fibril number. You have the quant for Fn1, where is the quant for ColI?

      We have included the quantification of collagen-I in original Figure 2A. We appreciate that Figure 2C might cause confusion (as we co-stained ColI and Fn1 in the same experiment), so we have removed the collagen-I panel from the revised Figure 2C. We know from previous results that the number of fibrils fluctuates over a 24-hour period, although the turnover of any specific fibril is unlikely to be 24 hours (https://www.biorxiv.org/content/10.1101/331496v2).

      Line 124: no accumulation of col in the extracellular space, but you don't know how much endogenous colI (or other endogenous ECM proteins) they're taking up as it isn't measured here. If the authors want to comment on this, they should either use exogenous col to monitor uptake and re-secretion, or block transcription/translation to show fibril formation via endo/exocytosis independent of endogenous synthesis.

      This experiment was done in the original manuscript: the siCol1a1 experiment was done with two rounds of siRNA, a normal transfection followed by reverse transfection onto fresh coverslips, which ensures that no prior ECM has been deposited (see Figure 3). However, we appreciate that there may still be low levels of endogenous collagen-I, and have therefore included new data using collagen-I knockout fibroblasts to strengthen our findings (revised Figure 4).  

      Line 142: Why is fibronectin synthesis also decreased in Col KD? This is clear in the image but no explanation/reference is given.

      Due to the dynamic and complex nature of the ECM, a knock-on effect of knocking down one matrix protein would not be surprising. However, we have quantified the amount of fibronectin fibril deposited by scr and siCol1a1 fibroblasts, and found no significant change between the two treatments (revised Figure 3A).

      Figure 3A: Labels are needed for which colour/protein is shown. This needs quantifying, especially as the Fn1 decrease is not so obvious here; is it consistent between Figure 3A and 2C?

      We have provided quantification in the revision (revised Figure 3A). Figure 3A and 2C are two separate experiments (one is Dyngo treatment and one is siCol1a1), and neither showed significant changes in fibronectin fibril areas.   

      Figure 3B: Line 151: the text states that "The observation of fibrillar Cy3 signals in siCol1a1 cells showed that the cells can repurpose collagen into fibrils without the requirement for intrinsic collagen-I production (red arrow Figure 3B)"; however, there is clearly endogenous colI here too (along the fiber and also strongly at each end). Does the ColI antibody recognise the exogenous ColI?

      In our hands the ColI antibody does not recognise exogenous ColI, as the cell-free Cy3-ColI images were also stained with ColI antibody to ensure the two experimental conditions were treated exactly the same.

      This conclusion could only be made in the true absence of collagen: either in knock-out cells, or where collagen production/trafficking has been blocked (ie knockout of ColI chaperone or ERES block), or in a cell type that produces collagens but not ColI. Alternatively, if there are any fibrils seen that are completely negative, they should be shown in the figure and quantified (number of Cy3-ColI+-ColI+ vs Cy3-ColI+-ColI-).

      We thank the reviewer for this suggestion. We have included new data from collagen knock-out fibroblasts in this revision (revised Figure 4).  

      Figure S4A: the quality of this blot isn't very high, the result is not very clear and the high intensity (unspecific?) band below confounds the interpretation. In the author's previous paper (NCB 2020) the blots for VPS33B were much clearer, as is Fig S4D. It would be nice to include a clearer blot, maybe from the other repeats.

      This is the only blot that we used to select knockout clones for our previous paper, which is why the quality is not as high. Knockout clones were all verified with additional western blots, and we do not think that endogenous VPS33b is expressed at high levels (also verified by MS analyses). Fig S4D shows overexpression of VPS33b, which is much easier to detect.  

      Figure S4D: This blot is much clearer, it would be useful to include a high gain to show the VPS33B band in CT to be able to understand the true increase.

      From the qPCR data one can see that the increase at mRNA is 20+ fold increase; we’ve always had problems trying to detect endogenous VPS33b using western blot or mass spectrometry analysis.  

      Figure 4A: The fibrils here in the CT are not obvious, and the difference between CT and KOs is not appreciable. Would this be clearer shown at a lower magnification, with zooms where needed? Or immunogold labelling/CLEM to label the ColI?

      It is not trivial to carry out immunogold labelling/CLEM. These are cell-derived matrices in culture and thus lower magnification may not show as many collagen fibrils as one would expect. We are not confident that lower magnification will provide more information as the characteristic D-banded collagen pattern will be lost.  

      Line 167/Figure 4B: It looks like there is more internal ColI in KO, but the images are not good enough to tell. This could be better shown by flow cytometry.

      We have previously seen that VPSKO leads to accumulation of collagen-I in intracellular puncta (NCB2020), which is also seen here. Flow cytometry data for internalisation of external collagen are already included in original Figure 5G (revised Figure 6H).  

      Again you mention intracellular vesicles, but based on these images, it is not possible to conclude this. These large spots could be aggregation elsewhere in the cell. Specific localisation should be shown by co-labelled IF/confocal, or it could be nicely shown by EM + fluorescent element (CLEM / Immunogold), or these statements should be removed from the text.

      We appreciate that the term ‘vesicles’ has a specific meaning in the trafficking field, and have changed it to ‘intracellular compartments’.  

      Line 173-174 / Figure 4E: Why do you think the matrix mass is not increased in VPSoe by the approach shown in E when there is seemingly a huge increase by IF? E must also measure other ECM matrix proteins, which do you expect to be secreted by these cells? Could this confound the data if they too are affected by VPSoe?

      IF specifically shows collagen-I. Hydroxyproline detects multiple collagens, and shows a trend of increase (although not significant due to one outlier). Matrix mass is a very generic measurement of total ECM deposited, based on decellularised ECM weight. The reviewer is correct that VPSoe may also affect the deposition of other ECM proteins; however, here we are focusing specifically on its effect on collagen-I. How VPSoe changes the deposition of other types of ECM could be addressed in future studies and is not within the scope of this manuscript.   

      Are the results in E paired?

      Individual values for control and VPSoe within each separate experiment are paired.  

      Figure 4F: Is quantification from IF shown in D? Specify which kind of microscopy it is based on.

      Quantification is based on fibril counting using standard fluorescence microscopy, as used in our previous paper. D is independent of F: F specifically looks at synchronised circadian effects, whereas in D (and elsewhere) we are looking at global effects on collagen deposition, irrespective of the time of day the cells are in.  

      Figure S5F: What do the yellow/red spots in the blots represent?

      We apologise for the initially unclear description of what the yellow/magenta circles depict in relation to the phosphoimages of the radiolabelled cell-free translation products displayed in Supplementary Figure 5, panels F, G and I. These circles indicate non-glycosylated (yellow) and N-glycosylated (magenta) species respectively, as is now clearly described in the revised manuscript.

      Figure 5 title: You can't conclude this from these images, need confocal and PM or cytosolic marker.

      We have changed the title to ‘VPS33B co-trafficks with collagen-I’. There is no good commercial VPS33b antibody for immunofluorescence staining, which is why we used the split GFP approach in this paper, and the images were acquired using confocal imaging (Olympus SpinSR system).  

      Figure 5E: The authors describe that ColI is in endosomes throughout most of the paper, and this is based on the involvement of VPS33B in the colI pathway. VPS33B is thought to be at the late endosome/lysosome. However, these images do not look like classic endosomes or lysosomes, or other normal organelle IF phenotypes. The fluorescent intensity looks saturated, and it is difficult to conclude anything from these images. It is unclear where in the cell the largest blob in the zoom would be localised and in which cell. I would suggest that this image is replaced and proper controls included (IgG controls and single channels) as well as using different markers for other potential intracellular structures.

      We appreciate the reviewer's comment with regard to the classification of VPS33b localisation in the endosome compartment. We did not mean to use VPS33b as an endosome marker, as the focus of our studies is the function of VPS33b in directing endogenous or exogenous collagen to fibrillogenesis. With live imaging we could see endocytosed collagen moving in intracellular compartments, and we have conducted additional staining to show co-localisation with Rab5 (revised Figure 1), which we take to indicate, by convention, that it occupies an endosome compartment. We have included single channel images in the revised manuscript (revised Figure 6E).

      Line 255/ Figure 5G: no consistent change in uptake. Why are the results so varied in the KO and oe, here and in Fig 4C/E? N=4, what does that mean? 4 cells? 4 independent exps?

      In all cases, “N” represents independent biological experiments in this manuscript. Thus “N=4” in this case is 4 independent biological experiments, with at least 10,000 cells analysed per experiment. 

      We do not know why the response varies; however, this variability is also why we concluded that VPS33B is unlikely to be directly involved in collagen uptake. We have changed 5G (now revised Figure 5H) to a paired line graph for better representation.

      Figure 5H shows the uptake of Cy5ColI. At this resolution, in VP2ko the col looks like it is in the ER, and in one of the cells in the zoom it looks like it is at the Golgi. I think that the uptake route of ColI needs to be better defined, as there is no way to tell here where the colI goes. ColI being recycled/degraded would be most likely, but this figure suggests that might not be the case. It is also not clear where the zooms come from; they should be indicated with dashed boxes in the lower mag image.

      We thank the reviewer for this comment, and agree that we need to define the uptake route of ColI. This is currently being assembled as a methodology manuscript, and how ColI is being recycled/degraded is one major research area of the Chang lab. 

      We have added dashed boxes in the lower mag images to indicate where the zooms were derived from. We would also like to thank the reviewer for pointing this out, as we realised we had accidentally cropped the VPSko image to a slightly different area, and have now corrected this.

      Line 257: Based on this data, it could be trafficking through the cell as well as into the extracellular space.

      We think that VPS33B is involved in trafficking collagen through the cell to the plasma membrane but is not itself secreted: in our split-GFP experiment we never observed an extracellular GFP signal, which suggests VPS33b is not deposited extracellularly.

      Line 259: "highlighting the role in recycling col to fibril formation sites" is an overstatement based on the data shown here; there is no data on colI trafficking or its regulation.

      We respectfully disagree that we have not shown data on col-I trafficking or its regulation by VPS33b: split GFP highlighted co-trafficking to the plasma membrane, and we have shown a clear relationship between VPS33b and collagen-I fibril formation, with minimal changes to collagen-I mRNA levels. We acknowledge that we have not specifically shown the location of VPS33b at fibrillogenic sites and have modified this statement in the revised manuscript (revised line 302).

      Line 262: "Having identified VPS33B as specifically driving collagen-I fibril formation" is also an overstatement.

      Here we refer to the data showing that VPS33b does not control collagen-I secretion (as demonstrated by the CM westerns) but specifically controls fibrillogenesis. We have clarified this in the revised text (revised line 304).

      Line 286: It would be useful to have a brief intro to PLOD3.

      We have included a brief intro to PLOD3 in the introduction, as well as in the results highlighted by the reviewer, in our revised manuscript (revised lines 54-58).

      Line 289/290: There could be other explanations for disruption to exo-endocytosis when disrupting col trafficking. Is VPS33B controlling exocytosis in general? Why should it be specific to col? Likewise with siITGA11 KD? Hypothesis for ITGA11 and fibrillogenesis?

      The relationship between ITGA11 and collagen fibrillogenesis is currently in a manuscript by Donald Gullberg and Cedric Zeltz, under revision at Matrix Biology (see reference 63 in revised manuscript). We do not think that VPS33b is controlling exocytosis in general, which is supported by the minimal change in ponceau stain of the western blots in the manuscript. Previously it has been shown that VPS33B co-trafficks with PLOD3, a collagen-I modifier.  

      Figure 6I: Why only quant Scr + siITGA11, not in VPSoe? It looks like there is still an increase in intracellular or fibril formation in VPSoe + siITGA11, which would be a key result to discuss.

      We would like to clarify that 6I (now revised Figure 7I) is on the endocytosis of exogenous collagen-I, not quantification of Figure 6H.  

      Line 307: Discuss fibrillogenic sites, what are they?

      As we have not shown direct evidence of VPS33B delivering endocytosed collagen at the site of fibrillogenesis, we have decided to alter the text to avoid overstatement, as suggested from previous reviewers’ comments.  

      Figure 8: What does pentachrome label?

      Pentachrome staining allows for simultaneous staining of multiple species: collagen in red, sulphated mucopolysaccharides in violet, red blood cells in yellow, muscle in orange, nuclei in green.

      Line 326: "In this study we have identified the endosome as a major protagonist in..." This is an overstatement and can't be drawn from this data.

      We have modified this statement to “In this study we have identified an endocytic recycling mechanism for type I collagen fibrillogenesis that is under circadian regulation”.

      Line 330/331: "Collagen-I co-traffics with VPS33B in a VIPAS-containing endosomal compartment that directs collagen-I to sites of fibril assembly," This is also an overstatement that cannot be drawn from this data.

      We have modified this statement to “Collagen-I co-traffics with VPS33B to the plasma membrane for fibrillogenesis”.  

      Line 340: again, the demonstration of the involvement of the endocytic pathway is very limited.

      We have provided new evidence in the revised manuscript that support the involvement of classical endosomal compartments.  

      Line 366: You can't conclude this; you have not manipulated these proteins to show a functional effect or modulation of fibrillogenesis, so it could still be a secondary effect.

      We have provided new evidence in the revised manuscript that supports this conclusion. 

      Line 569: "Unless otherwise stated, incubation and washes were done at room temperature." Which incubations? Specify if this is just post-fixation during the EM prep or during cell culture.

      This is specific to the EM preparation and we have clarified in the revised manuscript (revised line 663).  

      Small text alterations:

      Overall we would like to thank the reviewer for highlighting these errors and mistakes in our manuscript, and have corrected them in our revised manuscript.  

      Figure 1E: Fluoro image series? This is only one image.

      We wrote this to mean single channel images, we have corrected the terminology.  

      Line 111: Ref for Dyngo4a?

      We have included this in the revised manuscript.

      Line 121: introduction/abbreviation definition for Fn1? Instead it is on Line 140.

      Thank you for highlighting this, we have corrected this in revised manuscript.  

      Figure S2C: Alignment of labels cleaves x-axis.

      We thank the reviewer for catching this and have corrected this with our revised manuscript.  

      Figure S4F and G should be inverted to mention sequentially in the text.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.  

      Line 182: Figure 4J should be G.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.

      Line 209: typo: N-glycosylated.

      We have corrected this typo in our revised manuscript.

      Fig 6E: Very big as a figure element compared to others.

      We have made this smaller in the revised manuscript to fit better with the rest of the figure.

      Line 313: Figure 7E not F.

      Thank you for spotting this, we have corrected it.  

      Line 555: Typo: Scraped.

      We have corrected this typo in our revised manuscript.

      Line 562: missing )

      We have corrected this typo in our revised manuscript.

      Standardise

      We thank the reviewer for spotting the mistakes below and have corrected in our revised manuscript.  

      Legends: Include numbers of repeats and STATs throughout. 

      Terminology: Dyng etc. 

      Scale bars: some included as editable lines, some with size on top, small/large etc.

      In certain cases we have positioned the scale bars in different regions of the figures to ensure no obscuring of the images.

      VPS33b vs VPS33B.

      Reviewer #2 (Recommendations For The Authors):  

      The authors can improve the experimental part of the manuscript in the following ways:

      -  For all the western blots please include molecular weight markers.

      We thank the reviewer for noticing this omission and have included molecular weight markers in the revised manuscript.  

      - Performing immunofluorescence and western blot analysis of endocytosed collagen -/+ inhibitors for lysosomal degradation (BafA1 or E64d+PepstatinA) in order to exclude endocytosis for degradation.

      We thank the reviewer for this comment, another paper from the lab has identified lysosome to be involved in collagen fibrillogenesis (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), thus  

      - Figure out how Dyngo4a affects Col1 secretion in the first place. Does it interfere with the secretory pathway? Alternatively, use a different model to block endocytosis (e.g. siRNA Dynamin).

      We thank the reviewer for raising this. The Dyngo CM blot for total ponceau stain (revised Figure 2B) showed minimal changes, which suggests that global secretion is not affected.

      - Further characterization of the VPS33B / collagen vesicles by immunofluorescence containing markers for early, late, and recycling endosomes. Block endocytic recycling by depletion of either Rabs or e.g. EHD1.

      There is no good VPS33b antibody for staining. We have included images of GFP-tagged Rab5 co-localisation with labelled collagen-I (revised Figure 1D, Figure S6B).

      - Further clarify the status of the VPS33B knockouts, e.g. by sequencing. Also provide a readout of the siRNA KD besides the mRNA levels, since the difference there is not striking.

      The knockout cell lines were characterised previously in our 2020 paper, which is referred to in our revised manuscript. We have always had issues detecting endogenous VPS33b due to reagent limitations, which is why we resorted to mRNA as the key readout.

      - Doing siRNA knockdowns and endocytosis inhibition in the IPF fibroblasts to further strengthen the link between elevated expression of VPS33B/ ITGA11 and increased collagen uptake.

      We thank the reviewer for suggesting these experiments. Due to limitations of the patient-derived fibroblasts (cell numbers and passage numbers) we had to prioritise experiments, and thus have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).  

      Reviewer #3 (Recommendations For The Authors): 

      Major points 

      (1) Choice of cells: Please provide a rationale for why each cell line was used, and make sure that it is clear throughout the manuscript which cell line was used for each particular experiment. The HEK293T cell line is also missing from the reagent table.

      We thank the reviewer for pointing out this omission, and have clarified in our revised manuscript which cell lines were used in each experiment. We used HEK293T to generate lentiviruses as described in the methods section.  

      (2) Missing information from methods. Experimental details are missing from the methods in several places, making it difficult for someone to replicate an experiment. For example, no details are given in the methods describing the explant culture of murine tail tendons (described in results lines 78-100), and there are no details on how the skin samples were obtained or stained. Further, no ethical approval details are provided for the use of human skin tissue.

      We apologise for leaving out the ethical approval details and skin sample collection; this was an oversight and will be included in the revised manuscript. We have also included the method by which murine tail tendons were cultured ex vivo (revised lines 527-531, 546-553).

      (3) Functional studies in patient-derived cells. To fully establish the role of VPS33b and ITGA11 in fibrotic diseases, functional studies including the knockdown/overexpression of these genes could be performed to establish if the same response is seen as in non-diseased cells.

      We agree that this will add much to the paper, and have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).

      Minor Points

      We thank the reviewer for pointing out these mistakes, and have corrected and included additional details in the revised manuscript.  

      (1) Lines 51-52. Wording of this sentence is unclear, please rephrase. 

      (2) Line 182. Should this be Fig 4G rather than J? 

      (3) Line 209. Correct spelling of glycosylated. 

      (4) Line 463. Incomplete brackets and details missing? 

      (5) Line 590. Correct tense - was rather than are. 

      (6) Line 593. Specify centrifugation speed. 

      (7) Line 619. Nuclei rather than nucleus. 

      (8) Ln 650. Statistical analysis - was normality tested? 

      (9) Figure 1e - Difficult to read labels for coll/DAPI.

    1. eLife Assessment

      This valuable study discusses a hot topic in post-endoscopic retrograde cholangiopancreatography pancreatitis. The new score for predicting post-ERCP pancreatitis offers an idea of the risk of pancreatitis before the procedure. Whereas most scores depend on intraprocedural manoeuvres, such as the number of attempts to cannulate the papilla, this score relies on preprocedural factors. The evidence is solid but comes from a retrospective single-center study in one country; to be validated, the score will need to be tested in many countries and on large numbers of patients.

    2. Joint Public Review:

      Summary:

      This work provides a new general tool for predicting post-ERCP pancreatitis before the procedure based on pancreatic calcification, female sex, intraductal papillary mucinous neoplasm, a native papilla of Vater, or the use of pancreatic duct procedures. Even though it is difficult for the endoscopist to predict before the procedure which case might develop post-ERCP pancreatitis, this new score can help with planning: when the patient is at high risk of pancreatitis (which can sometimes be deadly), experienced endoscopists can perform the procedure from the start. This paper provides a model for stratifying patients before the ERCP procedure into low, moderate, and high risk for pancreatitis. To be validated, this score should be tested in many countries and on large numbers of patients. Additional risk factors could also be identified and added to the score to improve its performance.

      Strengths:

      (1) One of the severe complications of the endoscopic retrograde cholangiopancreatography procedure is pancreatitis, so investigators continually try to find a score that can predict which patients will probably develop pancreatitis after the procedure. Most scores depend on intraprocedural maneuvers. Some studies discuss preprocedural scores that can predict pancreatitis before the procedure. This study discusses a new preprocedural score for post-ERCP pancreatitis.

      (2) Because this score identifies patients at low, moderate, and high risk of post-ERCP pancreatitis, experienced and well-trained endoscopists can perform the procedure from the start, or patients can be referred to tertiary hospitals or managed with interventional radiology or endoscopic retrograde cholangiopancreatography.

      (3) The number of patients in this study is sufficient to analyze data correctly.

      Weaknesses:

      (1) It is a single-country, retrospective study.

      (2) Many cases were excluded, so the score cannot be applied to those patients.

      Comments on revised version:

      Relying on old references does not reflect the current situation. What if better, more recent predictive tools exist? It would be better to test the validity of this score against an established score, if one exists.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public review:

      Summary:

      This work provides a new general tool for predicting post-ERCP pancreatitis before the procedure based on pancreatic calcification, female sex, intraductal papillary mucinous neoplasm, a native papilla of Vater, or the use of pancreatic duct procedures. Even though it is difficult for the endoscopist to predict before the procedure which case might develop post-ERCP pancreatitis, this new score can help with planning: when the patient is at high risk of pancreatitis (which can sometimes be deadly), experienced endoscopists can perform the procedure from the start. This paper provides a model for stratifying patients before the ERCP procedure into low, moderate, and high risk for pancreatitis. To be validated, this score should be tested in many countries and on large numbers of patients. Additional risk factors could also be identified and added to the score to improve its performance.

      Thank you for reviewing our manuscript. We hope that this score will be validated in other countries from now on.

      Strengths

      (1) One of the severe complications of the endoscopic retrograde cholangiopancreatography procedure is pancreatitis, so investigators continually try to find a score that can predict which patients will probably develop pancreatitis after the procedure. Most scores depend on intraprocedural maneuvers. Some studies discuss preprocedural scores that can predict pancreatitis before the procedure. This study discusses a new preprocedural score for post-ERCP pancreatitis.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      (2) Because this score identifies patients at low, moderate, and high risk of post-ERCP pancreatitis, experienced and well-trained endoscopists can perform the procedure from the start, or patients can be referred to tertiary hospitals or managed with interventional radiology or endoscopic retrograde cholangiopancreatography.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      (3) The number of patients in this study is sufficient to analyze data correctly.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      Weaknesses:

      (1) It is a single-country, retrospective study.

      Thank you for this comment. As you said, this is a limitation (Lines 326-327).

      (2) Many cases were excluded, so the score cannot be applied to those patients.

      Thank you for this valuable comment. The predictive PEP score is not necessary for the excluded patients, for the following reasons. Biliary duct cannulation was not attempted in patients in whom the papilla of Vater was difficult to identify. The biliary tract was separated from the pancreas in patients with a past history of choledochojejunostomy, pancreatojejunostomy, or pancreatogastrostomy. PEP risk was thought to be low in these patients and in patients who underwent bile duct cannulation via a choledochoduodenal fistula. PEP is difficult to diagnose in patients who already have ongoing acute pancreatitis. We added these explanations (Lines 98-106).

      (3) Many other studies (e.g., https://link.springer.com/article/10.1007/s00464-021-08491-1, https://pubmed.ncbi.nlm.nih.gov/36344369/) have previously addressed the same issue, so what is new about this score?

      Thank you for raising the new reference by Archibugi et al. (2023). The novelty of our score is that it is calculated using factors that are assessed before ERCP procedures. The study by Archibugi et al. used procedure time and cannulation attempts for PEP prediction; these two factors are unknown before ERCP procedures. Therefore, a preprocedural predictive risk model for PEP had not been created before our study was performed. We added the content of the study by Archibugi et al. and included the report as a reference (Lines 65-67, 73-74).

      (4) The discussion section needs reformulation to express the study's aim and results.

      Thank you for this valuable comment. We have rewritten the first paragraph of the discussion. In this paragraph, we show that the study achieved its aim on the basis of the results (Lines 245-255).

      (5) Why did the authors select these items in their scoring system and did not add more variables?

      Thank you for this valuable comment. We selected the items listed in the Japanese guidelines for acute pancreatitis and post-ERCP pancreatitis. We added this description (Lines 123-126). The original references of the guidelines were cited in the first draft version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment1. Please revise these documents: copyright, disclaimer, ethics approval, consent to participate, consent for publication, data and material availability, competing interests, funding, authors' contributions, and acknowledgments.

      First, thank you for reviewing our manuscript. We have already described the required information in the “author information” section. The sentences containing this information were proofread in English.

      Reviewer #2 (Recommendations for the authors):

      Comment 1. It would be best to conduct this study prospectively for further validation.

      First, thank you for reviewing our manuscript. We have revised our manuscript according to your comments. As you said, these points are limitations (Lines 312-318, lines 326-327). We hope that future validation studies over wider geographic regions will confirm our findings.

      Comment 2. The model name should be an acronym (formed from the first letters of the five items in the risk model).

      Thank you for this valuable comment. Unfortunately, we could not create a memorable model name from the first letters of the five items.

      Comment 3. You say that you include pre-procedure criteria that predict PEP. One of the items you list is pancreatic duct procedure. Do you mean a history of such a procedure?

      Thank you for this valuable comment. This item means that the main purpose of the ERCP is a procedure on the pancreatic duct. Therefore, the pancreatic duct procedure is listed as “planned pancreatic duct procedures” in Figure 2 (Lines 40-41, 231-234). When an unintended pancreatic duct procedure is performed, the risk score can be recalculated by adding two points for “planned pancreatic duct procedures” (Lines 48-49, 247-250).

      Comment 4. Regarding calcification, do you mean chronic pancreatitis? It needs more clarification regarding its degree.

      Thank you for this valuable comment. We regard pancreatic calcification as a finding of chronic pancreatitis. Pancreatic calcification was defined as calcification confirmed by imaging, such as CT, MRI, or EUS. These definitions were included in the first draft version (Lines 134-137).

      Comment 5. Why don't you include young age in the model? Your result found that age less than 50 is significantly associated with PEP.

      Thank you for this valuable comment. We selected the PEP risk factors listed in the Japanese guidelines for acute pancreatitis and post-ERCP pancreatitis. Age less than 50 years was listed as a PEP risk factor in the Japanese guidelines for acute pancreatitis. We added this description (Lines 123-126).

      Comment 6. Some references are very old, e.g., from 1994 and 1996.

      Sorry for the old references. These references are Cotton et al. 1991, Freeman et al. 1996, and Loperfido et al. 1998, and they remain important today. The diagnostic criteria for PEP (Cotton’s criteria) were defined in the report by Cotton et al. The other two references are representative reports that described risk factors for PEP, and both were cited in the Japanese guidelines for pancreatitis by Takada et al. 2022 (Lines 123-126).

      Comment 7. In the introduction, you state that one of the items in the first PEP score is pain during the procedure. This seems a little strange.

      Thank you for this comment. The first PEP risk score did not involve PEP pain but involved pain during the procedure (Line 68).

      Comment 8. Once ERCP is indicated, you justify the importance of the risk model by stating that if one or more risk factors are found, EUS or PTD can be performed instead. It is not reasonable to abort the procedure in case of frequent pancreatic duct cannulation, or to cancel ERCP if the patient has one or more risk factors.

      Thank you for this valuable comment. If ERCP is performed for high-risk patients, prophylaxes for PEP, such as procedures by experts, pancreatic stent placement, and NSAID suppository insertion, should be performed as much as possible (Lines 281-287, 308-311).

      Comment 9. Regarding the post-ERCP pancreatitis criteria, do they include amylase above three times the upper limit of normal, or lipase?

      Thank you for this comment. We used Cotton’s criteria for diagnosing PEP. Cotton’s criteria include hyperamylasemia (more than three times the upper limit of normal) at least 24 hours after ERCP (Lines 114-116).

      Comment 10. It is well known that patients with functional biliary disorder have a high incidence of PEP; manometry is not needed for the diagnosis. This should be included.

      Thank you for this comment. However, functional biliary disorders are difficult to diagnose before ERCP procedures (Lines 259-262), and factors that are not apparent before ERCP could not be included in the predictive PEP scoring system.

      Comment 11: What are gabexate and nafamostat?

      Thank you for this comment, and sorry for our insufficient explanation. These compounds are gabexate mesilate and nafamostat mesilate, which are protease inhibitors. In some institutions, protease inhibitors are used as prophylaxis for PEP. We added “protease inhibitors” (Lines 138-139, Tables 1 and 2).

      Reviewer #3 (Recommendations for the authors):

      Comment 1. The sample size needs clarification.

      First, thank you for reviewing our manuscript. The sample size has been included in the “Methods” section (Lines 157-165).

      Comment 2. This needs to be addressed, because the discussion and background depend on old references.

      Thank you for this comment. These references are Cotton et al. 1991, Freeman et al. 1996, and Loperfido et al. 1998, and they remain important today. The diagnostic criteria for PEP (Cotton’s criteria) were defined in the report by Cotton et al. The other two references are representative reports that described risk factors for PEP, and both were cited in the Japanese guidelines for pancreatitis by Takada et al. 2022 (Lines 122-126). In the background and discussion, we added recent references and related information (Lines 65-67, 285-287, 291-295, 308-311).

      Comment 3. Case definition should be added to the methodology.

      Thank you for this comment. We added patient information. Please refer to the response against the eLife assessment, weakness, (2).

      Comment 4. Do you include all who met the inclusion criteria, or was there any random sampling technique?

      No, we did not use random sampling techniques.

      Comment 5. What is the value of comparing the development and validation groups? I do not think it adds anything new, unless you want to exclude confounders. Did the comparison reveal that a confounder exists? What was your point of view concerning that?

      Thank you for this valuable comment, and sorry for the insufficient explanation. The differences between the development cohort and the validation cohort are important because the goodness of fit for the score could be confirmed in significantly different groups. We added this explanation (Lines 197-199, 251-253).

    1. eLife Assessment

      This important study provides one mechanism that can explain the rapid diversification of poison-antidote pairs in fission yeast: recombination between existing pairs. The evidence is largely solid, but the study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver. The work is of interest to colleagues studying genetic incompatibilities.

    2. Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the manuscript. If not, I suggest the authors do these simple experiments and add this information.

      Strengths:

      The most interesting data (Figure 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      Weaknesses:

      In the opinion of this reviewer, some minor rewriting is needed.

    3. Reviewer #2 (Public review):

      Summary:

      This important study provides a mechanism that can explain the rapid diversification of poison-antidote pairs (wtf genes) in fission yeast: recombination between existing genes.

      Strengths:

      The authors analyzed the diversity of wtf genes in S. pombe strains, and found pervasive copy number variations. They further detected signals of recurrent recombination in wtf genes. To address whether recombination can generate novel wtf genes, the authors performed artificial recombination between existing wtf genes, and showed that a new wtf gene can indeed be generated: its poison cannot be detoxified by the antidotes encoded by the parental wtf genes but can be detoxified by its own antidote.

      Weaknesses:

      The study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.